Bacteria taxonomy ID at species level NCBI
2.0 years ago

Hi everyone, I am trying to limit my local blastp search against the nr database to bacteria only, so that I can later get the best hit for each species. To do that, I can collect the taxonomy IDs of all bacteria and pass them to a local blastp with the -taxidlist parameter. However, I realised that my taxonomy ID file, which contains bacterial tax IDs from NCBI, has "duplicated" identifiers: separate IDs for different strains of the same species. For example, the taxonomy ID for Clostridium acetobutylicum is 1488, but other IDs point to the same species (Clostridium acetobutylicum str. ATCC 824 - 272562 // Clostridium acetobutylicum str. DSM 1731 - 991791 // Clostridium acetobutylicum str. EA 2018 - 863638).

I would like to get a file with the tax IDs of all species (e.g. Vibrio parahaemolyticus - 670), but without the strains associated with each of them (e.g. 2082734, 2082733, 2082732, ... and the other 254 strains annotated in the Taxonomy database: https://www.ncbi.nlm.nih.gov/taxonomy/?term=txid670[Subtree]). Manual checking is out of the question, so is there any way to accomplish this?

Thanks

bacteria protein blast ncbi taxonomy • 1.1k views
2.0 years ago

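You can do this with taxonkit, using its list and filter subcommands: list prints the whole subtree under a taxid, and filter -E species keeps only the entries whose rank is exactly "species".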

$ taxonkit list --ids 670 -nr \
    | head -n 5
670 [species] Vibrio parahaemolyticus
  223926 [strain] Vibrio parahaemolyticus RIMD 2210633
  419109 [strain] Vibrio parahaemolyticus AQ3810
  563771 [strain] Vibrio parahaemolyticus Peru-466
  563772 [strain] Vibrio parahaemolyticus AQ4037


$ taxonkit list --ids 670 -I "" \
    | taxonkit filter -E species \
    | taxonkit lineage -L -nr 
670     Vibrio parahaemolyticus species

For all bacteria (taxid 2):

# what you want
$ taxonkit list --ids 2 -I "" \
    | taxonkit filter -E species -o bacteria-species.txt

$ taxonkit lineage -L -nr bacteria-species.txt \
    | head -n 5
2707    Citrus greening disease-associated bacterium    species
29579   Heliothis virescens testis endosymbiont species
37449   gas vacuolate str. 90-P<gv>1    species
41355   mussel methanotrophic gill symbiont MAR2        species
42121   unidentified gamma proteobacterium RS15 species
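
To make it clearer what the list + filter combination is doing, here is a minimal sketch of the same idea in plain awk, for cases where taxonkit is not available. It assumes the NCBI taxdump nodes.dmp layout (taxid, parent taxid, rank, ... separated by tab-pipe-tab). The file toy_nodes.dmp, the function name species_under and the 900000x IDs are placeholders made up for this illustration, and the parent links are simplified; only 2, 670, 223926, 419109, 1488 and 272562 come from the thread.

```shell
#!/bin/sh
# Build a toy stand-in for nodes.dmp (columns: taxid | parent | rank |).
# Parent links are simplified; 900000x IDs are hypothetical placeholders.
make_row() { printf '%s\t|\t%s\t|\t%s\t|\n' "$1" "$2" "$3"; }
{
  make_row 1       1       "no rank"
  make_row 2       1       "superkingdom"   # Bacteria
  make_row 9000001 2       "genus"
  make_row 670     9000001 "species"        # Vibrio parahaemolyticus
  make_row 223926  670     "strain"
  make_row 419109  670     "strain"
  make_row 9000002 2       "genus"
  make_row 1488    9000002 "species"        # Clostridium acetobutylicum
  make_row 272562  1488    "strain"
  make_row 9000003 1       "superkingdom"   # some non-bacterial group
  make_row 9000004 9000003 "species"
} > toy_nodes.dmp

# usage: species_under ROOT_TAXID NODES_FILE
# Prints every taxid of rank "species" located at or below ROOT_TAXID.
species_under() {
  awk -F'\t[|]\t' -v root="$1" '
    {
      sub(/\t[|]$/, "")            # drop the trailing "\t|" terminator
      parent[$1] = $2
      rank[$1]   = $3
    }
    END {
      for (id in rank) {
        if (rank[id] != "species") continue
        # Walk up the tree until we hit the requested root or the top node.
        t = id
        while (t != root && (t in parent) && parent[t] != t) t = parent[t]
        if (t == root) print id
      }
    }' "$2" | sort -n
}

species_under 2 toy_nodes.dmp > toy-bacteria-species.txt
cat toy-bacteria-species.txt
```

On the toy table this keeps 670 and 1488 and drops their strains as well as the species outside taxid 2. On the real nodes.dmp the same logic should apply, but taxonkit will be much faster and is the approach I would actually use.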