Taxids from BLAST search that are not present in names.dmp or nodes.dmp
4.9 years ago

I've been trying to assign the taxonomy of some blastn hits from a locally downloaded nt database by referencing the names.dmp and nodes.dmp files included in taxdump. Everything went smoothly for almost all ~300,000 hits, except for six taxids which aren't present in the taxdump files.

These taxids do return a match when searching NCBI's online taxonomy browser (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi), but the page that comes back shows a different taxid (e.g., searching taxid 1796529 returns a beetle subfamily with a taxid of 2558035). Four of the taxids even return the same subfamily: each search returns a page with the same taxonomy but a slightly different taxid, which is also perplexing. Here are the six:

1859523

1796529, 1796531, 1796534, 1796527

1796546

Not sure if it's some kind of aliasing or if these taxids are deprecated, but does anyone know what's going on? Any help is much appreciated!

blast taxonomy • 1.2k views
4.9 years ago
GenoMax 141k

These IDs appear to be internally aliased to new taxIDs.

Using Entrez Direct (output truncated for brevity):

$ efetch -db taxonomy -id "1859523" -format native -mode xml

<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN" "https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet><Taxon>
    <TaxId>110022</TaxId>
    <ScientificName>Hydroporus erythrocephalus</ScientificName>
    <OtherNames>
        <Includes>Hydroporus sp. BMNH 1425131</Includes>
        <Name>
            <ClassCDE>authority</ClassCDE>
            <DispName>Hydroporus erythrocephalus (Linnaeus, 1758)</DispName>
        </Name>
    </OtherNames>
    <ParentTaxId>107870</ParentTaxId>
    <Rank>species</Rank>
    <Division>Invertebrates</Division>
    <GeneticCode>
        <GCId>1</GCId>
        <GCName>Standard</GCName>
    </GeneticCode>
    <MitoGeneticCode>
        <MGCId>5</MGCId>
        <MGCName>Invertebrate Mitochondrial</MGCName>
    </MitoGeneticCode>
    <Lineage>cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Protostomia; Ecdysozoa; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Holometabola; Coleoptera; Adephaga; Dytiscoidea; Dytiscidae; Hydroporinae; Hydroporini; Hydroporus</Lineage>

$ efetch -db taxonomy -id "1796529" -format native -mode xml

<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN" "https://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet><Taxon>
    <TaxId>2558035</TaxId>
    <ScientificName>Scolytinae sp. BMNH 1274286</ScientificName>
    <OtherNames>
        <Includes>Platypodinae sp. BMNH 1274286</Includes>
    </OtherNames>
    <ParentTaxId>105155</ParentTaxId>
    <Rank>species</Rank>
    <Division>Invertebrates</Division>
    <GeneticCode>
        <GCId>1</GCId>
        <GCName>Standard</GCName>
    </GeneticCode>
    <MitoGeneticCode>
        <MGCId>5</MGCId>
        <MGCName>Invertebrate Mitochondrial</MGCName>
    </MitoGeneticCode>
    <Lineage>cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Protostomia; Ecdysozoa; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Holometabola; Coleoptera; Polyphaga; Cucujiformia; Curculionoidea; Curculionidae; Scolytinae; unclassified Scolytinae</Lineage>
    <LineageEx>
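
If you want to check for this kind of aliasing programmatically, here is a rough Python sketch that compares the taxid you queried with the first TaxId that efetch returns. It assumes you have the complete XML for a record as a string (the output above is truncated, so it would not parse as-is), and the function names `returned_taxid` and `is_aliased` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def returned_taxid(xml_text):
    """Return the TaxId of the first Taxon record in a TaxaSet document."""
    root = ET.fromstring(xml_text)       # root element is <TaxaSet>
    node = root.find("./Taxon/TaxId")    # top-level TaxId, not the lineage ones
    return int(node.text) if node is not None else None


def is_aliased(queried_taxid, xml_text):
    """True when the record that came back carries a different taxid."""
    found = returned_taxid(xml_text)
    return found is not None and found != queried_taxid


# Minimal well-formed stand-in for real efetch output (assumption: heavily trimmed)
sample_xml = (
    "<TaxaSet><Taxon>"
    "<TaxId>2558035</TaxId>"
    "<ScientificName>Scolytinae sp. BMNH 1274286</ScientificName>"
    "</Taxon></TaxaSet>"
)

print(is_aliased(1796529, sample_xml))
```
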

Thanks for looking that up! Any idea why those taxids specifically are aliased?

I also found that these aliases are present in the merged.dmp file included in taxdump, so that should make automation fairly simple!
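
For anyone automating this, a minimal Python sketch of the merged.dmp lookup might look like the following. It assumes the usual taxdump layout where each merged.dmp row is `old_taxid | new_taxid |` with tab-padded pipes; the helper names (`load_merged`, `resolve_taxid`) and the in-memory sample are hypothetical, and the two sample rows simply mirror the old/new pairs from the efetch output in this thread.

```python
import io


def load_merged(handle):
    """Map each old (merged) taxid to its current taxid."""
    merged = {}
    for line in handle:
        # Rows look like: "old_taxid\t|\tnew_taxid\t|"
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[int(fields[0])] = int(fields[1])
    return merged


def resolve_taxid(taxid, known_taxids, merged):
    """Return a taxid present in nodes.dmp, following merged.dmp if needed.

    Returns None when the taxid is neither current nor merged
    (it may have been deleted; delnodes.dmp would be the place to check).
    """
    if taxid in known_taxids:
        return taxid
    return merged.get(taxid)


# Illustrative stand-in for open("merged.dmp"); not the real file contents
sample = io.StringIO("1796529\t|\t2558035\t|\n1859523\t|\t110022\t|\n")
merged = load_merged(sample)

# Stand-in for the set of taxids parsed from nodes.dmp (assumption)
known = {2558035, 110022}

for old in (1796529, 1859523):
    print(old, "->", resolve_taxid(old, known, merged))
```

With the real files you would pass `open("merged.dmp")` to `load_merged` and build `known` from the first column of nodes.dmp, then run every BLAST taxid through `resolve_taxid` before the names/nodes lookup.
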
