blastdbcmd taxonomy id conflict
6.2 years ago
Chirag Parsania ★ 2.0k

Hi, I am using the 'blastdbcmd' command to map taxonomy IDs and other details onto local BLAST results. For the query protein "WP_071944094.1", the taxonomy ID I get locally conflicts with the one in the online record. The command and its output are given below for reference.

Local command :

blastdbcmd -db <path to nr> -outfmt '%a %g %T %t' -out "temp.txt" -entry WP_071944094.1

outcome :

WP_071944094.1 1110718287 1480675 glyoxalase [Halolamina sediminis]
APE31202.1 1108539552 1480675 glyoxalase [Halolamina sediminis]

According to this output, the protein belongs to the species Halolamina sediminis (tax ID: 1480675). However, the online record for WP_071944094.1 shows that the protein belongs to Halomonas aestuarii (tax ID: 1897729).
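The comparison described above can be scripted so that a mismatch is flagged automatically. This is only a minimal sketch: `check_taxid` is a hypothetical helper name (not part of BLAST+), and it assumes the '%a %g %T %t' output format used in this post, where the taxonomy ID is the third whitespace-separated field.

```shell
# check_taxid EXPECTED_TAXID
# Hypothetical helper: reads `blastdbcmd -outfmt '%a %g %T %t'` lines on stdin
# and prints every entry whose taxid (3rd field) differs from EXPECTED_TAXID.
check_taxid() {
    awk -v want="$1" '$3 != want { print $1, "local=" $3, "expected=" want }'
}

# Usage (needs a local nr database; the path is left elided as in the post):
#   blastdbcmd -db <path to nr> -entry WP_071944094.1 -outfmt '%a %g %T %t' | check_taxid 1897729
```

No output means every returned entry carries the expected taxonomy ID.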

Can anyone please explain why the online record and the local result disagree?

blastp localblast taxonomyid • 2.4k views

Can we assume that you're using the same database version locally as the one served online at NCBI (i.e. that your local DBs are up to date)?
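One way to answer this is to read the build date from the database itself with `blastdbcmd -info`. A rough sketch follows; `db_date` is a hypothetical helper, and it assumes the report contains a line starting with "Date:" (with any further fields on that line separated by a tab).

```shell
# db_date
# Hypothetical helper: reads a `blastdbcmd -info` report on stdin and prints
# the first line starting with "Date:", cut at the first tab if there is one.
db_date() {
    grep -m 1 '^Date:' | cut -f 1
}

# Usage (needs a local database named nr on the BLAST search path):
#   blastdbcmd -db nr -info | db_date
```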

6.2 years ago

This seems to be a non-redundant RefSeq protein entry, hence the difference. You can find some additional info here: RefSeq non-redundant proteins


Hi, I believe you are right. I found the paragraph below at the link you shared, and it explains the cause of what I asked about.

A non-redundant protein record that provides organism information at the level of a genus, family, or even super-kingdom does not mean that the protein is found in all RefSeq genomes below that taxonomic classification. It only indicates that the protein is found in more than one genome of different species for which the genus, family, or super-kingdom classification is the lowest common taxonomic node.

This would also mean that the taxonomy ID I found locally is more specific than the one in the online record. Correct me if I am wrong.
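The "lowest common taxonomic node" in the quoted paragraph can be illustrated with two root-first lineages. This is a toy sketch only: `lowest_common_node` is a hypothetical helper, and the lineages in the usage line are placeholders, not real NCBI taxonomy data.

```shell
# lowest_common_node LINEAGE_A LINEAGE_B
# Hypothetical helper: each argument is a root-first lineage written as a
# space-separated list of node names or taxids. Prints the deepest node that
# both lineages share, or an empty line if they share none.
lowest_common_node() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, " ")
        m = split(b, y, " ")
        common = ""
        for (i = 1; i <= n && i <= m; i++) {
            if (x[i] != y[i]) break   # lineages diverge here
            common = x[i]
        }
        print common
    }'
}

# Usage with placeholder lineages:
#   lowest_common_node "root Bacteria GenusX speciesA" "root Bacteria GenusX speciesB"
```

A protein found in both placeholder species would be reported at GenusX, the deepest node the two lineages share.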

Anyway, thanks a lot.


Yes, that could explain it. However, I'm starting to think something else might be going on :-/ . Did you check the DB version (local vs. online)? And are you sure there is no parsing issue in your output? Perhaps these entries were updated recently? Here is the blastdbcmd output from a DB version from Sep 2017 (I'm trying to trace the discrepancy):

$ blastdbcmd -db /blastdb/shared/prot -entry WP_071944094.1 -outfmt '%a %g %T %t'
WP_071944094.1 1110718287 1897729 glyoxalase [Halomonas aestuarii]
APE31202.1 1108539552 1897729 glyoxalase [Halomonas aestuarii]
$ blastdbcmd -db /blastdb/shared/prot -entry WP_053947656.1 -outfmt '%a %g %T %t'
WP_053947656.1 928922717 1480675 glyoxalase [Halolamina sediminis]

As you can see, it gives the same output as the online query.


Hi Sterk, you have changed the input ID from WP_071944094.1 to WP_053947656.1. There is no issue with WP_053947656.1.

I ran the same command against my database version (mine is perhaps a little older than yours; it was downloaded in mid-2017). It gave me the same output as yours.

WP_053947656.1 928922717 1480675 glyoxalase [Halolamina sediminis]

To confirm (and perhaps close the issue), I have just downloaded the latest version of the nr DB from NCBI. I tried again, and the output confirms my previous reply:

$ blastdbcmd -db /blastdb/shared/prot -entry WP_071944094.1 -outfmt '%a %g %T %t'
WP_071944094.1 1110718287 1897729 glyoxalase [Halomonas aestuarii]
APE31202.1 1108539552 1897729 glyoxalase [Halomonas aestuarii]

So I see no discrepancy when querying this ID locally.

My best guess is that your DB was outdated and that this particular ID might have been revised in the meantime.
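If both an old and a refreshed copy of the database are at hand, revised entries can be listed by comparing the two outputs. A sketch under assumptions: `taxid_changes` is a hypothetical helper, and both input files are assumed to hold `blastdbcmd -outfmt '%a %T'` lines (accession, then taxid).

```shell
# taxid_changes OLD_FILE NEW_FILE
# Hypothetical helper: both files hold "accession taxid" lines, e.g. from
# `blastdbcmd -outfmt '%a %T'` run against two versions of the same database.
# Prints the accessions present in both files whose taxid differs.
taxid_changes() {
    awk 'NR == FNR { old[$1] = $2; next }
         ($1 in old) && old[$1] != $2 { print $1, old[$1], "->", $2 }' "$1" "$2"
}

# Usage (old.txt and new.txt are placeholder file names):
#   taxid_changes old.txt new.txt
```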


Hi lieven, thanks for this. By the way, I got the same answer from NCBI. See below:


You may want to update the metadata file by redownloading the taxdb.tar.gz file from our FTP site, as well as updating your nr or refseq_protein database.

Working with updated files, I have:

blastdbcmd -db nr -entry WP_071944094.1 -outfmt "%a %T %S"
WP_071944094.1 1897729 Halomonas sp. Hb3
APE31202.1 1897729 Halomonas sp. Hb3

This matches the web record's link to taxonomy: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1897729
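For anyone following NCBI's advice, here is a rough sketch of the refresh step. `taxdb_present` is a hypothetical helper, and it assumes that taxdb.tar.gz unpacks to taxdb.bti and taxdb.btd next to the database files. The commented commands use `update_blastdb.pl`, the download script shipped with BLAST+, and need network access.

```shell
# taxdb_present DIR
# Hypothetical helper: succeeds only if both taxonomy metadata files that
# blastdbcmd uses for names and taxids (taxdb.bti, taxdb.btd) exist in DIR.
taxdb_present() {
    [ -f "$1/taxdb.bti" ] && [ -f "$1/taxdb.btd" ]
}

# Refresh sketch (run inside the directory that holds the BLAST databases):
#   update_blastdb.pl --decompress taxdb
#   update_blastdb.pl --decompress nr
#   taxdb_present . || echo "taxdb files are missing from this directory" >&2
```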


Yes, I know. I just wanted to show that the two IDs give different outputs. The main point is that, for WP_071944094.1, my local DB gives the same result as the online search, so in my test there is no discrepancy between querying this ID locally and online.
