Why do some hits show up in a search against a smaller database, but not in a search against the entire nr database?
5.6 years ago
sajal • 0

I made subsets of the nr database using blastdb_aliastool. I split the nr database 70%:30% and created two smaller databases (say, nr-70 and nr-30). I then ran blastp against both of them using query q and stored the two results separately (say, result-70 and result-30). I used the "-max_target_seqs 50" and "-max_hsps 20" options. No explicit e-value cutoff was given.

I also ran blastp against the whole nr using the same query q; say the result is result-100. When I compare result-70 and result-30 against result-100, I see a strange phenomenon. While result-70 and result-30 have 29 and 26 hits respectively, result-100 has only 28 hits. Since nr-70 and nr-30 are non-overlapping and together constitute nr, I expected the search against the whole nr to return 50 hits, since 29 + 26 = 55 > 50. Some of the hits found in the searches against the smaller databases don't show up in the result from the larger database.

Any idea why this is happening?
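To make the expectation above explicit, here is a minimal sketch of what I assumed would happen: pool the hits from the two disjoint databases and keep the best 50. The hit lists, subject IDs, and scores are made up for illustration, and `merge_top_hits` is a hypothetical helper, not part of BLAST.

```python
import heapq


def merge_top_hits(hits_a, hits_b, max_target_seqs=50):
    """Pool two disjoint hit lists and keep the highest-scoring subjects.

    Each hit is a (subject_id, bit_score) tuple. This models only the
    naive expectation; it is not how BLAST selects its reported hits.
    """
    merged = hits_a + hits_b
    return heapq.nlargest(max_target_seqs, merged, key=lambda hit: hit[1])


# Hypothetical hit lists with the same sizes as in the question.
result_70 = [(f"subj70_{i}", 200.0 - i) for i in range(29)]
result_30 = [(f"subj30_{i}", 150.0 - i) for i in range(26)]

expected_100 = merge_top_hits(result_70, result_30)
print(len(expected_100))
```

This pooling treats every hit as equally reportable in the combined search, which is exactly the assumption that the observed 28 hits contradict.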

alignment blastp • 1.2k views

I don't know if it is the cause of your results, but -max_target_seqs has a known, somewhat unexpected behaviour:

What BLAST's max-target-sequences doesn't do

I wonder if database size also interacts with these BLAST heuristics.

5.6 years ago
buchfink ▴ 250

The e-value depends on the size of the database, so some hits that pass in a smaller database might be above the default e-value threshold (10) if you use the whole db.
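As a rough illustration, the e-value of a hit with bit score S' is approximately E = m * n * 2^(-S'), where m is the query length and n is the total database length, so the same alignment gets a proportionally larger e-value in a larger database. The sketch below assumes that simplified formula (it ignores BLAST's effective-length corrections), and the numbers are hypothetical, not taken from the actual nr search.

```python
def evalue_from_bits(bit_score, query_len, db_len):
    """Approximate e-value: E = m * n * 2^(-S').

    Simplification: uses raw lengths rather than the effective
    lengths BLAST computes internally.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale_evalue(evalue_small, db_len_small, db_len_full):
    """Rescale an e-value from a smaller database to a larger one.

    For a fixed bit score and query, E is proportional to database length.
    """
    return evalue_small * db_len_full / db_len_small


DEFAULT_THRESHOLD = 10.0  # default blastp e-value cutoff

# Hypothetical weak hit reported with E = 4 in each partial database.
# Database sizes are expressed as fractions of the full nr.
e_from_nr30 = rescale_evalue(4.0, db_len_small=0.3, db_len_full=1.0)
e_from_nr70 = rescale_evalue(4.0, db_len_small=0.7, db_len_full=1.0)

print(e_from_nr30 > DEFAULT_THRESHOLD)  # hit from nr-30 would be dropped
print(e_from_nr70 > DEFAULT_THRESHOLD)  # hit from nr-70 would be kept
```

Under these assumptions, a borderline hit from the smaller partition is the one most likely to disappear in the full search, because its e-value is scaled up by the larger factor.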

5.6 years ago
piet ★ 1.8k

Please note that you are limiting the number of hits returned (options -max_target_seqs 50 and -max_hsps 20). Try to set these options to much larger numbers, for example -max_target_seqs 20000 and -max_hsps 20000.
