Accelerating BLAST for a million-sequence all-by-all BLASTp
9.3 years ago
Anand Rao ▴ 630

I need to run an all-by-all BLASTp on a large dataset of ~2 million protein sequences.

I see two routes that people have used in the past. Some related posts are here at "Correct Method To Blast All-Vs-All With Ncbiblast & How To Speed It Up?" and elsewhere at http://seqanswers.com/forums/showthread.php?t=5752

Route 1: Split input files and then run BLAST on these smaller chunks

Route 2: Use a comparable tool such as the open-source mpiBLAST

Are these the only practical routes for large BLAST runs, or are there other related or unrelated ways to go about it?

And finally, is

Route 3: Both splitting the input files AND using mpiBLAST a sound idea? If not, why not?

Thanks for your answers

BLAST parallel mpi cpu • 2.9k views

I moved this from the forum section since there is a clear question. In my opinion, use Route 1. I did not see any major improvement with mpiBLAST, and it is more difficult to configure and use. Splitting the input and running BLAST on the chunks in parallel should be easy to implement on any system.
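As a rough illustration of Route 1, here is a minimal Python sketch that splits a protein FASTA file into fixed-size chunks and builds one `blastp` command per chunk. The function names (`read_fasta`, `split_fasta`, `blastp_commands`), the chunk file naming, and the database name are hypothetical choices made for this example; the sketch only prepares the commands and assumes you launch them yourself through a scheduler or a process pool.

```python
import itertools
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, records_per_chunk):
    """Write consecutive groups of records to chunk files; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(path)
    chunk_paths = []
    for index in itertools.count():
        chunk = list(itertools.islice(records, records_per_chunk))
        if not chunk:
            break
        chunk_path = out_dir / f"chunk_{index:05d}.fasta"  # hypothetical naming
        with open(chunk_path, "w") as out:
            for header, seq in chunk:
                out.write(f"{header}\n{seq}\n")
        chunk_paths.append(chunk_path)
    return chunk_paths


def blastp_commands(chunk_paths, db_name, threads=1):
    """Build one blastp argument list per chunk (tabular output, -outfmt 6)."""
    return [
        ["blastp", "-query", str(p), "-db", db_name, "-outfmt", "6",
         "-num_threads", str(threads), "-out", str(p.with_suffix(".tsv"))]
        for p in chunk_paths
    ]
```

For an all-by-all run, the complete protein set would be formatted once as the database (for example with `makeblastdb -dbtype prot`), and every chunk is then searched against that same database. Each argument list can be passed to `subprocess.run` or written into a job script, and the per-chunk `.tsv` files concatenated afterwards.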

