How Could I Run A Huge Number Of Blast Calls Faster?
4
3
11.0 years ago

For my project I need to break a genome down into parts and run BLAST on each part. Given the size of the genome file, that comes to at least 4000 BLAST calls, which would take a lot of time. I'm using NCBIQBlastService to run the alignments remotely, and when I checked, each request takes about 20 seconds, so all 4000 would take around a day. Is there any way to do this faster? Any suggestions would be really appreciated. By the way, this might help too: http://biojava.org/wiki/BioJava:CookBook3:NCBIQBlastService

blast genome ncbi • 8.6k views
4
11.0 years ago
Josh Herr 5.8k

There are multiple ways to speed up a BLAST analysis. For a start, running BLAST locally will be faster than sending all the data back and forth to NCBI. Can you run BLAT instead?

3

I guess if he runs BLAT, he will miss a lot of homologous sequences he might be interested in, because BLAST is more sensitive than BLAT: BLAST uses a smaller word size of 3 when it looks for homologous sequences, whereas BLAT uses a longer one. I usually don't prefer BLAT over BLAST unless I'm looking for highly similar sequences or doing mapping. Even BLAT will take quite a long time unless you run it in parallel, which means dividing your sequences into many segments, running BLAT on each, and finally putting the output back together.
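The split-and-merge idea above applies to BLAST just as well as to BLAT. Here is a minimal Python sketch of the splitting half, assuming the input is ordinary multi-record FASTA text; the function names are made up for illustration and are not part of any BLAST or BLAT tooling.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_records(records, n_chunks):
    """Distribute records round-robin into at most n_chunks non-empty lists."""
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def to_fasta(records):
    """Serialise (header, sequence) tuples back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Each chunk can then be written to its own file and searched separately. With tabular output, putting the results back together is usually just concatenating the per-chunk output files.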

1

+1, and I absolutely agree about BLAT. For a lot of what I do, BLAT can suffice and saves a little bit of time. When I have to identify millions of environmental sequences against extremely large databases, you're not exactly going to get high levels of confidence anyway.

2
11.0 years ago

Perhaps you could send your searches in batches of 25 jobs at a time to EMBL's NCBI BLAST REST-based service. At a 25:1 ratio, a set of jobs that takes a day would take a little less than an hour (all other things being equal).
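To make the "25 at a time" idea concrete, here is a rough Python sketch using a thread pool. The `submit_and_wait` callable is a placeholder for whatever client code submits one search and waits for its result; `fake_search` below is a hypothetical stand-in so the example runs without any network access, and it is not a real EMBL-EBI or NCBI call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(queries, submit_and_wait, max_workers=25):
    """Call submit_and_wait(query) for every query, max_workers at a time.

    Results are returned in the same order as the queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_and_wait, queries))


def fake_search(query):
    # Hypothetical stand-in for a real REST client call.
    return f"result for {query}"


if __name__ == "__main__":
    results = run_in_parallel([f"seq{i}" for i in range(100)], fake_search)
    print(len(results), results[0])
```

Threads are a reasonable fit here because each worker spends nearly all its time waiting on the remote service rather than using the CPU. Whatever limit you pick should stay within the usage policy of the service you are calling.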

2

EMBL-EBI provides a range of sequence similarity search services, for which SOAP and REST web service interfaces are available (see https://www.ebi.ac.uk/Tools/webservices/#sequence_similarity_search_sss) as well as the web interfaces (http://www.ebi.ac.uk/Tools/sss/). Sample clients are provided, along with some suggestions on implementing batch analysis (https://www.ebi.ac.uk/Tools/webservices/help/faq#how_can_i_analyse_multiple_sequences).

NCBI's BLAST web services (see http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=DeveloperInfo) have different usage restrictions, one of which limits the frequency of requests. Given an average runtime of 20 s for your jobs, a limit of one request per 3 s suggests that about 6 jobs could be running in parallel. If the query sequences can be batched so that each job performs 10 searches, then the average job time would increase to about 200 s, and since it is the request frequency that is limited, that translates into significant parallelism.

All that said, the databases available at EMBL-EBI are not the same as those available from NCBI, so the choice of database may force the use of one particular service.
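The rate-limit reasoning above can be turned into a back-of-the-envelope estimate. This Python sketch rests on two assumptions that no service guarantees: a job's runtime grows linearly with the number of queries in it, and the server runs every submitted job concurrently. The function names are made up for illustration.

```python
def estimate_wall_time(n_queries, secs_per_query, submit_interval, batch_size=1):
    """Rough wall time in seconds when submissions are throttled.

    Assumes job runtime scales linearly with batch size and that all
    submitted jobs run concurrently on the server.
    """
    n_jobs = -(-n_queries // batch_size)  # ceiling division
    job_runtime = secs_per_query * batch_size
    # The last job goes out after (n_jobs - 1) intervals, then must finish.
    return (n_jobs - 1) * submit_interval + job_runtime


def jobs_in_flight(secs_per_query, submit_interval, batch_size=1):
    """Approximate number of jobs running at once in steady state."""
    return secs_per_query * batch_size / submit_interval
```

With the numbers from this thread (4000 queries, 20 s each, one request per 3 s), `estimate_wall_time(4000, 20, 3)` gives 12017 s, about 3.3 hours, with roughly 6 to 7 jobs in flight. With 10 queries per job, `estimate_wall_time(4000, 20, 3, batch_size=10)` gives 1397 s, about 23 minutes, with roughly 67 jobs in flight. Real timings depend on server load and queueing, so treat these as optimistic lower bounds.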

1
11.0 years ago

As suggested by Josh Herr, you can run BLAST on your own computer to get through a large number of searches faster.

From my perspective, the easiest way to do that is to use a computer running Linux (or Mac OS X), install BLAST and the databases you need, and launch the searches.

If you have no experience with UNIX-like systems, then you would probably need help from a person that is knowledgeable about this.

If you tell us a bit more about your experience, the computer you use or could use in the lab (installed systems, number of CPUs), we may be able to help you some more.
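For anyone who does go the local route, here is a hedged Python sketch of launching one BLAST+ search over a multi-record FASTA file. The options used (`-query`, `-db`, `-out`, `-outfmt 6`, `-num_threads`) are standard BLAST+ command-line options. The file and database names in the usage note are hypothetical, and the sketch assumes BLAST+ is installed and the database has already been built or downloaded.

```python
import shutil
import subprocess


def build_blast_command(query_fasta, db, out_path, threads=4, program="blastn"):
    """Build the argument list for a local BLAST+ search with tabular output."""
    return [
        program,
        "-query", query_fasta,
        "-db", db,
        "-out", out_path,
        "-outfmt", "6",
        "-num_threads", str(threads),
    ]


def run_local_blast(query_fasta, db, out_path, threads=4, program="blastn"):
    """Run the search and return the output path; raises if BLAST+ is missing."""
    if shutil.which(program) is None:
        raise FileNotFoundError(f"{program} not found on PATH; install BLAST+ first")
    cmd = build_blast_command(query_fasta, db, out_path, threads, program)
    subprocess.run(cmd, check=True)
    return out_path
```

A call such as `run_local_blast("genome_parts.fasta", "my_db", "hits.tsv")` would search every record in the query file in a single run, so the 4000 genome pieces would not need 4000 separate program launches.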

0
9.7 years ago
qiyunzhu ▴ 130

This thread has been around for a long time, but I would like to add a tip for those who run ncbi-blast+ on their own computers: if you place the database on a fast storage device (e.g., an SSD), you will get a *dramatic* gain in speed!

0

Did you benchmark this? I would have guessed the OS would cache the most frequently used pages in memory anyway.

0

I didn't do a serious benchmark, but I estimated a 3- to 10-fold increase in speed. I also think memory will make a key contribution if it is large enough (I guess 128 GB is necessary for the whole nr) and if I can somehow load the whole database into memory.
