Stop BLAST search on any hit
8.4 years ago
colin.kern ★ 1.1k

I'm trying to classify transcripts as coding or noncoding, and part of this involves running BLAST with each sequence against a database of known proteins, such as UniProt or Pfam domains. Because these databases are large, BLAST takes a long time, and it's taking me days to classify just a small number of transcripts. I'm not really interested in the alignments themselves; I only need a yes-or-no answer to whether there were any hits below an e-value threshold. Is there any way to make BLAST stop and report a hit as soon as it finds the first one?
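Since only a yes/no answer per transcript is needed, the decision can be made from BLAST+ (or DIAMOND) tabular output after the search. The sketch below assumes the output was written with `-outfmt 6` using the default column layout, where the e-value is the 11th column; the function name and the 1e-5 default threshold are hypothetical choices for illustration, not part of BLAST.

```python
import csv
from typing import Dict, Iterable


def classify_by_hits(tabular_lines: Iterable[str], query_ids: Iterable[str],
                     max_evalue: float = 1e-5) -> Dict[str, bool]:
    """Map each query ID to True if it has at least one hit with
    e-value <= max_evalue in default-layout tabular (-outfmt 6) output.

    Hypothetical helper; queries absent from the output count as no hit.
    """
    has_hit = {qid: False for qid in query_ids}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        qseqid, evalue = row[0], float(row[10])
        if qseqid in has_hit and evalue <= max_evalue:
            has_hit[qseqid] = True
    return has_hit
```

For example, `classify_by_hits(open("hits.tsv"), transcript_ids)` (file name hypothetical) returns a dictionary that can be used directly as the "has protein evidence" flag for each transcript.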

Edit: I'm using NCBI BLAST+, btw.

8.4 years ago
Chris Fields ★ 2.2k

You can set the e-value threshold to whatever you want and set -max_target_seqs to 1. It might also help to set the task to 'megablast' if you are running BLASTX.
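A minimal sketch of the blastx invocation this answer describes, built as an argument list so it can be passed to `subprocess.run`. The file and database names are hypothetical placeholders; `-evalue`, `-max_target_seqs` and `-outfmt` are standard BLAST+ options. As the reply in this thread notes, limiting the number of reported targets may not shorten the search itself.

```python
from typing import List


def blastx_first_hit_cmd(query: str, db: str, out: str,
                         evalue: float = 1e-5) -> List[str]:
    """Build a blastx command that keeps a single target per query.

    Hypothetical helper; query/db/out are placeholder paths.
    """
    return [
        "blastx",
        "-query", query,          # transcript FASTA file
        "-db", db,                # protein database built with makeblastdb
        "-out", out,
        "-evalue", str(evalue),   # e-value threshold for saving hits
        "-max_target_seqs", "1",  # keep one target sequence per query
        "-outfmt", "6",           # tabular output, one line per HSP
    ]
```

Usage, assuming blastx is on the PATH: `subprocess.run(blastx_first_hit_cmd("transcripts.fa", "uniprot_sprot", "hits.tsv"), check=True)`.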

The other option (which we have switched to, and which I highly recommend) is to use DIAMOND instead of BLAST+. It runs about 4,000-20,000x faster than BLASTX and, with the --sensitive option, has comparable sensitivity. I've run a pretty nasty genome assembly (>2M scaffolds) against nr in less than one day.
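A sketch of the two DIAMOND steps involved: building a `.dmnd` database from a protein FASTA file, then running `diamond blastx` in sensitive mode with a single target per query and tabular output, so the result can be parsed the same way as BLAST's `-outfmt 6`. The paths and the 1e-5 threshold are hypothetical placeholders.

```python
from typing import List


def diamond_commands(protein_fasta: str, db_prefix: str, query: str,
                     out: str, evalue: float = 1e-5) -> List[List[str]]:
    """Return [makedb command, blastx command] for DIAMOND.

    Hypothetical helper; all paths are placeholders.
    """
    make_db = ["diamond", "makedb", "--in", protein_fasta, "--db", db_prefix]
    search = [
        "diamond", "blastx",
        "--db", db_prefix,        # DIAMOND appends .dmnd to the prefix
        "--query", query,
        "--out", out,
        "--sensitive",            # the mode this answer recommends
        "--evalue", str(evalue),
        "--max-target-seqs", "1",
        "--outfmt", "6",          # BLAST-style tabular output
    ]
    return [make_db, search]
```

Each list can be run in order with `subprocess.run(cmd, check=True)`; the database only needs to be built once.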

EDIT: changed to correct parameter name

There are two arguments, '-max_target_seqs' and '-max_hsps_per_subject'. I've just done some testing with both, and I don't think max_target_seqs offers any speedup. Setting max_hsps_per_subject seems to have reduced the run time to 75% of the original on the one test sequence I'm using. I think these arguments still generate all hits and then only output the top N. I'm afraid megablast will be too strict, and making its parameters more lenient would probably just bring the running time back to that of blastx. I'm looking into DIAMOND now.
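Timing tests like the one described here are easier to trust when each command is run more than once. A minimal sketch using only the Python standard library: it runs a command several times and keeps the fastest wall-clock time. The placeholder command is hypothetical; substitute the blastx variants being compared, and note that a single test sequence says little about typical speedups.

```python
import subprocess
import sys
import time
from typing import List


def time_command(cmd: List[str], repeats: int = 1) -> float:
    """Run cmd `repeats` times and return the best wall-clock time in seconds.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder command; replace with e.g. two blastx argument lists.
    placeholder = [sys.executable, "-c", "pass"]
    print(f"{time_command(placeholder, repeats=3):.3f} s")
```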

8.4 years ago
Siva ★ 1.9k

Can you try the 'blastx-fast' task option (-task blastx-fast)? Apparently, this option increases the word length, giving a better trade-off between speed and sensitivity, as described in this paper.
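A sketch of the blastx call with this task selected, assuming a BLAST+ release that supports `-task blastx-fast`. The paths and the 1e-5 threshold are hypothetical placeholders; the output is tabular, so it can be checked for hits in the same way as a regular blastx run.

```python
from typing import List


def blastx_fast_cmd(query: str, db: str, out: str,
                    evalue: float = 1e-5) -> List[str]:
    """Build a blastx command using the faster 'blastx-fast' task.

    Hypothetical helper; query/db/out are placeholder paths.
    """
    return [
        "blastx",
        "-task", "blastx-fast",   # speed-optimized blastx task
        "-query", query,
        "-db", db,
        "-out", out,
        "-evalue", str(evalue),
        "-outfmt", "6",           # tabular output
    ]
```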
