BLAST, setting maximum number of hits
3
1
Entering edit mode
9.2 years ago
apelin20 ▴ 480

Hello,

I am trying to set the number of maximum hits to 5, so that the procedure can finish sooner, but I still get 100s of hits found.

# TBLASTX 2.2.29+
# Query: Locus_40_Transcript_185/186_Confidence_0.224_Length_4778
# Database: ../../Genome/Genome
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 714 hits found

I am running:

tblastx -db ../../Genome/Genome -query all_merged_k125.fa -evalue 1e-10 -outfmt 7 -out tblastx/all_merged_k125.fmt7 -num_threads 16 -max_target_seqs 5

Any idea why it's still reporting so many hits?

Adrian

BLAST tblastx parallel • 37k views
ADD COMMENT
7
Entering edit mode
9.2 years ago
5heikki 11k

You can get e.g. 10 hits from one long target sequence (-max_target_seqs 1), i.e. max_target_seqs doesn't specify the maximum number of hits per query, but the maximum number of target sequences for hits per query.

ADD COMMENT
0
Entering edit mode

Ok that makes sense. How do I limit the amount of hits?

ADD REPLY
4
Entering edit mode

I would do it post blast (with -outfmt 6 output):

Make sure the file is sorted based on query and best hits (here bitscore > evalue > perc identity):

export LC_ALL=C LC_LANG=C; sort -k1,1 -k12,12gr -k11,11g -k3,3gr outputFile > sortedFile

Then get the top 5 hits for every query:

for next in $(cut -f1 sortedFile | sort -u); do grep -w -m 5 "$next" sortedFile; done > topFivePerQuery
ADD REPLY
0
Entering edit mode

Yes, the above command made my day!!!!!!!!!

Thanks

ADD REPLY
4
Entering edit mode

Setting -max_target_seqns to 1 will give only 1 subject/hit but several HSPs if they are present.

Setting -max_hsps to 1 will give only 1 HSP per subject but for all subject/hits in the database.

If you really want only 5 HSPs per subject, set the -max_target_seqns to 1 and -max_hsps to 5.

ADD REPLY
1
Entering edit mode

Makes sense, thank you!

ADD REPLY
0
Entering edit mode

(typo: should be -max_target_seqs instead of -max_target_seqns)

ADD REPLY
0
Entering edit mode

I guess you can always give a relatively stringent e-value and filter the resulting hits later.

ADD REPLY
0
Entering edit mode

What I wanted is to speed up the blasting.

ADD REPLY
1
Entering edit mode

I doubt limiting the number of hits like that would speed up your blasting significantly. It still has to go through the whole db for every query, so the only difference would be in how long it takes to write 5 or 10 lines (or whatever) to the output file. Instead, if your db is small (or you have a ton of RAM), you should parallelize blast (e.g. with GNU Parallel) by running multiple single-threaded blasts on split input instead of using -num_threads X..

ADD REPLY
0
Entering edit mode

^True. You will benefit from multi-threading, and trying both tblastx and blastall -p tbalstx before choosing one of them. For shorter query sequences, I've seen the latter be significantly faster than the former.

ADD REPLY
0
Entering edit mode
9.2 years ago
Ram 43k

There seems to a problem with -outfmt 7. Can you check if this problem persists if you use the default output format?

ADD COMMENT
0
Entering edit mode

I am trying outfmt 6. Same thing, gives out many results, beyond 5.

ADD REPLY
0
Entering edit mode

Try not giving the outfmt param. If it works, then it could be a out format group specific problem.

ADD REPLY
0
Entering edit mode
9.2 years ago
dago ★ 2.8k

If I am not wrong with -outfmt > 4, -max_target_seqs is ignored. At least this is true for psiblast

ADD COMMENT
0
Entering edit mode

max_target_seqs - Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.

ADD REPLY
0
Entering edit mode

No, it is recommended to be used with outfmt>4. See here: http://www.ncbi.nlm.nih.gov/books/NBK1763/#_CmdLineAppsManual_Appendix_C_Options_for_

ADD REPLY
0
Entering edit mode

You are right. I checked my code and it refers to `-num_descriptions`

ADD REPLY

Login before adding your answer.

Traffic: 2634 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6