Biostar Beta. Not for public use.
BLAST, setting maximum number of hits
1
Entering edit mode
5.4 years ago
apelin20 • 470
Canada

Hello,

I am trying to set the number of maximum hits to 5, so that the procedure can finish sooner, but I still get 100s of hits found.

# TBLASTX 2.2.29+
# Query: Locus_40_Transcript_185/186_Confidence_0.224_Length_4778
# Database: ../../Genome/Genome
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 714 hits found

I am running:

tblastx -db ../../Genome/Genome -query all_merged_k125.fa -evalue 1e-10 -outfmt 7 -out tblastx/all_merged_k125.fmt7 -num_threads 16 -max_target_seqs 5

Any idea why it's still reporting so many hits?

Adrian

ADD COMMENTlink
6
Entering edit mode
12 months ago
5heikki 8.4k
Finland

You can get e.g. 10 hits from one long target sequence (-max_target_seqs 1), i.e. max_target_seqs doesn't specify the maximum number of hits per query, but the maximum number of target sequences for hits per query.

ADD COMMENTlink
0
Entering edit mode

Ok that makes sense. How do I limit the amount of hits?

ADD REPLYlink
4
Entering edit mode

Setting -max_target_seqns to 1 will give only 1 subject/hit but several HSPs if they are present.

Setting -max_hsps to 1 will give only 1 HSP per subject but for all subject/hits in the database.

If you really want only 5 HSPs per subject, set the -max_target_seqns to 1 and -max_hsps to 5.

ADD REPLYlink
1
Entering edit mode

Makes sense, thank you!

ADD REPLYlink
0
Entering edit mode

(typo: should be -max_target_seqs instead of -max_target_seqns)

ADD REPLYlink
2
Entering edit mode

I would do it post blast (with -outfmt 6 output):

Make sure the file is sorted based on query and best hits (here bitscore > evalue > perc identity):

export LC_ALL=C LC_LANG=C; sort -k1,1 -k12,12gr -k11,11g -k3,3gr outputFile > sortedFile

Then get the top 5 hits for every query:

for next in $(cut -f1 sortedFile | sort -u); do grep -w -m 5 "$next" sortedFile; done > topFivePerQuery
ADD REPLYlink
0
Entering edit mode

Yes, the above command made my day!!!!!!!!!

Thanks

ADD REPLYlink
0
Entering edit mode

I guess you can always give a relatively stringent e-value and filter the resulting hits later.

ADD REPLYlink
0
Entering edit mode

What I wanted is to speed up the blasting.

ADD REPLYlink
1
Entering edit mode

I doubt limiting the number of hits like that would speed up your blasting significantly. It still has to go through the whole db for every query, so the only difference would be in how long it takes to write 5 or 10 lines (or whatever) to the output file. Instead, if your db is small (or you have a ton of RAM), you should parallelize blast (e.g. with GNU Parallel) by running multiple single-threaded blasts on split input instead of using -num_threads X..

ADD REPLYlink
0
Entering edit mode

^True. You will benefit from multi-threading, and trying both tblastx and blastall -p tbalstx before choosing one of them. For shorter query sequences, I've seen the latter be significantly faster than the former.

ADD REPLYlink
0
Entering edit mode
12 weeks ago
RamRS 21k
Houston, TX

There seems to a problem with -outfmt 7. Can you check if this problem persists if you use the default output format?

ADD COMMENTlink
0
Entering edit mode

I am trying outfmt 6. Same thing, gives out many results, beyond 5.

ADD REPLYlink
0
Entering edit mode

Try not giving the outfmt param. If it works, then it could be a out format group specific problem.

ADD REPLYlink
0
Entering edit mode
12 months ago
dago ♦ 2.5k
Germany

If I am not wrong with -outfmt > 4, -max_target_seqs is ignored. At least this is true for psiblast

ADD COMMENTlink
0
Entering edit mode

max_target_seqs - Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.

ADD REPLYlink
0
Entering edit mode

No, it is recommended to be used with outfmt>4. See here: http://www.ncbi.nlm.nih.gov/books/NBK1763/#_CmdLineAppsManual_Appendix_C_Options_for_

ADD REPLYlink
0
Entering edit mode

You are right. I checked my code and it refers to -num_descriptions

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1