Extract 10 matches in BLAST
16 months ago
gkalgus • 0

I have a blast output in outfmt 6 format, like this:

JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  20.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287
JXPU01010214    XP_021704425.1  91.21   182 16  0   3548    152 333 2e-91     287

But I have a zillion lines, and each subject has more than 100 matches. How can I extract only the first 10 matches for each subject?

Thanks.

blast extract • 158 views

If you are running NCBI blastn from the command line, you can limit the number of results displayed per query using -num_descriptions or -num_alignments (one controls the one-line summaries and the other controls the ASCII alignment views in the default output format). The default setting is pretty high, around 250-500 results per query. BLAST shows the best-scoring hits first and reports as many as you request. If many top-scoring hits are tied, I think it's arbitrary which of them get shown.

If you are running it from the NCBI website, you'll have to poke around their BLAST page. I don't recall whether those settings are available in the web tool.


I am running a local BLAST, but the parameter num_descriptions works only with outfmt 4 or less, and the parameter num_alignments returns a variable number of matches. For example, I set num_alignments to 10 and several subjects returned 13 or more matches...

14 months ago
h.mon 25k
Brazil

Your example is really weird, as there are several identical lines - in fact, the only difference is second line third column, with a value of "20.21" instead of "91.21". I can't imagine blast producing such output, unless your query consists of the same sequence repeated several times.

> I set num_alignments to 10 and several subjects returned 13 or more matches...

Probably the same subject is returning more than one HSP. To solve this, use -max_hsps 1, which reports only one HSP for each query-subject pair.
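
If rerunning BLAST is not practical, a similar effect can be approximated on an existing tabular file. This is a minimal sketch, assuming that the first line listed for each query/subject pair is its best HSP (worth checking on your own output) and that columns 1 and 2 hold the query and subject IDs as in outfmt 6. The function name `best_hsp_per_pair` is just illustrative.

```shell
# Keep only the first line seen for each (query, subject) pair.
# Assumption: column 1 = query ID, column 2 = subject ID (outfmt 6),
# and the first line of each pair is the HSP you want to keep.
best_hsp_per_pair() {
    awk '!seen[$1, $2]++' "$1"
}

# Usage: best_hsp_per_pair blast_tabular.tsv > blast_one_hsp.tsv
```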

> But I have a zillion lines, and each subject has more than 100 matches. How can I extract only the first 10 matches for each subject?

In BLAST terminology, the database sequences are the "subject", and the sequences one submits are the "query". I suspect you want to keep the first 10 matches of each query, correct? You can easily achieve this with Perl or awk; here is an awk solution:

awk '{ if (++count[$1] <= 10) print $0 }' blast_tabular.tsv
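
Note that the one-liner counts lines, so a subject with several HSPs uses up several of the 10 slots. If what you want is the first 10 distinct subjects per query, with all of their HSPs kept, a variant along these lines might work. It is only a sketch: the function name `top_subjects_per_query` and its optional second argument are made up for illustration, and it assumes columns 1 and 2 are the query and subject IDs as in outfmt 6.

```shell
# Keep every line belonging to the first N distinct subjects of each query.
# Assumption: column 1 = query ID, column 2 = subject ID (outfmt 6).
# N is the optional second argument and defaults to 10.
top_subjects_per_query() {
    awk -v n="${2:-10}" '
        # First time this query/subject pair appears: give it the next rank.
        !(($1, $2) in rank) { rank[$1, $2] = ++nsubj[$1] }
        # Print the line if its subject is among the first n for this query.
        rank[$1, $2] <= n
    ' "$1"
}

# Usage: top_subjects_per_query blast_tabular.tsv 10 > blast_top10.tsv
```

Because the rank is stored per query/subject pair, a later HSP from an already accepted subject is still printed even if other subjects appeared in between.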