Help to filter blastp results
7.6 years ago
guillaume.rbt ★ 1.0k

Hi all,

I'm trying to BLAST two sets of proteins against each other to find similar sequences.

I'm using this command to do so: blastall -d set1.fasta -i set2.fa -p blastp -m 9 -e 0.01 -o results.blast

As the two sets are from the same species, I would like to filter the results to keep only matches with > 99% identity where the query and subject have the same length. After filtering on % identity, I sometimes get results like this one:

Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score

protein_1 protein_2 100.00 76 0 0 1 76 1 76 3e-46 154

protein_1 protein_2 100.00 76 0 0 77 152 1 76 3e-46 154

protein_1 protein_2 100.00 76 0 0 153 228 1 76 3e-46 154

protein_1 protein_2 100.00 76 0 0 229 304 1 76 3e-46 154

Here, 4 parts of protein 1 hit the same region of protein 2. As I only want hits between proteins of the same length, I would like to filter out this kind of result, but I don't know how. Does anyone know a parameter that could do that, or a way to filter the result file?

Thanks,

blast filter fasta • 2.8k views

That table doesn't contain the query and subject sequence lengths, so you can't filter on length from it alone. With BLAST+ you can include qlen and slen in your output rows. I don't know whether you can do that with legacy BLAST.
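For what it's worth, here is a minimal sketch of that filtering step. It assumes the search was rerun with BLAST+ using a tabular format such as -outfmt "6 std qlen slen", so that qlen and slen are columns 13 and 14 after the 12 standard ones. The function name filter_hits and the sample rows (including the lengths 304 and 76) are made up for illustration.

```python
import csv
import io


def filter_hits(handle, min_identity=99.0):
    """Yield tabular BLAST rows with identity > min_identity and qlen == slen.

    Assumes 14 tab-separated columns: the 12 standard ones, then qlen, slen.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#' comment lines written by -outfmt 7.
        if not row or row[0].startswith("#"):
            continue
        pident = float(row[2])
        qlen, slen = int(row[12]), int(row[13])
        if pident > min_identity and qlen == slen:
            yield row


# Illustrative input: the first row mimics the hit from the question, with
# assumed lengths; the second row is a hypothetical same-length pair.
sample = (
    "protein_1\tprotein_2\t100.00\t76\t0\t0\t77\t152\t1\t76\t3e-46\t154\t304\t76\n"
    "protein_3\tprotein_4\t100.00\t76\t0\t0\t1\t76\t1\t76\t3e-46\t154\t76\t76\n"
)

for hit in filter_hits(io.StringIO(sample)):
    print("\t".join(hit))
```

On a real file you would pass an open file handle instead of the io.StringIO sample. If you also want the alignment to span both proteins end to end, you could additionally check that the alignment length (column 4) equals qlen.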


Thanks, it works well with BLAST+.


How large are your two sets? It may be easier to run simple pairwise alignments on only those proteins that have the same length. In Biopython you could use the pairwise2 module for this task, e.g. alignment = pairwise2.align.globalxx(seq1, seq2, score_only=True). In this example, the alignment score should equal the length of the protein if the two proteins are 100% identical.
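To illustrate the idea without requiring Biopython, here is a rough standard-library sketch. It is a stand-in for pairwise2, not its implementation: globalxx scores 1 per match with no mismatch or gap penalties, which amounts to the length of the longest common subsequence, so that is what globalxx_score computes. The function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def globalxx_score(a, b):
    """Best global alignment score with match=1, mismatch=0, no gap penalty.

    This equals the length of the longest common subsequence of a and b.
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identical_same_length_pairs(set1, set2):
    """Return (name1, name2) pairs of equal-length, 100% identical proteins."""
    # Index the second set by length so only same-length pairs are aligned.
    by_length = defaultdict(list)
    for name, seq in set2.items():
        by_length[len(seq)].append((name, seq))

    pairs = []
    for qname, qseq in set1.items():
        for sname, sseq in by_length.get(len(qseq), []):
            # Score equal to the length means every position matches.
            if globalxx_score(qseq, sseq) == len(qseq):
                pairs.append((qname, sname))
    return pairs


# Toy sequences, for illustration only.
set1 = {"a": "MKTAYIAK", "b": "MKT"}
set2 = {"x": "MKTAYIAK", "y": "MKTAYIAR", "z": "MK"}
print(identical_same_length_pairs(set1, set2))
```

For a threshold below 100% you could compare the score against a fraction of the length instead of requiring equality. This pure-Python loop is quadratic per pair, so for large sets the BLAST+ route is probably more practical.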
