BLAST alignment: protein matching 100% identity to larger protein, error?
9.2 years ago
ruyinii • 0

I have an experimentally derived protein sequence that matches the MAP7 protein in Homo sapiens with 100% identity. But when I ran a multiple alignment in BLAST, I realized that my sequence did match at 100%, but only to a small fraction of the entire MAP7 protein. Does this mean my sequence isn't actually a protein?

protein-analysis blast sequence-alignment • 7.4k views

In my opinion, it seems that you have only a domain of the MAP7 protein. Alternatively, the gene may encode another protein besides MAP7 that contains the domain matched in your BLAST search.


That would imply that my sequence is translated as a fragment of MAP7. Is that possible? The sequence itself seems to have the same domain as MAP7, at least according to InterPro searches.

9.2 years ago
pld 5.1k
From the name:
BLAST: Basic Local Alignment Search Tool

BLAST generates local alignments, not global alignments. When BLAST reports percent identity, it is reporting the percent identity for the aligned region only, not for the whole sequence.
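To make the difference between identity and coverage concrete, here is a minimal Python sketch. The function name `alignment_coverage` is made up for illustration. The numbers are the ones the original poster gives later in this thread (a 94 aa query aligned to residues 453 to 546 of the 749 aa MAP7 sequence, no gaps); the query coordinates 1 to 94 are an assumption inferred from those figures, not something reported by BLAST.

```python
def alignment_coverage(query_len, subject_len, q_start, q_end, s_start, s_end):
    """Return (query coverage, subject coverage) in percent for one local alignment.

    Coordinates are 1-based and inclusive, as in BLAST reports.
    """
    q_aligned = q_end - q_start + 1
    s_aligned = s_end - s_start + 1
    return 100.0 * q_aligned / query_len, 100.0 * s_aligned / subject_len


# Figures from this thread; query coordinates 1..94 are assumed (see above).
q_cov, s_cov = alignment_coverage(
    query_len=94, subject_len=749, q_start=1, q_end=94, s_start=453, s_end=546
)
print(f"query coverage:   {q_cov:.1f}%")
print(f"subject coverage: {s_cov:.1f}%")
```

So a hit can be 100% identical over the aligned region and cover the whole query while still spanning only about an eighth of the subject protein.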

This is where bit scores and expect values are important. You should review the tutorial that is available for BLAST here.

In general, you want to judge the significance of your BLAST hits by expect value (E-value). The E-value tells you how many hits with a score at least this good you would expect to find by chance in a database of the size you searched. The lower this value the better; ideally, you'll want to pick the hit with the lowest E-value. The bit score of a BLAST result is affected by the length of the alignment, so E-values can be better for judging hits.
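As a rough sketch of how E-value, bit score, and search size relate, the standard Karlin-Altschul approximation is E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the database length. The function name `expect_value` and the example numbers below are hypothetical. BLAST itself uses length-corrected (effective) values and other adjustments, so this will not reproduce the E-values it reports exactly.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths rather than BLAST's effective lengths, so treat the
    result as an order-of-magnitude illustration only.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical example: the same bit score in a small and a large database.
small_db = expect_value(bit_score=30, query_len=100, db_len=1_000_000)
large_db = expect_value(bit_score=30, query_len=100, db_len=2_000_000)
print(f"E in 1M-residue database: {small_db:.4f}")
print(f"E in 2M-residue database: {large_db:.4f}")
```

The same bit score becomes less significant as the database grows, which is why the E-value depends on what you searched against.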

Now, if your question is "is my sequence MAP7?", it gets a bit more complicated. Unidirectional BLAST is only capable of identifying candidate homologs, that is, sequences that are similar. If you want to establish that your sequence is in fact MAP7 (i.e. they have the same function), you need to look into ortholog detection.
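One common heuristic for ortholog detection is the reciprocal best hit (RBH): a pair counts only if each sequence is the other's top BLAST hit when you search in both directions. This is not necessarily the method meant above, just a minimal Python sketch of the idea. The function name and all sequence IDs are hypothetical, and the dictionaries stand in for best hits you would parse from two real BLAST runs.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a.

    a_to_b maps each sequence in proteome A to its best hit in proteome B;
    b_to_a maps each sequence in proteome B to its best hit in proteome A.
    """
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Hypothetical best-hit tables from two BLAST searches (A vs B, B vs A).
a_to_b = {"queryA1": "subjB1", "queryA2": "subjB1", "queryA3": "subjB3"}
b_to_a = {"subjB1": "queryA1", "subjB3": "queryA3"}

pairs = reciprocal_best_hits(a_to_b, b_to_a)
print(pairs)
```

Here `queryA2` hits `subjB1` in one direction only, so it is reported as similar but not as a reciprocal best hit.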

It would help if you posted the stats of the alignment (hit length, start/end, bit score, E-value, etc.). How did you establish your protein sequence? Which database did you BLAST against?


The local alignment factor completely slipped my mind. I'm still not used to bioinformatics.

The protein sequence was derived from MS/MS data. I used the multiple sequence alignment option in NCBI BLASTp to align my sequence against Q14244 (MAP7), so I didn't see any option for database selection.

Additional info:

Bit score: 169 bits(428)
E-value: 2e-54
Query length: 94 aa
MAP7 length: 749 (matched region 453 - 546)
Gaps: 0

By ortholog detection do you mean Pfam searches and the like?


That's a pretty small hit. Are you sure it comes from human, or did you just pick that species? What happens if you BLAST against the nr database?
