BLAST result showing greater sequence length in %identities than that of the query sequence.
9.2 years ago

Please help me understand the following BLAST result:

Query= sp|P14738|FNBA_STAA8 Fibronectin-binding protein A OS=Staphylococcus
aureus (strain NCTC 8325) GN=fnbA PE=1 SV=1

Length=1018

> gi|49484704|ref|YP_041928.1| fibronectin-binding protein precursor
[Staphylococcus aureus subsp. aureus MRSA252]
Length=965

Score = 1404 bits (3633),  Expect = 0.0, Method: Compositional matrix adjust.
Identities = 748/1022 (73%), Positives = 826/1022 (81%), Gaps = 61/1022 (6%)

Please note the length of the query sequence, i.e. 1018, and the denominator used for the % identities, i.e. 1022. What I don't understand is where this 1022 came from. I know it may be a naive question, but please consider that I am new to this field.

blast • 3.9k views

Maybe the length is the target's length? E.g. your 1018-residue query is aligned to a sequence of length 1022, and 748 of the positions are identical?


But note that the stated length of the target is 965.


Can you check if there is any indel? e.g.

Query    ACT--G
Target   ACTTTG
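
To make the toy example concrete, here is a small Python sketch that counts alignment columns and identical positions for two already-aligned rows. The helper name `alignment_stats` is made up for illustration and is not part of BLAST; it only assumes that `-` marks a gap.

```python
def alignment_stats(query_row: str, target_row: str) -> dict:
    """Summarise two already-aligned rows of equal length ('-' marks a gap)."""
    if len(query_row) != len(target_row):
        raise ValueError("aligned rows must have the same number of columns")
    columns = len(query_row)
    # A column counts as an identity only if both rows hold the same residue.
    identities = sum(
        1 for q, t in zip(query_row, target_row) if q == t and q != "-"
    )
    return {
        "columns": columns,
        "query_residues": columns - query_row.count("-"),
        "target_residues": columns - target_row.count("-"),
        "identities": identities,
    }


stats = alignment_stats("ACT--G", "ACTTTG")
print(stats)
```

The query here has only 4 residues, but the alignment spans 6 columns, so the identities would be reported as 4/6 rather than 4/4.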

Yes, this seems to be the case. I aligned the query and the subject sequences, and there are 4 gaps in the query sequence in the alignment.


Okay, so the 4 gaps have been added to the query sequence and therefore the length is now 1022. But how do I know if there are actually 4 gaps? Does QL--YK mean 4 gaps?


No, there are two gaps in the QL--YK region, and the two other gaps are elsewhere in the query sequence. I copied and pasted the portions of the alignment that contain the 4 gaps in the query sequence.

Query  361   RFSHVAFIKPNNGKTT-SVTVTGTLMKGSNQNGNQPKVRIFEYLGNNEDIAKSVYANTTD  419
             +F+HVA+IKP NG  + SVTVTG L +GSN+NG QP V+I+EY+G    + +SVYANT D
Sbjct  361   KFTHVAYIKPINGNNSDSVTVTGMLTQGSNENGTQPNVKIYEYVGVENGLPQSVYANTVD  420

Query  480   QL--YKYYYDRGYTLTWDNGLVLYSNKANGNEKNGPIIQNNKFEYKEDTIKETLTGQYDK  537
                   YYY+ GYTLTWDNGLVLYSNKANG+ K GPI+ +N FE+ ED+   +++GQYD
Sbjct  481   NRYKTYYYYNNGYTLTWDNGLVLYSNKANGDGKYGPIVDSNNFEFSEDSGNGSISGQYDA  540

Query  778   VPQIHGQNKGNQSFEEDTEKDKPKYEHGG-NIIDIDFDSVPHIHGFNKHTEIIEEDTNKD  836
             VPQIHG NK N+  EEDT KDKP Y+ GG N +D + D++P + G N+  + IEEDT
Sbjct  781   VPQIHGFNKHNEIIEEDTNKDKPNYQFGGHNSVDFEEDTLPKVSGQNEGQQTIEEDTTPP  840
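
If it helps, the gap count can be checked with a few lines of Python. This is only a sketch: the three `Query` rows and their start/end coordinates are copied by hand from the excerpts above, and the variable names are my own. Each `-` in a `Query` row is one gap column, and the coordinates give a sanity check, since end minus start plus one should equal the number of residues (non-gap characters) in the row.

```python
# (start, aligned row, end) for the three Query rows shown above.
query_rows = [
    (361, "RFSHVAFIKPNNGKTT-SVTVTGTLMKGSNQNGNQPKVRIFEYLGNNEDIAKSVYANTTD", 419),
    (480, "QL--YKYYYDRGYTLTWDNGLVLYSNKANGNEKNGPIIQNNKFEYKEDTIKETLTGQYDK", 537),
    (778, "VPQIHGQNKGNQSFEEDTEKDKPKYEHGG-NIIDIDFDSVPHIHGFNKHTEIIEEDTNKD", 836),
]

gaps_per_row = []
for start, row, end in query_rows:
    gaps = row.count("-")
    residues = len(row) - gaps
    # The coordinates count residues only, so they must agree with the row.
    assert residues == end - start + 1
    gaps_per_row.append(gaps)

total_query_gaps = sum(gaps_per_row)
print(gaps_per_row, total_query_gaps)  # [1, 2, 1] 4
```

So QL--YK contributes two of the four gap columns, and the other two rows contribute one each.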

Okay, I get it now. Thank you so much Siva and Sam, that was very kind of you :)


Yeah, it's there,

Query  480   QL--YKYYYDRGYTLTWDNGLVLYSNKANGNEKNGPIIQNNKFEYKEDTIKETLTGQYDK  537
                   YYY+ GYTLTWDNGLVLYSNKANG+ K GPI+ +N FE+ ED+   +++GQYD
Sbjct  481   NRYKTYYYYNNGYTLTWDNGLVLYSNKANGDGKYGPIVDSNNFEFSEDSGNGSISGQYDA  540

What does it mean?


It means either there are insertions in your subject sequence or deletions in your query sequence in those 4 positions. Or it could also be due to sequencing errors.

9.2 years ago
Michael 54k

The percent identity in this output is given with respect to the length of the aligned region, not the query or subject length. The alignment length can differ from both the query and the subject length in partial alignments, gapped alignments, or both. In your case, there seem to be 4 gaps in the longer sequence (the query).
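
As a rough check of the numbers in the question, the arithmetic can be written out in Python. This is a sketch under two assumptions that the BLAST output shown does not state directly: that the alignment covers both sequences end to end, and that the remaining 61 - 4 = 57 gap columns all fall in the subject. Both are at least consistent with the reported figures.

```python
# Values copied from the BLAST output in the question.
query_len = 1018
subject_len = 965
identities, positives, gaps, aln_len = 748, 826, 61, 1022

query_gaps = 4                     # counted in the thread
subject_gaps = gaps - query_gaps   # assumption: all remaining gaps are in the subject

# Assumption: end-to-end alignment, so residues + gaps = alignment columns.
assert query_len + query_gaps == aln_len      # 1018 + 4  = 1022
assert subject_len + subject_gaps == aln_len  # 965  + 57 = 1022

# The percentages use the alignment length as the denominator.
pct_identity = round(100 * identities / aln_len)
pct_positives = round(100 * positives / aln_len)
pct_gaps = round(100 * gaps / aln_len)
print(pct_identity, pct_positives, pct_gaps)  # 73 81 6
```

Dividing by the query length instead would give 748/1018, about 73.5%, which is why identity figures computed against different denominators should not be compared directly.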
