SAM reference sequence length does not match the actual FASTA input sequence
8.8 years ago
ashishtx • 0

Hello everyone,

I am trying to understand the SAM format specification along with the BWA program. The reference sequence length reported in the SAM header does not seem to match the sequence I aligned against.

So my question is: why does the length of SEQ1, which I count as 407 bp, not match the SAM header, which reports a reference length of 402 bp?

Am I missing something very basic?

Thank you

>SEQ1
ATGCAGCTGTTCATCCACTGTCAAGGGGTTCATACCGTTGAAGTTACAGGTGAAGAGGAAGTTGCTTTCC
TCAAGCAATACCTCGAGCAGGCCGAGGGCATTGCACCTGCTGATCAAGTCCTCTACCATTCTGGCAAGCC
CCTGAGCGACGAGCTTTCTCTCTCCTGCCTGGAGAATGGTGCTTATGTTGAAGCTGTCGTCCCTCTTCTT
GGAGGTAAGGTCCATGGCTCCCTGGCTCGTGCCGGCAAGGTCAAGGGCCAGACACCGAAGGTAGAGAAAC
AGGAGAAGCGCAAGAAGAAGACCGGCCGTGCCCAGAGGCGCATGCAGTACAACAGGCGGGTCGTGAATGC
CGTTGCCACCTTCGGGCGCANGAGAGGACCCAATGCAAACCAAACTGCATAG

SAM file header and alignment record:

@SQ    SN:SEQ1   LN:402
NODE32439length524cov2064.38ID64877    0    SEQ1    1    60    61S161M3I241M58S    *    0    0    CAGCATTTTTTTTGTTATTTGGTTCGTGGGTTGCTGGACGTGTGTACACGTTTGCAAGAAGATGCAGCTGTTCATTCACTGTCAAGAAGTTCACACCGTAGAAGTTACAGGCGACGAGAATGTCGCCTTCCTCAAGGAAGTTCTTGAGCAGGCCGAAGGCATTGCACCTGTTGATCAGGTCCTCTACAACTCTGGCAAGCCCCTGAGTGATGATGTTTCTCTGTCCTCCTGCCTTGAGGATGGTGCTCATGTCGAGGCCGTTGTTCCTCTGCTCGGAGGTAAGGTCCACGGCTCACTGGCTCGTGCTGGCAAAGTGAAGGGCCAGACACCGAAGGTGGAGAAACAGGAGAAACGCAAGAAGAAGACTGGCCGTGCCAAGAGGCGCATGCAGTACAACAGGCGGTTTGTGAATGCTGTTGCCACCTTTGGCCGCAGGAGGGGACCCAATGCAAACCAAACTTCATAGAGAGATGGGCCTGTGACAAATAAAATTTGTATGGTGCGTTCCTGGACGTGGTGCTCAC    *    NM:i:55    MD:Z:14C10G0G5T5T11T2A3G1A2T2T9C2T0A0C2C11G13C6A9C1T17C2C2G0C16G3A8T4T2A2T2C2C5T2T14T5C11C5G2C20A14G14C9C26G1C8C11C2G4C3A21G5    AS:i:133    XS:i:0
alignment SAM • 2.5k views

Your sequence, as shown in this post, really is 402 bp.
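
The alignment record in the question offers an independent check on the 402 bp figure. In a CIGAR string, the operations M, D, N, = and X consume reference bases, while M, I, S, = and X consume read bases. The following is only a minimal sketch of that bookkeeping, not how BWA computes anything; `cigar_lengths` is a hypothetical helper name introduced for illustration.

```python
import re

# CIGAR operations that advance along the reference / along the read,
# as defined in the SAM format specification.
REF_OPS = set("MDN=X")
QUERY_OPS = set("MIS=X")


def cigar_lengths(cigar):
    """Return (reference_bases, read_bases) consumed by a CIGAR string."""
    ref_len = 0
    query_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in REF_OPS:
            ref_len += n
        if op in QUERY_OPS:
            query_len += n
    return ref_len, query_len


# CIGAR copied from the alignment record in the question.
ref_len, query_len = cigar_lengths("61S161M3I241M58S")
print(ref_len, query_len)  # 402 524
```

With POS equal to 1, the aligned part of the read covers reference positions 1 to 402, which agrees with LN:402 in the header, and the read length of 524 agrees with the `length524` in the read name.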


Whoops, you are both right. I was using Sublime Text to count characters and it kept showing 407. Thanks, Ashutosh Pandey and Heng Li (I really admire your software).


It's an honor to be mentioned in the same line as Dr. Li :-)


The length of the reference sequence does not include its header line. I guess you are counting >SEQ1 as part of the length, which is wrong.


I am pretty sure I did not include the header. Thanks for the response.

atgcagctgt tcatccactg tcaaggggtt cataccgttg aagttaCAGG  50
TGAaGAGGAA GTTGCTTTCC TCAAgcaata cctcgagcag gccgagggca  100
ttgcacctgc tgatcaagtc ctctaccatt ctggcaagcc cctgagcgac  150
gagctttctc tctcctgcct ggagaatggt gcttatgttg aagctgtcgt  200
ccctcttctt ggaggtaagg tccatggctc cctggctcgt gccggcaagg  250
tcaagggcca gacaccgaag gtagagaaac aggagaagcg caagaagaag  300
accggccgtg cccagaggcg catgcagtac aacaggcggg tcgtgaatgc  350
cgttgccacc ttcgggcgca ngagaggacc caatgcaaac caaactgcat  400
ag

If you are counting with a script, you may have forgotten to strip the "\n" newline character from each line before counting.
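
To make the newline point concrete, here is a minimal sketch that counts the bases of the SEQ1 record from the question, skipping the header line and stripping line breaks. `fasta_lengths` is a hypothetical helper written for this illustration, not part of BWA or samtools. The raw count at the end is one possible way to arrive at 407 (six sequence lines plus the five line breaks between them); the thread does not confirm that this is exactly what the editor counted.

```python
FASTA = """>SEQ1
ATGCAGCTGTTCATCCACTGTCAAGGGGTTCATACCGTTGAAGTTACAGGTGAAGAGGAAGTTGCTTTCC
TCAAGCAATACCTCGAGCAGGCCGAGGGCATTGCACCTGCTGATCAAGTCCTCTACCATTCTGGCAAGCC
CCTGAGCGACGAGCTTTCTCTCTCCTGCCTGGAGAATGGTGCTTATGTTGAAGCTGTCGTCCCTCTTCTT
GGAGGTAAGGTCCATGGCTCCCTGGCTCGTGCCGGCAAGGTCAAGGGCCAGACACCGAAGGTAGAGAAAC
AGGAGAAGCGCAAGAAGAAGACCGGCCGTGCCCAGAGGCGCATGCAGTACAACAGGCGGGTCGTGAATGC
CGTTGCCACCTTCGGGCGCANGAGAGGACCCAATGCAAACCAAACTGCATAG
"""


def fasta_lengths(text):
    """Return {sequence_name: length}, ignoring headers and line breaks."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()  # drops "\n", "\r" and stray spaces
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the name is the first word after ">".
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


print(fasta_lengths(FASTA))  # {'SEQ1': 402}

# Naive count: every character of the sequence block, line breaks included.
sequence_block = FASTA.split("\n", 1)[1].rstrip("\n")
raw_count = len(sequence_block)
print(raw_count)  # 407
```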
