Biostar Beta. Not for public use.
How to check if a read in a SAM file contains substitutions
0
Entering edit mode
2.1 years ago

How I do check whether a read contains substitutions or not. I'm not sure if I should check the flag, the cigar or the sequence.

ADD COMMENTlink
1
Entering edit mode
11 months ago
Freiburg, Germany

Typically this is indicated in the NM auxiliary tag. Since you have C++ in the tags, I'll note that you can retrieve this in htslib via uint8_t *v = bam_aux_get(b, "NM") and then int64_t NM = bam_aux2i(v);.

ADD COMMENTlink
0
Entering edit mode

That's great, what exactly do I have to check in the NM tag?

ADD REPLYlink
0
Entering edit mode

I gave you two C commands that use htslib in my original reply.

ADD REPLYlink
0
Entering edit mode

NM is defined as the edit distance to the reference. Insertions and deletions are included . Most caller do not included soft or hard clipped bases in their calculation (the specs do not define whether they should or not).

ADD REPLYlink
0
Entering edit mode

True, one would also need to parse the CIGAR string and subtract appropriately.

ADD REPLYlink
1
Entering edit mode
11 months ago
d-cameron ♦ 2.0k
Australia

If you are looking exclusively for base substitution for alignments that use M CIGAR operators (as opposed to X and =), then you either have to a) check each base against the reference and calculate mismatches yourself, or b) use the NM tag and adjust for any indels in the alignment.

ADD COMMENTlink
3
Entering edit mode

To add to this answer, A while back a wrote a program, ExpandCigar, that expands the M operator with 'X' or '='. After that, it's easy to extract reads where the CIGAR string contains X.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1