Question

The "Read Bases" Column In Samtools Mpileup


12.6 years ago

dustar1986 ▴ 380

Hi,

I read the description of "mpileup" in the samtools manual. It says: "In the pileup format (without -u or -g), each line represents a genomic position, consisting of chromosome name, coordinate, reference base, read bases... Information on match, mismatch, indel, strand, mapping quality and start and end of a read are all encoded at the read base column. At this column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand."
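One way to check which field actually holds what is to label the tab-separated fields of each line. This is a minimal sketch, assuming the default text output (no -u/-g) arrives on stdin; `label_pileup_fields` is a hypothetical helper name, not part of samtools.

```shell
# Hypothetical helper: label the first five tab-separated fields of
# text-mode pileup lines read from stdin, one labelled line per input line.
label_pileup_fields() {
  awk -F '\t' '{
    # $1..$5 are the first five tab-separated fields of the pileup line
    printf "chrom=%s pos=%s ref=%s depth=%s bases=%s\n", $1, $2, $3, $4, $5
  }'
}

# Usage: samtools mpileup input.bam | label_pileup_fields | head
```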

If I understand it correctly, the 4th column should then be the "read bases" column. But when I check it using:

samtools mpileup input.bam | cut -f 4

I see only integers in this column, instead of the dots and commas that stand for the forward and reverse strand respectively. Are they encoded as flags?

So how can I extract the strand information (forward or reverse strand) from them?

Thank you very much.

mpileup strand • 3.9k views

updated 12.6 years ago by Pierre Lindenbaum 161k • written 12.6 years ago by dustar1986 ▴ 380

score 5 · Answer 1 · 2011-10-04


12.6 years ago

Pierre Lindenbaum 161k

From http://samtools.sourceforge.net/pileup.shtml:

The fourth column is the number of reads covering the site.
The 5th column: a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, 'ACGTN' for a mismatch on the forward strand and 'acgtn' for a mismatch on the reverse strand (...)
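To get per-position strand counts out of the 5th column, a minimal sketch (not a samtools feature) is to walk the read-bases string with awk and apply the rules above. It assumes default text output (no -u/-g) on stdin, and also skips the extra markers the pileup format documents: `^` plus its mapping-quality character (read start), `$` (read end), and `+N`/`-N` indel sequences. Any other symbol, such as `*`, is ignored. The name `count_strands` is hypothetical.

```shell
# Hypothetical helper: print chrom, pos, forward-strand count and
# reverse-strand count for each text-mode pileup line read from stdin.
count_strands() {
  awk -F '\t' '{
    s = $5; n = length(s); fwd = 0; rev = 0; i = 1
    while (i <= n) {
      c = substr(s, i, 1)
      # read start: skip the caret and the mapping-quality char after it
      if (c == "^") { i += 2; continue }
      # indel such as +2AG or -1t: read the length digits, then skip the sequence
      if (c == "+" || c == "-") {
        j = i + 1; len = ""
        while (j <= n && substr(s, j, 1) ~ /[0-9]/) { len = len substr(s, j, 1); j++ }
        i = j + len; continue
      }
      # dot or upper case base = forward strand; comma or lower case = reverse
      if (c == "." || c ~ /[ACGTN]/) fwd++
      else if (c == "," || c ~ /[acgtn]/) rev++
      # anything else (end-of-read marker, deletion placeholder, ...) is skipped
      i++
    }
    print $1 "\t" $2 "\t" fwd "\t" rev
  }'
}

# Usage: samtools mpileup input.bam | count_strands
```

Indels are skipped rather than counted so that the letters of an inserted or deleted sequence are not mistaken for mismatches on either strand.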