The "Read Bases" Column In Samtools Mpileup
12.6 years ago
dustar1986 ▴ 380

Hi,

I read the description of "mpileup" in the samtools manual. It says: "In the pileup format (without -u or -g), each line represents a genomic position, consisting of chromosome name, coordinate, reference base, read bases... Information on match, mismatch, indel, strand, mapping quality and start and end of a read are all encoded at the read base column. At this column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand."

So the 4th column should be the "read bases" column, if I understand correctly. When I checked this using:

samtools mpileup input.bam | cut -f 4

I saw only integers in this column, instead of the dots and commas that indicate the forward and reverse strand. Is the strand encoded as some kind of flag?

So how can I dig out strand information (forward strand or reverse one) from them?

Thank you very much.

mpileup strand • 3.9k views
12.6 years ago

From http://samtools.sourceforge.net/pileup.shtml:

  • The fourth column is the number of reads covering the site.
  • The fifth column is the read bases: a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, "ACGTN" for a mismatch on the forward strand and "acgtn" for a mismatch on the reverse strand (...)
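
To make the strand encoding concrete, here is a minimal Python sketch that counts forward- and reverse-strand bases in one read-bases string (the fifth column). The function name count_strands and the sample pileup line are made up for illustration. It assumes the usual pileup conventions: "^" is followed by one mapping-quality character, "$" marks a read end, and indels are written as +N or -N followed by N bases. Any other symbol, such as "*", is simply ignored.

```python
import re


def count_strands(read_bases):
    """Return (forward, reverse) base counts for one pileup read-bases string."""
    # Remove read-start markers first: '^' plus the mapping-quality character
    # that follows it (that character could itself be '+', '-', '$' or '^').
    s = re.sub(r"\^.", "", read_bases)
    # Remove read-end markers.
    s = s.replace("$", "")

    symbols = []
    i = 0
    while i < len(s):
        if s[i] in "+-":
            # Indel: sign, decimal length, then that many inserted/deleted bases.
            j = i + 1
            while j < len(s) and s[j].isdigit():
                j += 1
            i = j + int(s[i + 1:j])  # skip the indel sequence entirely
            continue
        symbols.append(s[i])
        i += 1

    # '.' and uppercase letters are forward strand; ',' and lowercase are reverse.
    forward = sum(1 for c in symbols if c == "." or c in "ACGTN")
    reverse = sum(1 for c in symbols if c == "," or c in "acgtn")
    return forward, reverse


# Hypothetical pileup line: chromosome, position, ref base, depth, read bases, qualities.
line = "chr1\t100\tA\t5\t.$,^F.+2AG,a\tIIIII"
fields = line.split("\t")
print(fields[3], count_strands(fields[4]))
```

In this made-up line the fourth field is the depth (5) and the fifth field carries the strand information, so the integers seen with cut -f 4 are read counts, not flags.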
Er... I see. Thanks!
