Samtools mpileup: read base column symbols
0
0
Entering edit mode
8.7 years ago
Marvin ▴ 220

I have checked the documenation here: http://samtools.sourceforge.net/pileup.shtml

Here is an output of column 5 (read base column):

samtools view -h bamfile.bam | samtools mpileup -B -f reference.fasta - | awk '{split($5,chars,""); for(i=1;i<=length($5);i++) print chars[i]}' | sort | uniq -c

12864 $
 527 *
 590 +
 236 -
59664 .
 634 1
 130 2
  53 3
   6 4
   3 5
   3 =
6205 A
4793 C
12863 F
4787 G
   7 M
  15 N
6692 T
12863 ^

The F occurs because the mapping quality is always F (thus I always see ^F where a read starts). I find it a little strange that the mapping quality is F for every read but that's actually not my question

[maybe it's because I chose -B ... need to continue reading what that BAQ computation is about ... I noticed that I lose information in the mpileup output if I don't use -B, that's why I use it]

I wonder about these symbols as they are not mentioned in the docs (at least with respect to column 5): * =

The = symbol occurs in only 1 line whereas the * symbols occur in multiple lines. The * symbols also occur in that line where = occurs, so here's an example showing both:

gi_identifier    378981    T    9    .$*+1=$*+1A.*+1=$*+1=$*+1M$*+1M$G+1T    ]]]JJJ]]J

What do these symbols mean?

samtools mpileup • 3.3k views
ADD COMMENT
1
Entering edit mode

Make sure to refer to the correct documentation: http://www.htslib.org/doc/samtools.html

ADD REPLY
0
Entering edit mode

Oh and I am also wondering about that "+1M" ... ?

ADD REPLY
0
Entering edit mode

It does look weird to me as well. What aligner produced this bam file? Maybe post the sam header.

ADD REPLY
0
Entering edit mode

The bam file in the command is the result of a Geneious function that lets you extract the SNP columns of an alignment/bam:

https://support.geneious.com/entries/22583802-Exporting-snp-data-for-phylogenetic-analysis

When I use the same command on the original bam file, I only get expected/documented symbols.

I guess I'll just exclude Geneious from the pipeline and identify the SNP columns myself.

ADD REPLY

Login before adding your answer.

Traffic: 2668 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6