Question

Retrieving identical positions from mpileup file

6.9 years ago

samocarp

Hello all,

I mapped reads from one species against a reference genome and converted file_sorted.bam to file.mpileup. Now I'd like to create a file containing the positions that are shared (identical) between the reference genome and the mapped species. These shared positions must also have good mapping quality and a read depth > 10. Does anyone know the best way to do this?

Thank you!

snp alignment


score 0 · Answer 1 · 2017-07-11

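I would suggest either (1) using VarScan to identify differences and similarities between the reference and your sample, or (2) writing a Python script if you know Python. If you use VarScan, you could do something like the following. Note that mpileup2snp reports only variant sites; VarScan's mpileup2cns command also reports positions that match the reference.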

java -Xmx8g -jar VarScan.v2.3.9.jar mpileup2snp file.mpileup --min-coverage 10

If you write a Python script, take a look at the pileup format spec here: http://samtools.sourceforge.net/pileup.shtml. Identifying SNPs (differences between sample and reference) and read depth in a pileup file is straightforward. The depth is in the fourth column and the read bases are in the fifth. Since you want sites that are identical between the reference and your sample, look for positions where the fifth column contains only periods or commas (a match to the reference on the forward or reverse strand, respectively). When you do this, ignore the ^ and $ symbols that mark read starts and ends. The ^ symbol is followed by one character encoding that read's mapping quality.
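Below is a minimal Python sketch of that approach. It assumes a single-sample samtools mpileup file with tab-separated columns: chrom, pos, ref, depth, read bases, base qualities. It keeps sites where depth > 10 and every read base is a `.` or `,`. As a conservative assumption, any mismatch, deletion (`*`), indel (`+n`/`-n`) or other symbol disqualifies the site. Mapping-quality filtering is assumed to happen upstream when the pileup is generated, e.g. with samtools mpileup's `-q` option. The script name `identical_sites.py` is hypothetical.

```python
import sys


def read_bases_match_reference(bases: str) -> bool:
    """Return True if every read base in an mpileup read-bases column matches the reference.

    Read-start markers ('^' plus the following mapping-quality character) and
    read-end markers ('$') are skipped. Any other symbol, including mismatching
    bases, deletions ('*'), indels ('+'/'-') and reference skips, counts as a
    mismatch (a conservative assumption for "identical" sites).
    """
    i = 0
    saw_base = False
    while i < len(bases):
        c = bases[i]
        if c == "^":      # read start: skip '^' and its mapping-quality character
            i += 2
        elif c == "$":    # read end: carries no base
            i += 1
        elif c in ".,":   # match on forward (.) or reverse (,) strand
            saw_base = True
            i += 1
        else:             # mismatch, deletion, indel or anything else
            return False
    return saw_base


def identical_positions(lines, min_depth=10):
    """Yield (chrom, pos, depth) for sites with depth > min_depth where all reads match the reference."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        chrom, pos, _ref, depth, bases = fields[:5]
        depth = int(depth)
        if depth > min_depth and read_bases_match_reference(bases):
            yield chrom, int(pos), depth


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python identical_sites.py file.mpileup > identical_sites.tsv
    with open(sys.argv[1]) as handle:
        for chrom, pos, depth in identical_positions(handle):
            print(f"{chrom}\t{pos}\t{depth}")
```

Treating indels as disqualifying is a design choice; relax that branch if you only care about the aligned base itself.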