correct mark duplicated error using python and pysam to filter out secondary alignments
0
0
Entering edit mode
5.0 years ago
bioguy24 ▴ 230

I am new to python and trying to learn. The below is an attempt to filter out secondary reads in a all bam files in a directory using pysam. I ran mark duplicates and got an error on several bam files due to secondary alignments. I added comments to each line to illustrate my thought process. I think that when I algin using BWA-MEM with the -M option and there are readpairs with the same valid primary alignments and by removing the lower score alignments I will be ok. Is that correct or is there a better approach? Thank you :)

The specific mark duplicates error:

Value was put into PairInfoMap more than once

Attempt 1

#! /usr/bin/python  ## call python script
import sys ## import python system functions
import pysam  ## import module

bam = pysam.AlignmentFile(.bam, "rb")  ## open bam and read
click.echo("Reading BAM file")  ## output message
for read in bam.fetch():
    if read.is_secondary=true  ## not the primary alignment
        continue
    if read.has_tag('AS') and read.has_tag('XS') ## look for Primary and Secondary Alignment Score tag
        AS = read.get_tag('AS')  ## read and store AS value
        XS = read.get_tag('XS')  ## read and store XS value
             if AS >= XS  ## Alignment score greater then or equal to XS (these will be printed)
        bam.write(read)  ## open bam and only print primary alignments
             continue  ## process next line
bam.close()  ## end processing

Attempt 2

#! /usr/bin/python  ## call python script
import sys ## import python system functions
import pysam  ## import module

bam = pysam.AlignmentFile(bam, "rb")  ## open bam and read
click.echo("Reading BAM file")  ## output message
for read in bam.fetch():  ## start loop and iterate over each bam
    if read.is_secondary:  ## not the primary alignment
        continue
    if read.has_tag('AS') and read.has_tag('XS') ## look for Primary and Secondary Alignment Score tag
        AS = read.get_tag('AS')  ## read and store AS value
        XS = read.get_tag('XS')  ## read and store XS value
            if AS >= XS  ## Alignment score greater then or equal to XS (these will be printed)
    read.write(read)  ## only print primary alignments
    continue
bam.close()  ## end processing
python pysam • 1.9k views
ADD COMMENT
0
Entering edit mode

I have re-mapped and markduplicates seems ok but I am curious if the python script is close? Thank you :).

ADD REPLY

Login before adding your answer.

Traffic: 1472 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6