pysam pileup: what reads appear in the pileup?
1
0
Entering edit mode
5.8 years ago
aostern ▴ 30

Cross-posted on Bioinformatics.SE. I hope that's OK. Answer was provided there. Short version: pysam pileup() just wraps samtools mpileup, and mpileup by default filters out reads not just with a base quality—but also with a "base alignment quality"—lower than 13.


I thought that iff a read matches the reference at a position, it would appear in the pileup column for that position. But tracking a single read in pysam shows that that's not the case.

This read appears in some pileup columns for positions to which it does not map, and does not appear in some pileup columns to which it does map.

What's wrong in my understanding?

#open BAM
bamPath = "/path/to/example.bam"
bamData = pysam.AlignmentFile(bamPath, "rb")

#position we want to find reads for
chrom = "5"
pos = 345515
windowStart = 340001
windowEnd = 350001

#read to look out for
idForReadOfInterest = "E00489:44:HNNVYCCXX:1:2120:26524:19135"

def idForRead(read):
        return read.tostring().split("\t")[0]

#extract the reads with pileup
columnRPs = []
readOfInterest = None
for column in bamData.pileup(chrom, windowStart, windowEnd):
        for pRead in column.pileups:
                read = pRead.alignment

                if idForRead(read) == idForReadOfInterest:
                        readOfInterest = read
                        columnRPs.append(column.reference_pos)

print("read appears in pileup columns for these {} positions".format(len(columnRPs)))
print(columnRPs)

alignedPairs = readOfInterest.get_aligned_pairs()
alignedPositions = [x[1] for x in alignedPairs]

#filter to take only positions where read actually matches ref
matchPositions = []
for block in readOfInterest.get_blocks():
        for p in alignedPositions:
                if p >= block[0] and p < block[1]: #inclusive, exclusive?
                        matchPositions.append(p)
print("read matches at these {} positions".format(len(matchPositions)))
print(matchPositions)

read appears in pileup columns for these 167 positions [345247, 345248, 345249, 345250, 345251, 345252, 345253, 345254, 345255, 345256, 345257, 345258, 345259, 345260, 345261, 345262, 345263, 345264, 345265, 345266, 345267, 345268, 345269, 345270, 345271, 345272, 345273, 345274, 345275, 345276, 345277, 345278, 345279, 345280, 345281, 345282, 345283, 345284, 345285, 345286, 345287, 345288, 345289, 345290, 345292, 345294, 345445, 345446, 345447, 345448, 345449, 345451, 345452, 345453, 345454, 345455, 345456, 345457, 345458, 345459, 345460, 345462, 345463, 345464, 345465, 345466, 345469, 345470, 345473, 345474, 345475, 345476, 345478, 345481, 345482, 345485, 345487, 345490, 345492, 345495, 345497, 345498, 345499, 345500, 345501, 345502, 345503, 345504, 345505, 345506, 345507, 345508, 345509, 345510, 345511, 345513, 345518, 345523, 345524, 345525, 345527, 345528, 345529, 345530, 345532, 345533, 345534, 345535, 345536, 345537, 345538, 345539, 345540, 345542, 345544, 345545, 345546, 345547, 345548, 345549, 345550, 345551, 345552, 345553, 345554, 345555, 345556, 345557, 345558, 345559, 345560, 345561, 345562, 345563, 345564, 345565, 345566, 345567, 345568, 345569, 345570, 345571, 345572, 345573, 345574, 345575, 345576, 345577, 345578, 345579, 345580, 345581, 345582, 345583, 345584, 345585, 345586, 345587, 345588, 345589, 345590, 345591, 345592, 345593, 345594, 345595, 345596]


read matches at these 150 positions [345445, 345446, 345447, 345448, 345449, 345450, 345453, 345454, 345455, 345456, 345457, 345458, 345459, 345460, 345461, 345462, 345463, 345464, 345465, 345466, 345467, 345468, 345469, 345470, 345471, 345472, 345473, 345474, 345475, 345476, 345477, 345478, 345479, 345480, 345481, 345482, 345483, 345484, 345485, 345486, 345487, 345488, 345489, 345490, 345491, 345492, 345493, 345494, 345495, 345496, 345497, 345498, 345499, 345500, 345501, 345502, 345503, 345504, 345505, 345506, 345507, 345508, 345509, 345510, 345511, 345512, 345513, 345514, 345515, 345516, 345517, 345518, 345519, 345520, 345521, 345522, 345523, 345524, 345525, 345526, 345527, 345528, 345529, 345530, 345531, 345532, 345533, 345534, 345535, 345536, 345537, 345538, 345539, 345540, 345541, 345542, 345543, 345544, 345545, 345546, 345547, 345548, 345549, 345550, 345551, 345552, 345553, 345554, 345555, 345556, 345557, 345558, 345559, 345560, 345561, 345562, 345563, 345564, 345565, 345566, 345567, 345568, 345569, 345570, 345571, 345572, 345573, 345574, 345575, 345576, 345577, 345578, 345579, 345580, 345581, 345582, 345583, 345584, 345585, 345586, 345587, 345588, 345589, 345590, 345591, 345592, 345593, 345594, 345595, 345596]

pysam sequencing software genome alignment • 3.3k views
ADD COMMENT
1
Entering edit mode
5.7 years ago
aostern ▴ 30

Cross-posted on Bioinformatics.SE. I hope that's OK. Answer was provided there. Short version: pysam pileup() just wraps samtools mpileup, and mpileup by default filters out reads not just with a base quality—but also with a "base alignment quality"—lower than 13.

ADD COMMENT

Login before adding your answer.

Traffic: 3095 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6