How to compute target aligned reads/total aligned reads (read enrichment) with Python?
0
0
Entering edit mode
5.9 years ago
vw ▴ 40

edit: Thanks for Devon Ryan's help. I solved this problem.

I am a green hand of bioinfomatics. I am trying to apply pybedtools and pysam to compute the read enrichment. I have a BAM file and a BED file. The BED records target regions. My output is over 80%. I feel it is too high. Since I also computed the percentage of bases enriched in the target regions which is just about 57%(Of course, it could be wrong also). Could you please help me to double-check my code and result? Thanks! PS: Sorry for my poor English.

Attached please find my data: https://www.google.com/url?q=https://www.dropbox.com/s/c2768qj1n2flcfg/TEST1.zip?dl%3D0&sa=D&ust=1527459500133000&usg=AFQjCNHqoYhvNp3OObNdyivI9I0sr8IT3w

My code for read enrichment:

import pybedtools as bd
target = bd.BedTool(path_bed)
bam = bd.BedTool(path_bam)
bam = bam.bam_to_bed(stream=True)
target = target.remove_invalid().saveas()
temp = bam.remove_invalid().saveas()

ontag_reads = temp.coverage(target, counts=True).sort()        ## Reads mapped on target regions
read_count = np.array([int(line[-1]) for line in ontag_reads])    ## Sum of reads on target regions

Per_ontag = read_count.sum()/samraw.mapped*100                 ## Percentage of reads enriched in the target regions

print('Percentage of reads enriched in the target regions: {0:.2f}%'.format(Per_ontag))

My code of bases enriched(Total Aligned Bases in Targeted Regions/Total Aligned Bases) :

samraw = py.AlignmentFile(path_bam, 'rb')

read_tag = target.coverage(path_bam, d=True)

ontag_base = np.array([int(base[5]) for base in read_tag])      ## Bases mapped on target
read_base = np.array([line.query_length for line in samraw])    ## Total mapped bases                                                ##the length of the read.


Per_base = ontag_base.sum()/read_base.sum()*100
print('Percentage of bases enriched in the target regions: {0:.2f}%'.format(Per_base))
pybedtools pysam bam bed • 2.5k views
ADD COMMENT
0
Entering edit mode

You could save yourself the hassle and use plotEnrichment from deepTools. I wrote it for this purpose.

ADD REPLY
0
Entering edit mode

Thanks Devon. I tried plotEnrichment. It proves my output is correct. Thanks a lot.

ADD REPLY
0
Entering edit mode

A quick question: I want to count the total aligned bases in BAM. Is there some way to do that? Do I just need to sum the length of each alignment in BAM?

ADD REPLY
0
Entering edit mode

You can basically get that from plotFingerprint. 1 - the x intercept is the fraction of covered bases.

ADD REPLY

Login before adding your answer.

Traffic: 1511 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6