bedtools coverage bed
1
0
Entering edit mode
6.3 years ago
bioguy24 ▴ 230

The bedtools 2.25 command I use for coveregeBed is:

coverageBed -d -g genomefile.txt -a target.bed -b input.bam > output.txt

That works for the vast majority of bam files, however when each bam exceeds 60 million mapped reads (~60GB) the run errors. I have been reading about recent memory leak fixes and plan to upgrade to bedtools 2.27. Is there a better command to use or is upgrading enough and re-running? Thank you :).

NGS bedtools • 5.4k views
ADD COMMENT
1
Entering edit mode

My approach in this case, assuming I didn't have better hardware available, would be to extract the regions of the bam using the bedfile and samtools view -L. I would then supply that extracted BAM to bedtools. That will give you a much smaller BAM and may help avoid the memory issues you're seeing. I would spot-check a few of the regions in IGV to be 100% sure I'm not losing reads at the interval boundaries.

ADD REPLY
0
Entering edit mode

One "hack" (I hesitate to call it a solution) is to use samtools depth. This command will print out the number of reads spanning a nucleotide for a genome or a given region (samtools depth in.bam -r chr1:1-1000000). However, samtools depth does not give you the same output as coverageBed, but will give you coverage estimates

ADD REPLY
0
Entering edit mode
6.3 years ago

If target.bed is three-column, you could do something like:

$ bedmap --echo --count --bases --echo-ref-size target.bed <(bam2bed input.bam) | awk '{ print $0"\t"($5/$6); }' > answer.bed

The --echo option reports the elements in target.bed.

The --count option reports the number of features in the map file (the BAM file) that overlap the reference file (the target.bed file) by one or more bases.

The --bases option reports the number of bases that overlap between target and BAM reads.

The --echo-ref-size reports the length of the target element (for each target element).

Piping to awk does the same calculations as coverageBed.

This should ramp up to whole-genome scale work.

To make things faster, if the BAM file is indexed, you can add --do-not-sort and --reduced options to bypass sorting with sort-bed and to only print a subset of columns as BED.

ADD COMMENT

Login before adding your answer.

Traffic: 2932 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6