Produce Bed File With Gaps In Coverage From A Bam File
3
0
Entering edit mode
10.8 years ago

What options are there to produce a bed file with every entry being a start-end segment of the reference with a gap in coverage from a bam file?

bam • 5.8k views
ADD COMMENT
2
Entering edit mode
10.3 years ago

If you create a BEDGRAPH of coverage from your BAM file using the bedtools genomecov tool, you can simply filter for regions with 0 (or whatever minimum value) as follows:

bedtools genomecov -ibam my.sorted.bam -bg > my.sorted.bedgraph
awk '$4 < 1' my.sorted.bedgraph > my.uncovered.intevals.bedgraph
ADD COMMENT
0
Entering edit mode

that's exactly what we do to detect poorly covered regions (and it works quite fast I must add). you may want to intersect that final bedgraph with your design bed file to get only the designed regions that were not covered enough.

ADD REPLY
1
Entering edit mode
10.8 years ago
Matt LaFave ▴ 310

You could try bedtools, though you'd need to do a little intermediate processing. Use bedtools bamtobed to convert the bam file into bed format, then use bedtools complement to identify the intervals that weren't covered in the original bam file.

ADD COMMENT
1
Entering edit mode
10.8 years ago

First, to demonstrate the idea, make a sorted reference BED file. Let's say you're working with human data (hg19 build):

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" \
    | awk '{print $1"\t0\t"($2+1)}' \
    | tail -n +2 \
    | sort-bed - \
    > hg19.ref.bed

$ head hg19.ref.bed
chr1     0    249250622
chr10    0    135534748
chr11    0    135006517
...

(You can use any sorted reference coordinates you want — I am using sorted coordinates for the hg19 reference genome for demonstration purposes.)

Next, convert your BAM file to BED:

$ bam2bed < myReads.bam > myReads.bed

Finally, run the reads and reference files through BEDOPS bedops --partition and bedops --not-element-of to extract regions-of-interest:

$ bedops --partition hg19.ref.bed myReads.bed \
    | bedops --not-element-of -1 - myReads.bed \
    > answer.bed

The file answer.bed will be a sorted BED file that contains regions from the reference file hg19.ref.bed, with gaps where the BAM-sourced reads are located.

(Briefly, the way this works is that the --partition operation splits the reference and read regions into disjoint subsets. The --not-element-of operation filters out disjoint subsets that are specifically from the reads set.)

ADD COMMENT
0
Entering edit mode

this would work if "gap" means no coverage at all. if "gap" is understood as "target regions without enough coverage", which would include those without coverage, then you would have to extract the coverage first in bedgraph format for instance, and then parse that bedgraph file looking for positions not reaching the threshold of interest.

ADD REPLY

Login before adding your answer.

Traffic: 1605 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6