Question

Median coverage from BAM file

1

Entering edit mode

6.3 years ago

firestar ★ 1.6k

Any tool that can do median per base coverage over pre-defined windows from a BAM file?

I could make the actual per base coverage using samtools depth or genomeCoverageBed.

chr pos read_depth
1   1   6
1   2   8
...

I already have the windows prepared using

bedtools makewindows \
    -g "genome.fa.fai" \
    -w 1000 > "windows.bed"

I am looking for a result like below:

Median per base coverage over 1000bp windows

chr start  end   median_read_depth
1   1      1000  5
1   1000   2000  8
...

coverage DNA-Seq • 4.2k views

ADD COMMENT • link 5.2 years ago by firestar ★ 1.6k

0

Entering edit mode

6.3 years ago

Alex Reynolds 35k

Using BEDOPS bedmap and bam2bed, you can calculate a BED file containing reference elements (genomic windows over which you want to calculate coverage) and a semi-colon-delimited list of read overlap sizes in the final column:

$ bedmap --echo --echo-overlap-size --delim '\t' windows.bed <(bam2bed reads.bam) > coverage.bed

Once you have this, you can use Unix tools like paste and awk to paste the first through N-1 columns together with the median read overlap size calculated from the final, Nth column:

$ paste <(awk '{ for (i=1; i<(NF-1); i++) { printf("%s\t", $i); } printf("%s", $(NF-1)); }' coverage.bed) <(awk '{ n=split($NF, a, ";"); n=asort(a); if (n % 2) { printf("%d", a[(n + 1) / 2]); } else { printf("%d", a[(n / 2)] + a[(n / 2) + 1]) / 2.0); } }' coverage.bed) > answer.bed

In answer.bed, the first through N-1 columns contain the window interval, and the last column contains the median overlap size.

You can put in as many metrics here as you want, mean, etc. just by adjusting the second awk command to print those statistics.

ADD COMMENT • link 6.3 years ago by Alex Reynolds 35k

score 2 · Accepted Answer · 2018-01-15

2

Entering edit mode

6.3 years ago

Pierre Lindenbaum 161k

I 've written : http://lindenb.github.io/jvarkit/BamStats04.html

$ java -jar dist/bamstats04.jar -B src/test/resources/toy.bed.gz src/test/resources/toy.bam 2> /dev/null | column -t 

#chrom  start  end  length  sample  mincov  maxcov  meancov  mediancov  nocoveragebp  percentcovered
ref     10     13   3       S1      3       3       3.0      3.0        0             100
ref2    1      2    1       S1      2       2       2.0      2.0        0             100
ref2    13     14   1       S1      6       6       6.0      6.0        0             100
ref2    16     17   1       S1      6       6       6.0      6.0        0             100

ADD COMMENT • link 6.3 years ago by Pierre Lindenbaum 161k

1

Entering edit mode

Oh that's brilliant! Includes all sorts of summaries.

ADD REPLY • link 6.3 years ago by firestar ★ 1.6k

0

Entering edit mode

reformat.sh from BBMap suite also produces summaries of all sorts. Look at the histogram parameters.

ADD REPLY • link 6.3 years ago by GenoMax 141k

0

Entering edit mode

I have tried bbmap pileup and it was great. It's blazing fast. Unfortunately, it only gives the total number of mapped reads (sum) over windows. It would've been awesome if it could provide other summary metrics (mean, median etc) as well. I am not familiar with reformat.sh, but I will look into it.

ADD REPLY • link 6.3 years ago by firestar ★ 1.6k