Biostar Beta. Not for public use.
Median coverage from BAM file
1
Entering edit mode
12 months ago
rmf • 630

Any tool that can do median per base coverage over pre-defined windows from a BAM file?

I could make the actual per base coverage using samtools depth or genomeCoverageBed.

chr pos read_depth
1   1   6
1   2   8
...

I already have the windows prepared using

bedtools makewindows \
    -g "genome.fa.fai" \
    -w 1000 > "windows.bed"

I am looking for a result like below:

Median per base coverage over 1000bp windows

chr start  end   median_read_depth
1   1      1000  5
1   1000   2000  8
...
coverage DNA-Seq • 1.2k views
ADD COMMENTlink
1
Entering edit mode
13 months ago
France/Nantes/Institut du Thorax - INSE…

I 've written : http://lindenb.github.io/jvarkit/BamStats04.html

$ java -jar dist/bamstats04.jar -B src/test/resources/toy.bed.gz src/test/resources/toy.bam 2> /dev/null | column -t 

#chrom  start  end  length  sample  mincov  maxcov  meancov  mediancov  nocoveragebp  percentcovered
ref     10     13   3       S1      3       3       3.0      3.0        0             100
ref2    1      2    1       S1      2       2       2.0      2.0        0             100
ref2    13     14   1       S1      6       6       6.0      6.0        0             100
ref2    16     17   1       S1      6       6       6.0      6.0        0             100
ADD COMMENTlink
0
Entering edit mode

Oh that's brilliant! Includes all sorts of summaries.

ADD REPLYlink
0
Entering edit mode

reformat.sh from BBMap suite also produces summaries of all sorts. Look at the histogram parameters.

ADD REPLYlink
0
Entering edit mode

I have tried bbmap pileup and it was great. It's blazing fast. Unfortunately, it only gives the total number of mapped reads (sum) over windows. It would've been awesome if it could provide other summary metrics (mean, median etc) as well. I am not familiar with reformat.sh, but I will look into it.

ADD REPLYlink
0
Entering edit mode
14 months ago
Seattle, WA USA

Using BEDOPS bedmap and bam2bed, you can calculate a BED file containing reference elements (genomic windows over which you want to calculate coverage) and a semi-colon-delimited list of read overlap sizes in the final column:

$ bedmap --echo --echo-overlap-size --delim '\t' windows.bed <(bam2bed reads.bam) > coverage.bed

Once you have this, you can use Unix tools like paste and awk to paste the first through N-1 columns together with the median read overlap size calculated from the final, Nth column:

$ paste <(awk '{ for (i=1; i<(NF-1); i++) { printf("%s\t", $i); } printf("%s", $(NF-1)); }' coverage.bed) <(awk '{ n=split($NF, a, ";"); n=asort(a); if (n % 2) { printf("%d", a[(n + 1) / 2]); } else { printf("%d", a[(n / 2)] + a[(n / 2) + 1]) / 2.0); } }' coverage.bed) > answer.bed

In answer.bed, the first through N-1 columns contain the window interval, and the last column contains the median overlap size.

You can put in as many metrics here as you want, mean, etc. just by adjusting the second awk command to print those statistics.

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1