I am trying to extract per-base read mapping coverage from a BAM file based on several regions of interest in a BED file. For example, I want to get the coverage at each base pair for the following 1kb regions:
chr start end name score strand
chrI 182604 183604 region1 0 -
chrII 197700 198700 region2 0 -
chrII 326866 327866 region3 0 -
chrII 643079 644079 region4 0 +
chrIII 82534 83534 region5 0 +
My current approach is to use:
coverageBed -a BED -b BAM -g GENOME_FILE -d -s > coverage.txt
I know what the coverage is "supposed" to look like (I'm trying to replicate a graph from a paper), and this command gives me something close but not exact. However, I can recreate it perfectly using the (much slower) process of calculating coverage for every base pair of the ENTIRE genome, then extracting specific regions using awk/grep scripting, as long as I use the "-fs" (fragment size) parameter set to 20 in genomecoverageBED:
genomecoverageBed -ibam BAM -d -strand + -fs 20 > coverage_pos.txt
genomecoverageBed -ibam BAM -d -strand - -fs 20 > coverage_neg.txt
...
a bunch of unix data wrangling to filter to only my regions of interest
My question is not about helping me solve this problem, I simply want to know: what exactly is the "-fs" parameter doing in genomecoverageBed
and why isn't it an option with coverageBed
?
Thanks for your simple answer. So, in my case from above, if I have a read that is 50 bp long it will be trimmed to 20 bp, and if I have a read that is 10bp it will be extended to 20bp in order to calculate final per-bp coverage?