I've mapped some reads to an assembly using bwa mem, and now I'd like to visualise the read depth to see which areas of the genome have the most coverage.
I used samtools depth on the resulting .bam file to pull out the per-base read depth, and it created a file with the following format:
Scaffold Position freq.
K493scaffold_1 9341 28
K493scaffold_1 9342 28
K493scaffold_1 9343 28
K493scaffold_1 9344 28
K493scaffold_1 9345 28
K493scaffold_1 9346 28
K493scaffold_1 9347 28
K493scaffold_1 9348 28
K493scaffold_1 9349 28
K493scaffold_1 9350 28
K493scaffold_1 9351 28
K493scaffold_1 9352 1
K493scaffold_1 10273 1
K493scaffold_1 10274 188
K493scaffold_1 10275 189
K493scaffold_1 10276 189
K493scaffold_1 10277 189
K493scaffold_1 10278 189
K493scaffold_1 10279 189
K493scaffold_1 10280 189
K493scaffold_1 10281 189
K493scaffold_1 10282 189
K493scaffold_1 10283 189
K493scaffold_1 10284 189
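The excerpt jumps from position 9352 straight to 10273. As far as I know, this is because samtools depth leaves out zero-depth positions unless it is run with -a, which matters when averaging over bins. A minimal sketch for listing those unreported runs, assuming the rows have already been parsed into (scaffold, position, depth) tuples sorted by scaffold and position (the function name is just illustrative):

```python
from typing import Iterable, Iterator, Tuple


def find_unreported_runs(
    records: Iterable[Tuple[str, int, int]],
) -> Iterator[Tuple[str, int, int]]:
    """Yield (scaffold, first_missing, last_missing) for each run of positions
    absent between two consecutive reported positions on the same scaffold.

    Assumes records are sorted by scaffold, then position. Missing bases before
    the first reported position or after the last one are not detected, since
    the scaffold length is not available from the depth rows alone.
    """
    prev_scaffold = None
    prev_pos = 0
    for scaffold, pos, _depth in records:
        # A jump of more than one base on the same scaffold means skipped positions.
        if scaffold == prev_scaffold and pos > prev_pos + 1:
            yield scaffold, prev_pos + 1, pos - 1
        prev_scaffold, prev_pos = scaffold, pos
```

On the rows above this would report K493scaffold_1 9353-10272 as a run that samtools depth did not output.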
I could try plotting the entire file; however, it's pretty large (too large for most programs to handle), and there are 1366 scaffolds, each containing ~30 kb of positions, so it would be a pain to navigate.
So now, for each scaffold, I'd like to bin the base positions into 500 bp sections and take the average frequency for each bin. For example, the desired output would look something like this:
K493scaffold_1 1-500 28
K493scaffold_1 501-1000 71
K493scaffold_1 1001-1500 98
K493scaffold_1 1501-2000 2
K493scaffold_1 2001-2500 17
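If no ready-made tool turns up, the binning itself is short to script. A minimal sketch, assuming the whitespace-separated three-column layout shown above with an optional header line (read_depth and bin_mean_depth are hypothetical names). Bins are 1-based and inclusive (1-500, 501-1000, ...), and the mean is taken only over positions present in the file:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_depth(path: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (scaffold, position, depth), skipping blank lines and a header
    such as 'Scaffold Position freq.' whose position column is not numeric."""
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 3 and fields[1].isdigit():
                yield fields[0], int(fields[1]), int(fields[2])


def bin_mean_depth(
    records: Iterable[Tuple[str, int, int]], bin_size: int = 500
) -> List[Tuple[str, str, float]]:
    """Return (scaffold, 'start-end', mean depth) for every bin that has at
    least one reported position. Positions missing from the input are not
    counted, so zero-depth bases only lower the mean if they were written out
    (e.g. by running samtools depth with -a)."""
    sums: Dict[Tuple[str, int], int] = defaultdict(int)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for scaffold, pos, depth in records:
        key = (scaffold, (pos - 1) // bin_size)  # 1-based position -> bin index
        sums[key] += depth
        counts[key] += 1
    rows = []
    for (scaffold, idx), total in sums.items():  # dicts keep input order
        start = idx * bin_size + 1
        label = f"{start}-{start + bin_size - 1}"
        rows.append((scaffold, label, total / counts[(scaffold, idx)]))
    return rows
```

Something like `for row in bin_mean_depth(read_depth("depth.txt")): print(*row, sep="\t")` would print one line per bin ("depth.txt" is a placeholder path). On the excerpt above it gives K493scaffold_1 9001-9500 25.75 and K493scaffold_1 10001-10500 173.25. Bins with no reported positions are simply absent from the output.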
Before I embark on writing a script myself, is there an existing utility that can already do this?
I've already tried bedtools genomecov with BEDGRAPH output, but it's not quite what I'm looking for: it merges runs of equal depth into intervals of varying length rather than splitting the data into regular-sized bins.
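The BEDGRAPH intervals can still be turned into fixed bins by splitting each interval at bin boundaries and weighting its depth by the number of bases it covers in each bin. A minimal sketch, assuming standard four-column BEDGRAPH (0-based start, exclusive end); the function names are hypothetical. Note that genomecov -bg omits zero-coverage regions, so they only count towards the mean if the file was made with -bga:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_bedgraph(path: str) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (scaffold, start, end, depth), skipping track/header lines."""
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 4 and fields[1].isdigit():
                yield fields[0], int(fields[1]), int(fields[2]), float(fields[3])


def bedgraph_to_bins(
    intervals: Iterable[Tuple[str, int, int, float]], bin_size: int = 500
) -> List[Tuple[str, str, float]]:
    """Length-weighted mean depth per fixed-size bin, labelled 1-based inclusive.

    An interval spanning several bins contributes to each one in proportion
    to the number of bases it covers there.
    """
    weighted: Dict[Tuple[str, int], float] = defaultdict(float)
    covered: Dict[Tuple[str, int], int] = defaultdict(int)
    for scaffold, start, end, depth in intervals:
        pos = start
        while pos < end:
            idx = pos // bin_size
            piece_end = min(end, (idx + 1) * bin_size)  # stop at interval or bin edge
            weighted[(scaffold, idx)] += depth * (piece_end - pos)
            covered[(scaffold, idx)] += piece_end - pos
            pos = piece_end
    rows = []
    for (scaffold, idx), total in weighted.items():
        first = idx * bin_size + 1  # convert 0-based bin start to 1-based label
        label = f"{first}-{first + bin_size - 1}"
        rows.append((scaffold, label, total / covered[(scaffold, idx)]))
    return rows
```

Weighting by length keeps a long low-depth interval from counting the same as a single high-depth base, which a plain mean over BEDGRAPH rows would do.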
Thanks in advance for any help anyone can provide!
Hi, short on time, but I think this thread may help: "Bin chromosome every 1kb and get average value"