The title is not very clear. This is kind of two questions...
Basically, I want to pull statistical information out of a BAM (not SAM) file, so the data is compressed, which is part of the problem.
The immediate need is: I'd like to get the average depth for the entire file.
The larger question is...
To get #1, I used...
samtools depth my.bam | while read A B C; do
<tally, and sum, and average stuff>
done
echo <results>
It took FOREVER!!
I was thinking I don't need to average EVERY line. Even if I took only every 1000th or 10,000th line, I would get a good enough estimate.
BUT the issue is that I must first run the BAM file through samtools view, so using things like awk, sed, or even a file pointer to pull JUST every 1000th line actually takes longer than the above.
Is there a way around this?
Thanks!
EDIT
I'm vaguely familiar with samtools stats, but it didn't seem to have the specific info I was looking for above. If there's a way to tweak it to get what I need... that's awesome.
Have you looked at mosdepth?
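In case it helps, here is a rough sketch of two ways this might go. Both helper names (mean_depth, mean_from_summary) are made up for illustration. The first assumes the third column of samtools depth output is the per-position depth and does the tally in a single awk process, which is usually far faster than a shell while-read loop. The second assumes mosdepth writes a tab-separated <prefix>.mosdepth.summary.txt with columns chrom, length, bases, mean, min, max and a final "total" row; that layout and the flags in the commented invocation are from my reading of the mosdepth README, so please check them against your installed version.

```shell
# Hypothetical helper: average the third column (assumed to be depth)
# of `samtools depth` output read from stdin, in one awk process.
mean_depth() {
    awk '{ sum += $3; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Hypothetical helper: print the mean column of the "total" row from a
# mosdepth summary file (assumed columns: chrom, length, bases, mean, min, max).
mean_from_summary() {
    awk -F'\t' '$1 == "total" { print $4 }' "$1"
}

# Assumed usage (needs samtools / mosdepth and an indexed BAM):
#   samtools depth my.bam | mean_depth
#   mosdepth -n --fast-mode -t 4 my_sample my.bam
#   mean_from_summary my_sample.mosdepth.summary.txt
```

Note that by default samtools depth skips positions with zero coverage, so the first helper averages over covered positions only; adding -a to samtools depth includes the zero-depth positions if a whole-reference average is what you want.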