Question

Faster way to get coverage per base

3

Entering edit mode

7.1 years ago

finswimmer 16k

Hello,

I would like to get the coverage for every single base in all regions in a bed file. For this I use samtools now

samtools depth -aa -b regions.bed input.bam

This all works as excpected. But the more regions I have the longer this tasks takes of course. I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete. My whole analyse pipeline from fastq to an annotated vcf file runs only 1h for this data set.

So I wonder if there is a faster way for this task?

fin swimmer

alignment sam • 4.8k views

ADD COMMENT • link 5.3 years ago by finswimmer 16k

0

Entering edit mode

Hi , If you have enough resources you can split your bam for example by chromosome and run samtools for each splited bam ?

ADD REPLY • link 7.1 years ago by Titus ▴ 910

0

Entering edit mode

Should there be any reasonable differences to split the bam file and not the bed file with the regions of interests?

I thought with the bam index it should be possible to have random access directly to the regions?

fin swimmer

ADD REPLY • link 7.1 years ago by finswimmer 16k

0

Entering edit mode

The BBMap package has a tool "pileup.sh" which is quite fast, and produces per-base coverage from an unsorted sam or bam file. I'm not sure if this answers your question, though, as it's not clear to me whether you want bed as input or output, or need bed at all.

ADD REPLY • link 7.1 years ago by Brian Bushnell 20k

0

Entering edit mode

pileup.sh seems to need lot of memory which I have to little (around 6GB) :(

The bed file is just needed to truncate the bam file to the region of interest. If I can pipe to pileup I can do this before with samtools view. So can I? Didn't find it in the documentation.

fin swimmer

ADD REPLY • link 7.1 years ago by finswimmer 16k

0

Entering edit mode

May be i didn't explain well my idea. You can split your bed and run multiple samtools in the same times. It will divide the calculation time in a lot little job ... Well you should try i m not sure about that but it make sens for me.

ADD REPLY • link 7.1 years ago by Titus ▴ 910

0

Entering edit mode

Hello Titus, this is what I meant with

I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete.

:)

ADD REPLY • link 7.1 years ago by finswimmer 16k

0

Entering edit mode

sorry my bad i read too fast :)

ADD REPLY • link 7.1 years ago by Titus ▴ 910

0

Entering edit mode

Pileup will only allocate memory for chromosomes that actually have reads mapped to them, so yes.

ADD REPLY • link 7.1 years ago by Brian Bushnell 20k

3

Entering edit mode

5.3 years ago

finswimmer 16k

To add in up-to-date solution:

mosdepth is great:

$ mosdepth --by capture.bed sample-output sample.exome.bam

fin swimmer

ADD COMMENT • link 5.3 years ago by finswimmer 16k

1

Entering edit mode

7.1 years ago

Vivek Todur ▴ 60

Sambamba is high performance tool, similar to samtools. Here depth is multi threaded, you can take advantage of it...

Cheers...!

ADD COMMENT • link 7.1 years ago by Vivek Todur ▴ 60

0

Entering edit mode

Never heard about sambamba before, but I will try it.

fin swimmer

ADD REPLY • link 7.1 years ago by finswimmer 16k

0

Entering edit mode

Did you get chance to try sambamba..? BTW latest version of samtools (1.4) got released with multi thread option... Just check out out...

ADD REPLY • link 7.0 years ago by Vivek Todur ▴ 60

0

Entering edit mode

I didn't try sambamba until now.

But I recognize that picards CollectHsMetrics have an PER_BASE_COVERAGE option. As I'm running CollectHsMetrics in any case I could use this option.

fin swimmer

ADD REPLY • link 7.0 years ago by finswimmer 16k

score 4 · Accepted Answer · 2017-03-24

4

Entering edit mode

7.1 years ago

Jeffin Rockey ★ 1.3k

Try bedtools coverage with -d option (link)

(And do notice the first note on -sorted)

ADD COMMENT • link 7.1 years ago by Jeffin Rockey ★ 1.3k

0

Entering edit mode

I'used bedtools before, but put it aside because of high memory usage (I just have have 6GB). But with the -sorted parameter it seemes to work at least with my small test dataset. I will will check it today on my large dataset.

Thank you.

fin swimmer

ADD REPLY • link 7.1 years ago by finswimmer 16k

1

Entering edit mode

Could test it yesterday and it works with the -sorted option much faster than with samtools.

Thanks a lot.

fin swimmer

ADD REPLY • link 7.1 years ago by finswimmer 16k