samtools multithreads
3.1 years ago
Palgrave

I am running samtools view with multiple threads specified; however, CPU usage never goes above 100%. My server has more than one CPU, so why isn't it using more of them?

samtools  view --threads 20 --reference GRCh37_latest_genomic.fna -O bam -f 4 file1.cram > file1.unmapped.bam

Use the correct option for threads, as indicated below. Hopefully you are doing this on a server with a high-performance file system: with a large number of threads, I/O is normally the limiting factor.
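One way to tell whether a run like this is CPU bound or waiting on I/O is to compare CPU time with wall-clock time. (user + sys) / real is the average number of cores the job kept busy: a value near 1.0 matches the ~100% reported in the question, a value near the requested thread count suggests the job is CPU bound, and a value well below it points to I/O waits or a single-threaded stage. The sketch below does that division; `cores_busy` is a hypothetical helper name, not a samtools or bash feature.

```shell
#!/usr/bin/env bash
# Sketch: average number of cores a finished command kept busy.
# cores_busy is a hypothetical helper, not part of samtools or bash.

# Usage: cores_busy REAL USER SYS   (all in seconds)
# Prints (USER + SYS) / REAL rounded to one decimal place.
cores_busy() {
  awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (u + s) / r }'
}
```

Setting `TIMEFORMAT='%R %U %S'` makes bash's `time` keyword print exactly those three numbers for a command. For example, `cores_busy 2000 12600 1400` prints 7.0, i.e. about seven cores busy on average.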

For CRAM/BAM transcoding jobs such as this one, CPU utilisation is significant. You're generally only going to be I/O bound if you're streaming from a remote server or are multithreading samtools very heavily. This example is almost certainly CPU bound. My guess is that it's running at ~5 MB/s (i.e. ~40 Mbit/s) and will take several hours to complete on typical human WGS data (hence the question here).

See http://www.htslib.org/benchmarks/CRAM.html for a high-level overview of CRAM/BAM encoding/decoding costs.
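A rough check of the throughput guess in this reply, assuming decimal units (1 GB = 1000 MB), that the ~5 MB/s rate applies to the CRAM being read, and taking the 100 GB file size the poster reports later in this thread as the input size:

```latex
5~\text{MB/s} \times 8~\tfrac{\text{bit}}{\text{byte}} = 40~\text{Mbit/s}
\qquad
t = \frac{100 \times 1000~\text{MB}}{5~\text{MB/s}} = 20\,000~\text{s} \approx 5.6~\text{h}
```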

Using 7 CPUs reduced the time for a 100 GB CRAM file to about 30-45 minutes.
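Assuming decimal units (1 GB = 1000 MB) and measuring throughput against the 100 GB input, that runtime implies an average rate of roughly:

```latex
\frac{100\,000~\text{MB}}{45 \times 60~\text{s}} \approx 37~\text{MB/s}
\quad\text{to}\quad
\frac{100\,000~\text{MB}}{30 \times 60~\text{s}} \approx 56~\text{MB/s}
```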

3.1 years ago

The only option I know of is -@, for multiple compression threads.
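A minimal sketch of the command from the question rewritten with -@, assuming a samtools version whose view subcommand accepts -@, -T, -b, -f and -o. The function name `extract_unmapped`, its default of 4 threads, and the `DRY_RUN` switch are hypothetical conveniences, not samtools features; with `DRY_RUN=1` the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch only: extract_unmapped and DRY_RUN are hypothetical helpers.

# Usage: extract_unmapped REF_FASTA IN_CRAM OUT_BAM [THREADS]
# Writes reads with FLAG bit 0x4 (unmapped) from IN_CRAM to OUT_BAM.
extract_unmapped() {
  local ref=$1 input=$2 output=$3 threads=${4:-4}  # default of 4 is an assumption
  # -@ : worker threads in addition to the main thread
  # -T : reference FASTA used to decode the CRAM
  # -b : write BAM; -f 4 : keep only unmapped reads; -o : output file
  local cmd=(samtools view -@ "$threads" -T "$ref" -b -f 4 -o "$output" "$input")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # show the command without running it
  else
    "${cmd[@]}"
  fi
}
```

With the file names from the question: `extract_unmapped GRCh37_latest_genomic.fna file1.cram file1.unmapped.bam 7`. Building the command as a bash array keeps file names with spaces intact when it is executed.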
