How to speed up bwa mem?
6.6 years ago

I have two paired-end fastq files, about 200 GB each. Running "bwa mem" takes me nearly 6 hours on a fairly powerful machine (24 physical cores, E5-2670 v3, Hyper-Threading, 64 GB memory).

There also seems to be a problem with the "-t" parameter: the run times with "-t 48" and "-t 12" are nearly the same.
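
For reference, this is roughly the command I am running (reference and read file names are placeholders):

    bwa mem -t 48 ref.fa reads_1.fq reads_2.fq > aln.sam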

I wonder if it is possible to split the fastq files into multiple parts, run "bwa mem" on each part separately and concurrently, and then combine the SAM outputs.

bwa wgs • 8.2k views
ADD COMMENT

6 hours is pretty awesome in my experience for a file of that size (is that the size of the compressed fastq?). At some point, increasing -t stops being beneficial because I/O limitations kick in. You can of course split your fastq into pieces and later merge the results with e.g. samtools cat piped into samtools sort (see the sketch below), but in the end that will probably take longer than just waiting out the 6 hours.
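
If you do want to try it, a rough sketch of the split-and-merge approach (assuming plain, uncompressed fastq with 4 lines per read; file names and chunk size are placeholders):

    # split both fastq files into matching chunks; the line count must be a
    # multiple of 4 and identical for both files so read pairs stay in sync
    split -l 100000000 reads_1.fq chunk_R1_
    split -l 100000000 reads_2.fq chunk_R2_

    # map each chunk pair (these jobs could be launched concurrently, I/O permitting)
    for c in chunk_R1_*; do
        bwa mem -t 12 ref.fa "$c" "${c/R1/R2}" | samtools view -b -o "aln_${c#chunk_R1_}.bam" -
    done

    # concatenate the chunk BAMs and sort the merged result
    samtools cat aln_*.bam | samtools sort -@ 12 -o merged.sorted.bam -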

ADD REPLY

The 200 GB * 2 is plain fastq, not compressed.

Thank you for your answer. I'll try splitting the files and see how long it takes. ^_^

ADD REPLY

Try something other than bwa? minimap2 was released recently and is supposed to be incredibly fast.
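
For short paired-end reads the invocation would look roughly like this (the sr preset is for short genomic reads; file names and thread count are placeholders):

    minimap2 -ax sr -t 24 ref.fa reads_1.fq reads_2.fq > aln.sam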

ADD REPLY

I am also curious about the bwa mem mapping rate. Rather than the size of the file, I would specify the number of reads. I subsampled a 150 bp paired-end fastq pair to 1 million reads and mapped it to the zebrafish genome (about half the size of the human genome). I used a computing cluster with 16 cores (8 GB RAM each, though barely 8 GB was actually used). Mapping rate is also affected by read quality; my reads were trimmed so that all bases have Phred quality >28.

bwa mem mapping alone: 1 million reads in 73 sec. Mapping + samblaster + samtools fixmate + samtools sort (a variant-calling workflow): 1 million reads in 325 sec.

For pure mapping, that is around 13,600 reads per second, so 200 million reads would take about 4 hours. For the full workflow it is about 3,000 reads per second, so 200 million reads would take about 18 hours. (These are rough estimates; in practice the scaling is not perfectly linear.)
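
The full workflow was a pipeline along these lines (the exact flags, paths, and thread counts here are illustrative, not my actual command):

    bwa mem -t 16 ref.fa reads_1.fq reads_2.fq \
        | samblaster \
        | samtools fixmate -m - - \
        | samtools sort -@ 16 -o sample.sorted.bam -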

I wonder whether there are any resources for comparing mapping rates (reads mapped per second or minute) across various system specs.

ADD REPLY
6.6 years ago

If you can't increase speed by using more than 12 threads on a 24-core node, you are probably I/O limited, in which case no alternative aligner could run faster (unless you are write-limited due to unnecessary fields in the SAM output, which you could potentially disable). You can reduce such I/O limitations by keeping your files compressed at all times (for example, gzipped via pigz) and reading/writing compressed files at every stage of your pipeline. If you run "top" while mapping, you will see how much CPU utilization you have; it should be around 4800% when mapping with 48 threads. Hyperthreading does not particularly increase the speed of mapping, though; it helps more with floating-point operations, so there is likely no point in exceeding 24 threads anyway.
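
For example, bwa mem reads gzipped fastq directly, and the output can be compressed on the fly so the disk only ever sees compressed data (thread counts and file names are placeholders):

    bwa mem -t 24 ref.fa reads_1.fq.gz reads_2.fq.gz | pigz -p 4 > aln.sam.gz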

If you have multiple disks or filesystems, you may be able to increase speed by reading from one disk and writing to another.

ADD COMMENT
