What is the limiting factor for trimmomatic speed and how can it be increased?
7.1 years ago
Daniel ★ 4.0k

I'm using Trimmomatic mainly to filter out adapter read-through in my paired-end Illumina data.

My command is as follows, and produces expected results:

java -jar trimmomatic-0.33.jar PE 01_R1.fastq 01_R2.fastq 01_R1-trimpair.fastq 01_R1-trimunpair.fastq 01_R2-trimpair.fastq 01_R2-trimunpair.fastq ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36

However, I can't work out how to specify how many nodes or cores to use (neither word appears in the Trimmomatic manual). Edit: found it, the -threads option. When run, I am shown this message:

Multiple cores found: Using 16 threads

However, I have more available, as I am submitting these jobs to a large compute cluster. Whether I assign 2, 16 or 32 cores, I still get the same message.

Finally, a test on one sample completed within the 1000 min wall time assigned to it (16 cores), so I submitted the full 16 samples to the compute queue, but each job failed at ~50% completion when it timed out at 1000 min. This makes me wonder whether it is being limited by memory: running alone it could grow freely, but the parallel jobs may have competed for memory and slowed each other down. That's speculation, though; I don't know if it works like that. Alternatively, could it be Java that's limiting memory, and should I push it higher with -Xmx?
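For reference, here is the same command sketched with an explicit thread count and a larger Java heap. This is a hedged sketch, not a tested fix: the -Xmx value is an assumption and should be matched to the memory actually allocated to the cluster job.

```shell
# Sketch only: same Trimmomatic call with -threads set explicitly and a
# larger Java heap. -Xmx8g is an assumed value; adjust to your node's RAM.
java -Xmx8g -jar trimmomatic-0.33.jar PE -threads 16 \
    01_R1.fastq 01_R2.fastq \
    01_R1-trimpair.fastq 01_R1-trimunpair.fastq \
    01_R2-trimpair.fastq 01_R2-trimunpair.fastq \
    ILLUMINACLIP:TruSeq3-PE.fa:2:20:7 TRAILING:3 MINLEN:36
```

Note that -threads is a Trimmomatic option and goes after the PE mode keyword, while -Xmx is a JVM flag and goes before -jar.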

Alternatively, I'm not tied to Trimmomatic and would use a different Illumina adapter trimmer if anyone could recommend one.

Thanks.

trimmomatic fastq

I wonder if the limiting factor wouldn't be the I/O (reading/writing the fastq files) rather than the CPU (processing the reads) or the memory. I don't know for sure so it would be nice if someone could confirm this.

7.1 years ago

BBDuk is substantially faster than Trimmomatic (and, in my testing, more accurate for adapter-trimming). With 16 cores, it can adapter-trim over 1 million 150bp paired-end reads per second on 2.5 GHz Intel E5-2670 CPUs, using recommended parameters.

E.g.:

bbduk.sh in=/dev/shm/r#.fq reads=4m ktrim=r k=23 mink=11 hdist=1 t=16 ref=adapters_a2.fa tbo tpe out=foo.fq

BBDuk version 37.02
Set threads to 16

Initial:
Memory: max=46902m, free=44944m, used=1958m

Added 7767 kmers; time:         0.225 seconds.
Memory: max=46902m, free=42497m, used=4405m

Input is being processed as paired
Processing time:                3.517 seconds.

Input:                          4000000 reads           604000000 bases.
KTrimmed:                       10626 reads (0.27%)     1176820 bases (0.19%)
Trimmed by overlap:             1658 reads (0.04%)      25632 bases (0.00%)
Total Removed:                  6422 reads (0.16%)      1202452 bases (0.20%)
Result:                         3993578 reads (99.84%)  602797548 bases (99.80%)

Time:                           3.755 seconds.
Reads Processed:       4000k    1065.30k reads/sec
Bases Processed:        604m    160.86m bases/sec

Fantastic, I'll give this a go!

7.1 years ago

So you tried -threads with 2, 16 and 32 and got the same result? If I remember correctly, Trimmomatic analyses each read separately (or each pair, with paired-end settings), and trimming one read does not affect subsequent reads. So to work around an I/O bottleneck, you can slice your file into chunks of N reads, send them to K nodes as a batch, wait for everything to be processed, and combine the results. It does sound like Trimmomatic is not parallelizing efficiently on your cluster for some reason.
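The slicing idea can be demonstrated with coreutils alone. This toy sketch (generated input, tiny chunk size, illustrative file names, none of them from the thread) shows the key invariant: splitting a FASTQ on a multiple of 4 lines keeps records intact, and concatenating the chunks in order reconstructs the original file.

```shell
# Toy sketch of slice-and-merge; real chunks would be millions of lines,
# and for paired-end data R1 and R2 must be split with the same -l value.
printf '@r%d\nACGT\n+\nIIII\n' 1 2 3 4 5 6 > toy_R1.fastq  # 6 reads, 24 lines
split -l 8 -d toy_R1.fastq chunk_R1_                       # 2 reads per chunk
# ...each chunk_R1_* would be trimmed as an independent cluster job here...
cat chunk_R1_0* > merged_R1.fastq    # numeric suffixes sort in original order
cmp -s toy_R1.fastq merged_R1.fastq && echo "order preserved"
```

The same -l value on both mate files keeps chunk N of R1 paired with chunk N of R2, which is what lets each chunk be trimmed independently.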


I was going to suggest slicing the input FASTQs. However, the slicing and merging make the whole process more complicated and error-prone, and I wonder if it's worth it. If time is crucial, I would also/instead consider piping the output of the trimmer to the aligner. I put a simple how-to here: Trim & align paired-end reads in a single pass using cutadapt and bwa mem.
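The single-pass idea can be sketched roughly as below. This is a hedged illustration, not the linked how-to verbatim: the adapter sequences, ref.fa, and output names are placeholders I've assumed.

```shell
# Hedged sketch of single-pass trim + align: cutadapt writes interleaved
# pairs to stdout, and bwa mem -p reads them from stdin (smart pairing).
# Adapter sequences and ref.fa are placeholders, not from the thread.
cutadapt --interleaved -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
    01_R1.fastq 01_R2.fastq 2> cutadapt.log \
  | bwa mem -p ref.fa - > aligned.sam
```

Because nothing is written to disk between the two tools, this trades the intermediate FASTQ I/O for keeping both processes busy on the same node.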
