Better way to speed up the HUMAnN3 metagenomics analysis pipeline?
3.8 years ago
boaty ▴ 220

Hi guys,

This is an open question.

While processing microbiome metagenomics data with humann3, I found it quite time-consuming. With 40 CPU cores and 240 GB RAM, a single sample took 4 hours for the nucleotide search alone and 9 hours with the translated search included, which is unacceptable for an institute server.

After some tests, I noticed that disk I/O speed is probably the bottleneck.
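
For example, a rough check along these lines during a run showed the data disk saturated while most cores sat idle (device names and exact numbers will differ on your system):

# iostat comes with the sysstat package; -x gives extended stats, 5 s interval
iostat -x 5
# high %util and long await on the data disk while CPU %idle stays high
# indicate the run is waiting on disk I/O rather than on compute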

Part of the reason is that the humann3 pipeline is a collection of separate tools: each tool takes an input file, analyzes it, and writes its output to disk, and the next tool then reads that output as its input. The consequence is that the process spends a long time waiting on HDD reads and writes, which slows down the whole analysis.

My first thought was an M.2 SSD, but getting one would be even slower because of administrative paperwork. So is there a better way, through software or disk management, to speed up the pipeline?

Thanks in advance

alignment metagenomics humann pipeline
3.2 years ago
Raygozak ★ 1.4k

Hi, in my experience the most time-consuming part is the translated search, where DIAMOND is run on the reads that did not match anything in the nucleotide prescreen. DIAMOND has a parameter that sets the block size for processing (-b, in billions of sequence letters; the default is 2.0), and its value controls the memory/speed trade-off. HUMAnN per se does not have a way to pass this as an argument, so you might need to go into the code and change it manually, or add the functionality to accept this parameter. The larger the value, the more memory DIAMOND uses and the shorter the runtime; the smaller the value, the less memory it needs but the longer it takes.

This is useful if you are short on memory but have time: set it to a low value, and vice versa.
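
As a rough illustration of the trade-off, here is a standalone DIAMOND call; the database and file names are placeholders, and HUMAnN constructs its own diamond command internally, so this only shows what -b does:

# --block-size 8 means 8 billion sequence letters per block:
# more RAM used, fewer passes over the data, shorter runtime.
# A smaller value (e.g. 1) reduces the memory footprint but lengthens the run.
diamond blastx \
    --db uniref90.dmnd \
    --query unaligned_reads.fasta \
    --out translated_hits.tsv \
    --block-size 8 \
    --threads 40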

Hope this helps.

Thanks Raygozak,

I started running humann3 with GNU Parallel, and it is much faster:

ls merged_filtered_fastq/ | parallel --eta -j 10 --load 90% --noswap 'humann3 --input merged_filtered_fastq/{} --metaphlan-options "-t rel_ab_w_read_stats" --search-mode uniref90 --output results --memory-use maximum --threads 150'

With parallel, the pipeline can start on the second (or third ..., depending on the -j parameter, your memory size, and your number of cores) sample while humann3 is still writing the results of the first sample to disk.
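
One more thing that may help with the disk bottleneck (untested beyond my setup; the paths here are just examples): pointing the output at a RAM-backed tmpfs such as /dev/shm, so the stage-to-stage intermediate reads and writes never touch the HDD, then copying only the final tables back:

# /dev/shm is a RAM-backed tmpfs on most Linux systems; paths are examples
mkdir -p /dev/shm/humann_work
humann3 --input merged_filtered_fastq/sample1.fastq \
    --output /dev/shm/humann_work \
    --threads 40
cp /dev/shm/humann_work/*.tsv results/   # keep only the final tables on the HDD
rm -rf /dev/shm/humann_work              # free the RAM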
