I have been doing a few test runs with the ABySS assembler on a subset of my input data (to speed things up) in order to optimise the k-mer size. I'm now wondering if I can guesstimate the memory requirements of my full run from the memory usage of these trial runs.
More specifically, does the memory scale with the input file size or rather with the genome size? I was thinking that, e.g., doubling my input size will not result in double the memory used, since the number of distinct k-mers to keep in memory most likely plateaus?
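To make the reasoning concrete, this is the back-of-envelope projection I had in mind (just a rough sketch; the trial memory and k-mer counts below are made-up placeholders, and it assumes peak memory scales roughly linearly with the number of distinct k-mers rather than with raw input size):

```python
def project_memory_gb(trial_mem_gb, trial_distinct_kmers, full_distinct_kmers):
    """Project full-run memory from a trial run, assuming memory ~ distinct k-mers."""
    return trial_mem_gb * (full_distinct_kmers / trial_distinct_kmers)

# Hypothetical numbers: trial run peaked at 60 GB with 5e9 distinct k-mers,
# and the full data set is expected to hold ~10e9 distinct k-mers.
print(round(project_memory_gb(60, 5e9, 10e9)))  # -> 120 (GB)
```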
Any help or other users' experiences would be much appreciated.
thx
@benv
I started my full run, and with a 10-fold increase in input data I only observe a 2-fold increase in memory usage. So this seems to confirm our reasoning :-)
thx, L.
Perhaps even more informative: I went from 25 billion distinct k-mers to 37 billion (k = 85), which indeed roughly corresponds to the ~2-fold memory increase.
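In case it helps anyone else, this is the kind of quick check I use to get a distinct k-mer count on a small subsample (rough sketch only; the file name is a placeholder, and an exact Python set like this only works for subsets that fit in RAM; for the full data set an approximate counter such as ntCard is more appropriate):

```python
import gzip

K = 85
COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMP)[::-1]
    return min(kmer, rc)

def distinct_kmers(fastq_path, k=K):
    """Count distinct canonical k-mers in a (small!) FASTQ / FASTQ.gz subsample."""
    seen = set()
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 != 1:          # the sequence is the 2nd line of each 4-line record
                continue
            seq = line.strip().upper()
            for j in range(len(seq) - k + 1):
                kmer = seq[j:j + k]
                if "N" not in kmer:
                    seen.add(canonical(kmer))
    return len(seen)

print(distinct_kmers("subsample.fastq.gz"))
```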