I have been doing a few test runs with the ABySS assembler on a subset of my input data (to speed things up) in order to optimise the k-mer size. I'm now wondering if I can guesstimate the memory requirements of my full run from the memory usage of these trial runs.
More specifically, does the memory scale with the input file size or rather with the genome size? I was thinking that, e.g., doubling my input size will not result in double the memory used, since the number of distinct k-mers to keep in memory most likely plateaus?
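To make the reasoning concrete, this is the back-of-envelope projection I had in mind (just a rough sketch; the trial memory and k-mer counts below are made-up placeholders, and it assumes peak memory scales roughly linearly with the number of distinct k-mers rather than with raw input size):

```python
def project_memory_gb(trial_mem_gb, trial_distinct_kmers, full_distinct_kmers):
    """Project full-run memory from a trial run, assuming memory ~ distinct k-mers."""
    return trial_mem_gb * (full_distinct_kmers / trial_distinct_kmers)

# Hypothetical numbers: trial run peaked at 60 GB with 5e9 distinct k-mers,
# and the full data set is expected to hold ~10e9 distinct k-mers.
print(round(project_memory_gb(60, 5e9, 10e9)))  # -> 120 (GB)
```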
Any help or other users' experiences would be much appreciated.
thx
@benv
I started my full run, and with a 10-fold increase in input data I only observe a 2-fold increase in memory usage. So this seems to confirm our reasoning :-)
thx, L.
Perhaps even more informative: I went from 25 billion distinct k-mers to 37 billion (k = 85), which indeed roughly corresponds to the ~2-fold memory increase.
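In case it helps anyone else, this is the kind of quick check I use to get a distinct k-mer count on a small subsample (rough sketch only; the file name is a placeholder, and an exact Python set like this only works for subsets that fit in RAM; for the full data set an approximate counter such as ntCard is more appropriate):

```python
import gzip

K = 85
COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMP)[::-1]
    return min(kmer, rc)

def distinct_kmers(fastq_path, k=K):
    """Count distinct canonical k-mers in a (small!) FASTQ / FASTQ.gz subsample."""
    seen = set()
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 != 1:          # the sequence is the 2nd line of each 4-line record
                continue
            seq = line.strip().upper()
            for j in range(len(seq) - k + 1):
                kmer = seq[j:j + k]
                if "N" not in kmer:
                    seen.add(canonical(kmer))
    return len(seen)

print(distinct_kmers("subsample.fastq.gz"))
```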