Is 1.4 Gb too large a genome for SPAdes?
5.0 years ago
AGE ▴ 30

I want to assemble a reptile genome with SPAdes, since it installed without problems, unlike many other programs (e.g. MaSuRCA, Velvet, SOAPdenovo). Is a diploid genome of 1.4 Gb too large for this program?

assembly genome • 2.8k views

Genome size alone is unlikely to be a problem for the program. What hardware do you have available?


I have access to a cluster. The last time the program ran out of memory, it estimated that I need approx. 500 GB of RAM.


How many reads do you have, and what is the estimated coverage? You may be able to downsample your reads. Needing more than 500 GB of RAM doesn't sound right to me, though. Something else may be going on.
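
To make the coverage question concrete, here is a minimal Python sketch of the usual back-of-the-envelope estimate (total sequenced bases divided by genome size) and of the fraction of reads to keep when downsampling. The read count, read length, and 50x target are hypothetical placeholders, not values from this thread; only the 1.4 Gb genome size comes from the question.

```python
def estimated_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate coverage: total sequenced bases / genome size.

    n_reads counts every read, so for paired-end data count both mates.
    """
    return n_reads * read_length / genome_size


def downsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep in order to reach target_cov (at most 1.0)."""
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    return min(1.0, target_cov / current_cov)


# Hypothetical input: 600 million reads of 150 bp; genome size from the question.
cov = estimated_coverage(600_000_000, 150, 1_400_000_000)
frac = downsample_fraction(cov, target_cov=50.0)  # 50x is an assumed target
print(f"estimated coverage: {cov:.1f}x, keep fraction: {frac:.2f}")
```

The resulting fraction could then be handed to whichever read-subsampling tool you prefer; for paired reads, the same random seed should be used for both files so that mates stay in sync.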

5.0 years ago
Buffo ★ 2.4k

I have read papers that assembled genomes of almost 3 Gb with SPAdes (if I remember the reference I will post it), so the size would not be a problem. However, I have personally had problems with complex genomes (with large, short, and tandem repeats), which produce very fragmented and redundant assemblies. In addition, to assemble a 50 Mb diploid genome from 80 million paired reads I needed about 40-50 GB of RAM with default parameters; with a larger k-mer size it was impossible.


Thanks for the info! Yes, it does use quite a bit of RAM. I tried running the program a few weeks ago and it estimated that I needed 500 GB of RAM.

5.0 years ago
h.mon 35k

From the SPAdes manual:

Note, that SPAdes was initially designed for small genomes. It was tested on bacterial (both single-cell MDA and standard isolates), fungal and other small genomes. SPAdes is not intended for larger genomes (e.g. mammalian size genomes). For such purposes you can use it at your own risk.

As Buffo noted, it is possible to use SPAdes with large genomes, and I have done so myself. But it was hit or miss: very often it would fail by using too much memory, or SPAdes would exit with some error. Again as Buffo noted, complex genomes or lower-quality data can hugely increase memory usage, rendering SPAdes impractical.
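
Since memory is the usual failure point, it may help to set the memory limit explicitly when launching a run. The sketch below only builds and prints a `spades.py` command line using the `-1`/`-2`, `-o`, `-t`, `-m` (memory limit in GB) and `-k` options. The file names, thread count, and k-mer list are hypothetical, and `spades_command` is an illustrative helper, not part of SPAdes. As far as I know, SPAdes stops when the `-m` limit is exceeded rather than reducing its usage, so the value should match what the cluster node really has.

```python
import shlex


def spades_command(r1, r2, outdir, threads=16, memory_gb=250, kmers=None):
    """Build a spades.py command line for one paired-end library (illustrative helper)."""
    cmd = [
        "spades.py",
        "-1", r1,              # forward reads
        "-2", r2,              # reverse reads
        "-o", outdir,          # output directory
        "-t", str(threads),    # number of threads
        "-m", str(memory_gb),  # memory limit in GB
    ]
    if kmers:
        # Comma-separated list of odd k-mer sizes, e.g. 21,33,55
        cmd += ["-k", ",".join(str(k) for k in kmers)]
    return cmd


# Hypothetical file names and resources; adjust to your own data and node.
cmd = spades_command(
    "reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out",
    threads=32, memory_gb=500, kmers=[21, 33, 55],
)
print(shlex.join(cmd))
```

Printing the command first, then submitting it through the cluster scheduler with a matching memory request, keeps the two limits consistent.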

Regarding installation problems, (mini)conda may be of great help.
