Question

Genovo Metagenome Assembly

0

5.3 years ago

MK2000 ▴ 30

Can you break up a fasta file containing all your metagenomic reads into smaller, separate files and submit these smaller files separately to an assembly program? I am working with 454 data and the Genovo assembler, and it doesn't seem to want to take files with more than about 300 reads in them.

assembly Metagenome • 948 views

updated 5.3 years ago by h.mon 35k • written 5.3 years ago by MK2000 ▴ 30

1

It is highly unlikely Genovo can't deal with more than 300 reads in a file. From an old announcement:

We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads.

How are you running Genovo? What is the error message?

5.3 years ago by h.mon 35k

0

I downloaded and unpacked Genovo binaries 0.4, which I am running from my mac on a remote computer. I am using ./assemble file.fasta, but receiving the following error:

Gibbs Reads...
Regular spikes.
buildHash done in 0.02 seconds.
1000 2000 3000
assemble: jvAlign.cc:43: jvAlignCache::jvAlignCache(jvBase*, int): Assertion `j<100' failed.
Aborted

However, it seems to work when I run it on a smaller subset of my fasta file. My file has ~800,000 reads.

5.3 years ago by MK2000 ▴ 30

score 1 · Answer 1 · 2019-01-12

Maybe your problem is you have sequences longer than 1000bp? From the FAQ:

Algorithm cannot handle reads with length>1000.
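A quick way to test this is to list the reads in the fasta file that exceed the limit. Below is a minimal Python sketch, assuming a plain (possibly multi-line) fasta file; the function names `read_lengths` and `long_reads` are made up for illustration, and the 1000 bp threshold is simply the figure from the FAQ quote above.

```python
import sys


def read_lengths(handle):
    """Yield (read_id, length) for every record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            # Sequence may be wrapped over several lines; sum them up.
            length += len(line)
    if name is not None:
        yield name, length


def long_reads(handle, max_len=1000):
    """Return the (read_id, length) pairs whose length exceeds max_len."""
    return [(name, n) for name, n in read_lengths(handle) if n > max_len]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_lengths.py file.fasta
    with open(sys.argv[1]) as fh:
        for name, n in long_reads(fh):
            print(name, n)
```

If this prints anything, those reads would need to be removed or trimmed before trying the assembly again.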

Anyway, you can try splitting the initial fasta into many files and running smaller assemblies one after another, with each run continuing from where the previous one stopped. The command line for that is:

assemble <fasta_file> N <dump_file>

This will run Genovo for N iterations, loading the initial state from <dump_file>, where <dump_file> is the <fasta_file.dump.best> written by the previous run. You can then repeat this until all reads have been incorporated into the assembly.
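For the splitting step, here is a rough Python sketch of how the chunks and the chained command lines could be prepared. It only writes the chunk files and builds the commands; it does not run Genovo. The helper names (`fasta_records`, `split_fasta`, `plan_commands`), the chunk file naming, and the iteration count are all made up for illustration. It also assumes the dump file is named after the chunk with `.dump.best` appended, and that Genovo accepts a dump produced from a different read file, which I have not verified.

```python
import itertools


def fasta_records(handle):
    """Yield each FASTA record as a list of lines, header first."""
    record = []
    for line in handle:
        if line.startswith(">"):
            if record:
                yield record
            record = [line.rstrip("\r\n") + "\n"]
        elif record and line.strip():
            record.append(line.rstrip("\r\n") + "\n")
    if record:
        yield record


def split_fasta(path, reads_per_chunk, prefix):
    """Write consecutive chunks to prefix.0001.fasta, ...; return their paths."""
    chunk_paths = []
    with open(path) as handle:
        records = fasta_records(handle)
        for index in itertools.count(1):
            chunk = list(itertools.islice(records, reads_per_chunk))
            if not chunk:
                break
            chunk_path = f"{prefix}.{index:04d}.fasta"
            with open(chunk_path, "w") as out:
                for record in chunk:
                    out.writelines(record)
            chunk_paths.append(chunk_path)
    return chunk_paths


def plan_commands(chunk_paths, iterations, binary="./assemble"):
    """Build the chained command lines (as argument lists) without running them."""
    commands = []
    previous_dump = None
    for chunk_path in chunk_paths:
        if previous_dump is None:
            # First chunk: plain run, no previous state to load.
            commands.append([binary, chunk_path])
        else:
            # Later chunks: assemble <fasta_file> N <dump_file>
            commands.append([binary, chunk_path, str(iterations), previous_dump])
        # Assumed dump name: <fasta_file>.dump.best from the run just planned.
        previous_dump = chunk_path + ".dump.best"
    return commands
```

For example, `plan_commands(split_fasta("file.fasta", 50000, "chunk"), 10)` would give one command per chunk, which you can inspect and then run by hand or with `subprocess.run`. The chunk size and N here are placeholders, not recommendations.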