Oxford Nanopore and Illumina hybrid assembly
4
19 months ago
igor 7.7k
United States

Are there any de novo genome assemblers that work with both Nanopore and Illumina reads?

SPAdes can take both Nanopore and Illumina reads, but it's only for prokaryotic genomes. I haven't seen anything for eukaryotic.

All the discussion and literature that I have seen so far suggests using Nanopore long reads for assembly and then polishing with Illumina short reads. However, you need a certain level of coverage for the assembly to complete (for example, Canu's recommended minimum is 20X). What if you only have 1X coverage with long reads? That will not be enough to assemble on its own, but it should still be much better than short reads alone. What's the appropriate approach for that situation?

1

How large is your genome expected to be?

You could give SPAdes a try. As long as you are not in "human" genome territory it may work. I recall one of the SPAdes developers writing that it could be used for larger genomes (e.g. fungal), but I can't find that post/thread at the moment.

Edit: The SPAdes manual recommends not using the --careful option for "large or medium" eukaryotic genomes, so it looks like you could certainly try it out.

0

It should be around 500 Mb, so it's not too big, but certainly closer to human than bacterial size.

Good point about the --careful option, but the manual also says "SPAdes is not intended for larger genomes (e.g. mammalian size genomes)", so I am not sure which part to believe.
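
To put the numbers in this thread in perspective, here is a minimal sketch of the coverage arithmetic (mean coverage is total sequenced bases divided by genome size). The 500 Mb genome size, the 1X long-read coverage, and Canu's 20X minimum come from the posts above; the function names are hypothetical and only for illustration.

```python
def coverage(total_read_bases: int, genome_size_bases: int) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bases


def bases_needed(target_coverage: float, genome_size_bases: int) -> int:
    """Total read bases required to reach a target mean coverage."""
    return round(target_coverage * genome_size_bases)


GENOME_SIZE = 500_000_000  # ~500 Mb, the estimate given in this thread

# Read volume implied by 1X long-read coverage.
one_x = bases_needed(1, GENOME_SIZE)
# Read volume implied by Canu's recommended minimum of 20X.
twenty_x = bases_needed(20, GENOME_SIZE)

print(f"1X  = {one_x / 1e9:.1f} Gb of reads")
print(f"20X = {twenty_x / 1e9:.1f} Gb of reads")
print(f"Coverage from 0.5 Gb of reads: {coverage(500_000_000, GENOME_SIZE):.1f}X")
```

In other words, going from 1X to an assembly-ready 20X would mean roughly twenty times more long-read data for a genome of this size.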

0

If you have some time (and I think you have the resources, if I recall from a 10x thread), go ahead and give it a try. At worst, the job will fail :)

0

Good memory!

I certainly plan to give it a try. I just wanted to know if I am missing anything and to have some alternatives in case it fails.

0

I think Trinity is the best option for Nanopore reads in hybrid assembly.

0

Do you have a source for that? Because on GitHub I find the following:

Trinity assembles transcript sequences from Illumina RNA-Seq data.

0

I should've specified it's genome assembly, not transcriptome. Trinity is for RNA-seq.

0

Oh, sorry, that's true, it is for RNA-seq. What about IDBA_hybrid? You can use Nanopore reads as a reference.

0

Hi there, I'm new to the subject, but I will soon be facing the same questions. So far I have found only SPAdes and ALLPATHS-LG that do this.

With better coverage, what would be the best approach? Using a pipeline to assemble de novo with Nanopore and Illumina data, assembling the genome with Nanopore data and then correcting it with Illumina data, or even completing an Illumina draft genome with Nanopore data?

Thank you very much.

0

Nanopore is still lacking in performance, and its cost/performance ratio remains high. I think PacBio is the best option for long reads and for completing fragmented assemblies (from Illumina). Where are you from, lagartija? I know your name :).

0

Actually, I already have the Nanopore reads, so I can't change that. By the way, do you know what the difference is between SPAdes and SPAdes-Hybrid? It seems that both can do hybrid assembly...

So you know my name? Do you mean lagartija or my real name? Haha, I'm from France, but I'm also Argentinian and Norwegian. And you? Italian?

0

No, they are not the same. With SPAdes you can use 'trusted contigs' for de novo assemblies, but not reads. SPAdes-Hybrid, on the other hand, can perform de novo assemblies from long and short reads :). I'm from the Congo, but I have lived in America for years; I know lagartijas XD.

0

Aaah, I see. And how do I get the trusted contigs? And for both Illumina and Nanopore?

1

You can use old assemblies as trusted contigs (from the same or a closely related species); using genomes that are not closely related is not recommended (in SPAdes). If you don't have access to old assemblies (or none exist), de novo hybrid assembly is the only option. And yes, you can use Nanopore and Illumina reads for hybrid assemblies with SPAdes-Hybrid.

0

Only if you have them from some other source (e.g. an Illumina-only assembly).

0

Because from what I see here, SPAdes takes reads: http://spades.bioinf.spbau.ru/release3.10.1/manual.html
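
As a rough illustration of what passing both read types to SPAdes looks like, here is a minimal sketch that only builds the command line without running it. The flags (-1, -2, --nanopore, --trusted-contigs, --careful, -t, -m, -o) are options listed in the SPAdes manual; the file names, output directory, thread count, and memory limit are hypothetical placeholders, and the helper function itself is just for illustration.

```python
import shlex
from typing import List, Optional


def build_spades_command(
    illumina_r1: str,
    illumina_r2: str,
    nanopore_reads: str,
    out_dir: str,
    threads: int = 16,        # assumed value, adjust to your machine
    memory_gb: int = 128,     # assumed RAM limit in GB, adjust to your machine
    trusted_contigs: Optional[str] = None,
    careful: bool = False,
) -> List[str]:
    """Build (but do not run) a SPAdes hybrid assembly command."""
    cmd = [
        "spades.py",
        "-1", illumina_r1,             # paired-end Illumina reads, forward
        "-2", illumina_r2,             # paired-end Illumina reads, reverse
        "--nanopore", nanopore_reads,  # Nanopore long reads
        "-t", str(threads),
        "-m", str(memory_gb),
        "-o", out_dir,
    ]
    if trusted_contigs is not None:
        # Optional contigs from an existing assembly of the same organism.
        cmd += ["--trusted-contigs", trusted_contigs]
    if careful:
        cmd.append("--careful")
    return cmd


# Hypothetical input files, for illustration only.
cmd = build_spades_command(
    "illumina_R1.fastq.gz",
    "illumina_R2.fastq.gz",
    "nanopore.fastq.gz",
    "spades_hybrid_out",
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once the paths point at real files; whether the run completes on a genome of this size is a separate question.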

5
19 months ago
jblommaert92 • 70

Just thought I'd add the few options I've seen:

1) OPERA-LG https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

2) PacBio recommendations may be relevant here https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads

3) LINKS https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0076-3

4) This workflow http://biorxiv.org/content/early/2016/05/22/054783

And this other question may be useful too: "Gap-filling and scaffolding using PacBio reads"

DISCLAIMER: I haven't tried any of these yet, but I'm also planning a Nanopore-Illumina hybrid assembly soon.

1

Those are excellent suggestions!

I should have been looking for "scaffolding" rather than "hybrid assembly", which is probably more appropriate in my case.

2
10 months ago
Carambakaracho ♦ 1.2k
Switzerland/Basel

Besides the almost obvious SPAdes, I recommend looking into the MaSuRCA assembler. I had very good results for PacBio/Illumina and Nanopore/Illumina data, though my long-read coverage was in all cases a little higher than what you describe.

BTW, SPAdes easily handles metagenome assemblies well beyond 1 Gbp, and in the latest version the error messages about memory consumption were improved, so you'll find out pretty early whether it works or not. Give it a try; once it has assembled, I'd even try the --careful option. Whether it works depends more on the memory available on your machine (it should probably be 128 GB or more) and the k-mer complexity of your genome than on its taxonomic domain.
