Question: How much data and how many libraries do we need for de novo assembly?
Could you please explain how many libraries (with different insert sizes) and how much data per library we need to get good coverage for a de novo genome assembly? How should we define that?

For example:

1- What is the optimal number of libraries and amount of data for bacteria (4-7 Mb genome)?

2- What is the optimal number of libraries and amount of data for a plant (600 Mb, diploid)?


  1. For bacteria I recommend one PacBio SMRT cell (which should be sufficient for roughly 30x or more) or >30x Nanopore, in combination with >20x standard Illumina PE. This often leads to very good assemblies out of the box, given the right assembler or long-read assembly/polishing combination. However, 50-100x Illumina coverage alone will do the trick, too: for bacteria it will give you a decent draft assembly.

  2. That is a whole other story, though this is a tiny plant genome and not my field of expertise. You'll need a well-thought-out sequencing and assembly strategy, ideally combining long reads, jump and paired-end libraries, and of course adapted to the assembler you plan to use. Or you give it a shot with 100x PE plus some jump and long-jump libraries (such as 3 kb and 8-10 kb) and see how far you get.
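To turn the coverage figures above into an amount of data, the usual back-of-the-envelope relation is coverage = total sequenced bases / genome size. Here is a rough sketch of that arithmetic. The 2 x 150 bp read length, the 5 Mb bacterial genome (a mid-range pick from 4-7 Mb), and the function names are my own assumptions for illustration; for the diploid plant, 600 Mb is treated as the haploid genome size.

```python
import math


def bases_needed(genome_size_bp, coverage):
    """Total sequenced bases needed to reach the target mean coverage."""
    return math.ceil(genome_size_bp * coverage)


def read_pairs_needed(genome_size_bp, coverage, read_length_bp=150):
    """Paired-end read pairs needed; each pair contributes two reads.

    read_length_bp=150 is an assumed value, not something from the thread.
    """
    total = bases_needed(genome_size_bp, coverage)
    return math.ceil(total / (2 * read_length_bp))


# Genome sizes and coverages taken from the discussion above.
examples = [
    ("bacterium, 5 Mb, 50x PE", 5_000_000, 50),
    ("bacterium, 5 Mb, 100x PE", 5_000_000, 100),
    ("plant, 600 Mb, 100x PE", 600_000_000, 100),
]

for label, size_bp, cov in examples:
    gigabases = bases_needed(size_bp, cov) / 1e9
    pairs = read_pairs_needed(size_bp, cov)
    print(f"{label}: {gigabases:.2f} Gb, about {pairs:,} read pairs (2 x 150 bp)")
```

This only gives the mean coverage. Real coverage is uneven, and some reads are lost to quality filtering, so it is sensible to order somewhat more than the calculated minimum.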

