Lower Assembly Quality Despite Higher Coverage?
17 months ago
GTR • 0

Hi everyone,

I wanted to see the effect of coverage on assembly quality and find the point of diminishing returns. I am using paired-end reads only (2 x 101 bp with a 520 bp insert size), no mate pairs or long reads. Higher coverage is normally expected to increase NGA50, but instead the contig NGA50 has gone down while the LGA50 has gone up. I tested five levels of coverage: 10X, 15X, 20X, 25X and 30X.

15X has the highest contig NGA50 and the lowest LGA50, while 30X has the lowest NGA50 and the highest LGA50. Ranked from highest to lowest NGA50: 15X (best), 10X, 20X, 25X, 30X (worst).
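For readers unfamiliar with the two metrics, here is a minimal sketch of how NG50 and LG50 can be computed from a list of contig lengths. It is only an approximation of what QUAST reports: NGA50 and LGA50 are computed on aligned blocks (contigs broken at misassemblies) rather than on raw contig lengths. The function name `ng50_lg50` and the example lengths are made up for illustration.

```python
def ng50_lg50(contig_lengths, genome_size):
    """Return (NG50, LG50) for a set of contig lengths.

    NG50: length of the contig at which the cumulative length, taken from
          the longest contig downwards, first reaches half the genome size.
    LG50: how many contigs were needed to get there.
    """
    target = genome_size / 2
    total = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        total += length
        if total >= target:
            return length, count
    # The assembly covers less than half the genome, so NG50 is undefined.
    return None, None


# Hypothetical assemblies of a 300 bp "genome" (illustration only).
few_long_contigs = [100, 80, 60, 40, 20]
many_short_contigs = [50, 50, 50, 50, 50, 50]

print(ng50_lg50(few_long_contigs, 300))
print(ng50_lg50(many_short_contigs, 300))
```

A more fragmented assembly shows up as a smaller NG50 together with a larger LG50, which is the pattern described above for the higher-coverage runs.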

The assembly was run with SOAPdenovo2 using a k-mer size of 25 bp, and low-frequency k-mers were not discarded.

I used QUAST to evaluate the assembly.

What explanation(s) could there be for these results?

Thank you.

13 months ago
h.mon 25k
Brazil

low-frequency k-mers were not discarded.

Low-frequency k-mers are errors most of the time. If you increase coverage, you also increase the number of errors, and this may be why NGA50 gets worse as coverage goes up. Did you perform error correction before assembly? You could try error-correcting the reads or removing the bad k-mers.
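To make the argument concrete, here is a back-of-the-envelope sketch. The read length (101) and k (25) come from the question; the 1% per-base error rate, the 5 Mb genome size, and the function name are assumptions chosen only for illustration, and errors are treated as independent and uniform along the read, which real data do not strictly follow.

```python
def erroneous_kmer_estimate(genome_size, coverage, read_len=101, k=25,
                            error_rate=0.01):
    """Rough expected number of k-mer occurrences containing >= 1 error.

    error_rate is an assumed, uniform per-base error probability.
    """
    n_reads = genome_size * coverage / read_len
    kmers_per_read = read_len - k + 1
    # Probability that a k-mer contains at least one miscalled base.
    p_bad = 1 - (1 - error_rate) ** k
    return n_reads * kmers_per_read * p_bad


genome_size = 5_000_000  # hypothetical genome size, not from the question
for cov in (10, 15, 20, 25, 30):
    bad = erroneous_kmer_estimate(genome_size, cov)
    print(f"{cov}X: ~{bad:,.0f} error-containing k-mer occurrences "
          f"(distinct true k-mers stay near {genome_size:,})")
```

Under these assumptions the number of error-containing k-mers grows linearly with coverage, while the number of distinct true genomic k-mers is bounded by roughly the genome size. If none of the low-frequency k-mers are filtered out, each coverage step adds more spurious k-mers for the assembler to deal with.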

Even so, there is no simple answer to your question. Taking only one or two measures (NGA50 and LGA50) as the overall measure of assembly quality is not advisable. Your coverage range is very narrow; many assemblers need 50X-100X coverage. The interplay between assembly quality and coverage also depends on the assembler, genome complexity, and other factors.
