I've signed up to seek help with an assembly problem that I haven't managed to resolve. Any help is very much appreciated.
I'm using Velvet to assemble 50 bacterial genomes (kmer 45-69). Average coverage is ~300x. However, at the lower kmer values I usually get either zeros (i.e. a failed assembly) or >1000 scaffolds. The numbers then go back to normal after kmer 50 or 53 (see example below).
Identifier  kmer  s > 200 bp  s > 200 bp  s > 200 bp     s > 200 bp  s > 200 bp  s > 500 bp  s > 500 bp  s > 500 bp     s > 500 bp  s > 500 bp  Mean Ins  s.d. Ins  Num N   Num N      Min N  Max N  Av N   low cov  min cov  peak cov  repeat cov
                  count       total len   av len         N50         largest     count       total len   av len         N50         largest     Size      Size      chars   char runs  chars  chars  chars
xxx         45    0           0           0              0           0           0           0           0              0           0           Given     Given     0       0          0      0      0      N\A      N\A      1         N\A
xxx         47    0           0           0              0           0           0           0           0              0           0           Given     Given     0       0          0      0      0      N\A      N\A      1         N\A
xxx         49    0           0           0              0           0           0           0           0              0           0           Given     Given     0       0          0      0      0      N\A      N\A      1         N\A
xxx         51    2560        2118246     827.43984375   1079        6472        1601        1799159     1123.77201749  1231        6472        Given     Given     0       0          0      0      0      N\A      N\A      173       N\A
xxx         53    189         2078834     10999.1216931  33558       97302       139         2063250     14843.5251799  33558       97302       Given     Given     2027    55         10     67     36.9   N\A      N\A      174       N\A
xxx         55    171         2079117     12158.5789474  35544       114031      124         2064418     16648.5322581  35544       114031      Given     Given     2967    68         10     163    43.6   N\A      N\A      139       N\A
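For readers unfamiliar with the columns in the table above, here is a minimal sketch of how N50 and average length are usually computed from a list of scaffold lengths. The function names are hypothetical and this is not the script that produced the table; it only illustrates the standard definitions. As a sanity check, the average length at kmer 51 is simply the total length divided by the scaffold count (2118246 / 2560 = 827.43984375).

```python
def n50(lengths):
    """Return N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        return 0
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0


def average_length(total_len, count):
    """Average scaffold length; 0 when the assembly produced no scaffolds."""
    return total_len / count if count else 0


# Average length for the kmer 51 row of the table.
print(average_length(2118246, 2560))
```

A failed assembly (no scaffolds) gives 0 for both values, which matches the rows of zeros at kmer 45-49.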
Troubleshooting:
I figured it might be a memory failure (though I'm using an HPC). So I downsampled to 200x, then 150x, and 100x. As I lowered the coverage, the failing kmers went away (yay). However, the best assembly (in terms of number of scaffolds) was at 200x, where a few failing kmers still remain in a number of strains.
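A minimal sketch of the downsampling idea, assuming paired-end reads that have already been loaded as (read1, read2) tuples. The function names and the in-memory representation are hypothetical, and this is not necessarily the tool used for the runs described here. The point is that the fraction of pairs to keep is target coverage divided by current coverage, and that mates must be kept or dropped together.

```python
import random


def sampling_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if current_cov <= 0:
        raise ValueError("current_cov must be positive")
    # Never try to keep more than everything.
    return min(1.0, target_cov / current_cov)


def subsample_pairs(pairs, fraction, seed=42):
    """Keep each read pair independently with probability `fraction`.

    `pairs` is an iterable of (read1, read2) tuples, so mates stay in sync.
    A fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Going from ~300x to 200x keeps roughly two thirds of the pairs.
fraction = sampling_fraction(300, 200)
print(round(fraction, 3))
```

Because each pair is kept at random, the resulting coverage is only approximately the target, which is normally fine at these depths.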
I even tried increasing the quality-trimming threshold, but that just resulted in worse assemblies.
What could be the reason for the failing kmers, and how can it be resolved? Can I just move on using the assemblies at 200x despite the few failures?
Your help is strongly appreciated.
Cheers
Hi Brian,
Thanks for your reply.
My genomes were sequenced on Illumina HiSeq: read length ~125 bp, coverage range 200-390x, min-read-quality 10, and average insert size 300.
Source: isolate
I only ran them through standard QC (min-read-quality 10, min read length 70).
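To put the numbers above in Velvet's terms, here is a small sketch that converts nucleotide coverage into k-mer coverage using the relation from the Velvet manual, Ck = C * (L - k + 1) / L, where C is nucleotide coverage, L is read length and k is the kmer size. The function name is hypothetical, and the inputs (125 bp reads, 300x) are the approximate figures from this thread, so the output is only a rough guide.

```python
def kmer_coverage(base_cov, read_len, k):
    """K-mer coverage from nucleotide coverage: Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return base_cov * (read_len - k + 1) / read_len


# Odd kmer values 45..69, as used for the assemblies in this thread.
for k in range(45, 70, 2):
    print(k, round(kmer_coverage(300, 125, k), 1))
```

K-mer coverage drops as k grows, so the lowest kmer values are the ones where the effective coverage seen by the assembler is highest.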
Will check out the normalization - thanks.
Thanks.
Areej