Question

kmergenie gives k value larger than read size

9.5 years ago

Illinu ▴ 110

Hello,

kmergenie is supposed to support multiple libraries in the same run, and the manual says to include all paired-end libraries that will be used for the assembly. I calculated the best k value for two paired-end libraries: one has reads of 90-99 bp and the other has reads of 290-299 bp. The best k value is 103, which is not possible because it is longer than the reads of the shorter library.

Any ideas?

kmergenie libraries • 2.7k views

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Illinu ▴ 110

Ram · Answer 1 · 2014-10-24

9.5 years ago

Rayan Chikhi ★ 1.5k

Hi,

Yes, this can happen. Perhaps you have so much coverage in your 290 bp library that using it alone, with a high k, is sufficient to get a very good assembly, rather than setting a low k just for the sake of using the 90 bp library. Could you try it?

Rayan

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Rayan Chikhi ★ 1.5k

Hi Rayan,

The funny thing is that when I run kmergenie with only the 'larger' paired-end library, I get a best k of 81...

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Illinu ▴ 110

Oh, that is odd. Could you please send me both HTML reports?

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Rayan Chikhi ★ 1.5k

Hi Rayan, when I run kmergenie on the cluster, the HTML report is not generated. I tried running it on my desktop, but it takes forever. Is there an alternative?

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Illinu ▴ 110

It might be sufficient to copy-paste the .dat file here and, if possible, send the .histo/.pdf files to me at kmergenie@cse.psu.edu. Could you do that, please?

To get reports, you can ask your cluster administrator to install Ghostscript. Kmergenie uses it to generate reports on machines where X is not running, such as clusters.

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Rayan Chikhi ★ 1.5k

I sent you everything by email. Thanks.

updated 2.2 years ago by Ram 43k • written 9.5 years ago by Illinu ▴ 110

I've replied to Illinu by email, but let me copy my response here in case anyone's interested. Also note that his organism is diploid.

Thanks much for the data, it's very interesting.

It seems that for the long reads alone, a k value of 180 would work as well as it does for the short+long reads. To see this, notice that the histogram (.pdf) of the long reads at k=180 looks very similar to the short+long reads histogram at k=180. However, for the long reads alone, Kmergenie failed to fit the diploid model to the histogram, and hence could not predict the number of genomic k-mers.

Anyhow, I think that the k=81 prediction for the long reads alone is probably not the best here.

It seems that the diploid fit in Kmergenie could be improved to handle this dataset, but I don't really know how right now.

Anyhow, a best k value larger than the read length of the shorter library is still very likely here.

updated 2.2 years ago by Ram 43k • written 9.4 years ago by Rayan Chikhi ★ 1.5k
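
For readers wondering how a best k of 103 can come out of a run that includes 90-99 bp reads, the arithmetic can be sketched as follows. A read of length L contains L - k + 1 k-mers when L >= k and none otherwise, so at k=103 the short library simply contributes nothing, while the long library still contributes plenty. This is only an illustration using the read lengths quoted in the question; the function names are hypothetical, and the sketch does not reproduce how Kmergenie itself picks k.

```python
def kmers_per_read(read_length: int, k: int) -> int:
    """Number of k-mers contained in one read (0 if the read is shorter than k)."""
    return max(0, read_length - k + 1)


def usable_fraction(read_lengths, k: int) -> float:
    """Fraction of reads long enough to contain at least one k-mer."""
    usable = sum(1 for length in read_lengths if length >= k)
    return usable / len(read_lengths)


# Read lengths taken from the question: 90-99 bp and 290-299 bp.
# One read per length is an assumption made purely for illustration.
short_library = list(range(90, 100))
long_library = list(range(290, 300))

# k values mentioned in the thread.
for k in (81, 103, 180):
    print(
        f"k={k}: "
        f"k-mers in a 99 bp read = {kmers_per_read(99, k)}, "
        f"k-mers in a 299 bp read = {kmers_per_read(299, k)}, "
        f"usable short reads = {usable_fraction(short_library, k):.0%}, "
        f"usable long reads = {usable_fraction(long_library, k):.0%}"
    )
```

With these lengths, every short read drops out once k exceeds 99, which matches the suggestion in the thread that the long library alone may carry the assembly at a high k.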