Question

Hybrid assembly of PacBio and Illumina reads


7.7 years ago

int11ap1 ▴ 470

I have a ~30X PacBio dataset and a ~40X Illumina dataset, in addition to a mate-pair dataset. I am trying to assemble them (the expected genome size is around ~230 Mb; it's a plant) on a server with 160 GB of RAM. However, I am having problems with ALLPATHS: it runs out of memory at the CorrectLongReads step.
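For a sense of scale, the coverages and genome size above translate into raw sequence volume as in this minimal sketch. It is only a back-of-the-envelope aid: the function names are hypothetical, and the numbers say nothing directly about the RAM a given assembler step (such as CorrectLongReads) will need, which depends heavily on the tool.

```python
# Back-of-the-envelope conversion between fold coverage and total bases.
# Genome size and coverages come from the question; the helpers are hypothetical.

GENOME_SIZE_BP = 230_000_000  # expected plant genome size (~230 Mb)


def bases_for_coverage(coverage: float, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Total sequenced bases implied by a given fold coverage."""
    return round(coverage * genome_size_bp)


def coverage_for_bases(total_bases: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Fold coverage implied by a given total number of sequenced bases."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    for name, cov in [("PacBio", 30), ("Illumina", 40)]:
        gbp = bases_for_coverage(cov) / 1e9
        print(f"{name}: {cov}x of ~230 Mb -> ~{gbp:.1f} Gbp of sequence")
```

Running it prints roughly 6.9 Gbp for the PacBio data and 9.2 Gbp for the Illumina data.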

Is it possible to assemble my data on a cluster with 160 GB of RAM?
Is it possible with ALLPATHS?

Tags: pacbio, illumina

written 7.7 years ago by int11ap1 · updated 7.6 years ago by Josué Barrera

score 2 · Answer 1 · 2016-08-26


7.7 years ago

GenoMax 141k

Have you seen this wiki page from PacBio?

As for the first question, you have already answered it yourself: there is no substitute for RAM. If the server does not have enough, finding alternative hardware may be the only option.


score 1 · Answer 2 · 2016-10-01


7.6 years ago

Josué Barrera ▴ 10

Given the amount of Illumina and PacBio data you have, I would suggest a hybrid assembly with DBG2OLC. It gave me good results with 100x Illumina + 30x PacBio data, and it is memory-efficient, so you could probably run it on your 160 GB RAM cluster. I would then add a scaffolding step using your mate-pair data to improve the DBG2OLC assembly, followed by a final base-correction step with Pilon.
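For the final Pilon step, a minimal Python sketch that assembles the command line is shown below. It assumes you have already aligned the Illumina paired-end reads (and, optionally, the mate pairs) back to the scaffolded assembly as sorted, indexed BAM files. The file names, the jar path and the 100G heap size are hypothetical placeholders; `--genome`, `--frags`, `--jumps` and `--output` are Pilon options, but check the Pilon documentation for your version before relying on them.

```python
import shlex
from typing import List, Optional


def pilon_command(
    genome_fasta: str,
    frags_bam: str,
    jumps_bam: Optional[str] = None,
    output_prefix: str = "pilon",
    pilon_jar: str = "pilon.jar",  # hypothetical path to the Pilon jar
    max_heap: str = "100G",        # hypothetical JVM heap size; tune to your machine
) -> List[str]:
    """Build (but do not run) a Pilon polishing command as an argument list."""
    cmd = [
        "java", f"-Xmx{max_heap}", "-jar", pilon_jar,
        "--genome", genome_fasta,   # scaffolded assembly to polish
        "--frags", frags_bam,       # paired-end (fragment) alignments
    ]
    if jumps_bam is not None:
        cmd += ["--jumps", jumps_bam]  # optional mate-pair (jump) alignments
    cmd += ["--output", output_prefix]  # prefix for the output files
    return cmd


if __name__ == "__main__":
    cmd = pilon_command("scaffolds.fasta", "illumina_pe.sorted.bam", "matepair.sorted.bam")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems and lets you pass it straight to `subprocess.run`.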