Question

Improve Illumina short-read assembly using PacBio long reads

2

4.9 years ago

shachigahoimbi ▴ 20

I am trying to assemble a goat genome (genome size = 2.9 Gb), and I have sequencing data from both short and long reads:

Short-read data from Illumina (genome coverage ~37x).
Long-read data from PacBio (genome coverage ~1.5x).

I have assembled the Illumina short reads using ABySS and SOAPdenovo and got a best N50 of 1884 at a k-mer size of 41. I would like to improve the short-read assembly using the PacBio long-read data. Because of the low coverage (1.5x) of the PacBio data, I am unable to decide which software would be best for improving the N50 using long reads.
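For readers comparing assemblies across k-mer values, here is a minimal sketch of how N50 is usually computed from a list of contig lengths: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are made up for illustration and are not taken from the ABySS or SOAPdenovo output.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # 500 + 400 = 900 >= 750, so this prints 400
```

Computing the value the same way for every assembly (same minimum contig length cutoff, contigs versus scaffolds) keeps the comparison between k-mer settings fair.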

I tried hybridSPAdes for a hybrid assembly of my short- and long-read data, but it runs out of memory.

Please let me know how I could improve the short-read assembly using low-coverage (~1.5x) long reads.

Assembly Illumina genome PacBio • 1.8k views

ADD COMMENT • link updated 10 months ago by Ram 43k • written 4.9 years ago by shachigahoimbi ▴ 20

0

What was the input read length of your Illumina data?

An optimal k-mer of 41 seems pretty low. What range did you evaluate?

ADD REPLY • link 4.9 years ago by lieven.sterck 15k

score 1 · Answer 1 · 2019-05-20

1

4.9 years ago

colindaven 6.3k

Maybe you can't: 1.5X actually means 0X for a good proportion of the genome.

Generally, you want 20X or more of PacBio coverage to make a good assembly.
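To put rough numbers on the two coverage figures above, here is a back-of-the-envelope sketch. It assumes reads land uniformly at random along the genome (a Poisson approximation in the style of Lander-Waterman), so the expected fraction of bases covered by no read at mean coverage c is e^(-c). Real data will deviate because of repeats, sequencing bias, and read-length effects. The function name is made up for illustration, and the 2.9 Gb genome size is the figure given in the question.

```python
import math

GENOME_SIZE = 2.9e9  # bp, as stated in the question


def uncovered_fraction(coverage):
    """Expected fraction of bases with zero reads, assuming a Poisson model."""
    return math.exp(-coverage)


for coverage in (1.5, 20.0):
    fraction = uncovered_fraction(coverage)
    print(
        f"{coverage:g}x: fraction with no read = {fraction:.3g}, "
        f"about {fraction * GENOME_SIZE:.3g} bp of the genome"
    )
```

Under this simplified model, about 22% of the genome has no long read at all at 1.5x, while at 20x the uncovered fraction is negligible.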

It might pay to use another, better assembly (I think a goat one is available) for orienting your short scaffolds.

ADD COMMENT • link 4.9 years ago by colindaven 6.3k

score 1 · Answer 2 · 2019-05-20

You could use your best Illumina assembly as input for a whole-genome alignment with Cactus (https://github.com/ComparativeGenomicsToolkit/cactus), using NCBI accession GCA_004361675.1 as the reference. You could then use Ragout (https://github.com/fenderglass/Ragout) to generate a reference-guided assembly of your individual based on the best available goat genome assembly, GCA_004361675.1.

score 1 · Answer 3 · 2019-05-20

1

4.9 years ago

lieven.sterck 15k

Since you're already on the ABySS route, you could give LINKS a try: it's a long-read scaffolder from the same group as ABySS.

But as others here have mentioned, 1.5x coverage will likely not get you very far.

ADD COMMENT • link 4.9 years ago by lieven.sterck 15k

score 0 · Answer 4 · 2019-05-23

0

4.9 years ago

Vitis ★ 2.5k

Filtlong may help you filter your long reads by quality, using your short reads as an external reference. Note that it filters reads rather than correcting them.

https://github.com/rrwick/Filtlong

The filtered long reads may then help you scaffold some contigs. But I agree with the other answers: 1.5X of long reads won't get you very far.

ADD COMMENT • link 4.9 years ago by Vitis ★ 2.5k