Question

Manual Assembly of genome portion

5.0 years ago

JulianC ▴ 30

Hi guys!

I have whole-genome paired-end sequencing data, split into R1 and R2 files. I do not have a reference for the organism I am working on (only one for a closely related species), and I am trying to manually assemble one part of the genome, the centromere, from these paired-end reads. I know the starting point: beginning from a particular read, I want to extend it step by step to assemble that region. Could you give me some advice on how to do this? I could take part of the read and search for it in the whole-genome data with grep, but I am not sure this is a good approach. Automatic assemblers such as SPAdes don't work because my genome is a large eukaryotic genome. Thank you!

Assembly • 788 views

updated 5.0 years ago by h.mon 35k • written 5.0 years ago by JulianC ▴ 30

Sorry to say, but manual assembly is probably not feasible. My advice is to use an assembler like Velvet, MIRA or Trinity, and see how far you can get with these tools.

5.0 years ago by Benn 8.3k

score 1 · Answer 1 · 2019-04-30

If you are working on a large eukaryotic genome, the centromere is most likely composed of repetitive sequences and spans anywhere from several thousand to several million base pairs. With current sequencing technology (even long reads like PacBio or Nanopore), assembling centromeres is a very, very hard task. It doesn't matter that you have a starting point, because you will very quickly run into the repetitive sequences, which will be virtually impossible to assemble correctly.
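To make the problem concrete, here is a minimal Python sketch of the "start from a read and keep extending" idea. It is only a toy illustration, not how any real assembler works: the function names (`build_kmer_successors`, `extend_right`), the made-up `genome` string, the error-free simulated reads and the choice `k=4` are all assumptions for the example, and it ignores reverse complements, sequencing errors and read pairing.

```python
from collections import defaultdict


def build_kmer_successors(reads, k):
    """Map each k-mer to the set of bases that follow it in any read."""
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            successors[read[i:i + k]].add(read[i + k])
    return successors


def extend_right(seed, reads, k, max_steps=1000):
    """Greedily extend `seed` to the right, one base at a time.

    Extension stops as soon as the last k bases have no successor
    (dead end) or more than one possible successor (ambiguous branch).
    """
    successors = build_kmer_successors(reads, k)
    contig = seed
    for _ in range(max_steps):
        options = successors.get(contig[-k:], set())
        if len(options) != 1:
            reason = "dead end" if not options else "ambiguous branch"
            return contig, reason
        contig += next(iter(options))
    return contig, "step limit"


# Hypothetical toy genome: a unique flank followed by two copies of a
# short repeat (GGATTC) that are followed by different bases (A vs T).
genome = "ATGCCGTAGGATTCAGGATTCT"

# Simulated error-free reads of length 10, one starting at every position.
reads = [genome[i:i + 10] for i in range(len(genome) - 9)]

contig, reason = extend_right("ATGCC", reads, k=4)
print(contig, reason)
```

In this toy case the extension walks through the unique flank and then stops inside the first repeat copy, because the k-mer `ATTC` is followed by `A` in one copy and `T` in the other. A real centromere presents this situation over and over, with repeat units far longer than the reads.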

You can see how far you can get with Tadpole, a genome assembler that is part of the BBTools package; see the thread "Extending ends of sequences with the help of reads?" for details.