Manual Assembly of genome portion
5.0 years ago
JulianC ▴ 30

Hi guys!

I have whole-genome paired-end sequencing data, split into R1 and R2 files. I do not have a reference for the organism I am working on (only one for a closely related species), and I am trying to manually assemble one part of the genome, the centromere, from my paired-end reads. I know the starting point, and I want to start from a specific read and extend it to assemble that region. Could you give me some advice on how to do this? I could take part of the read and search for it in the whole dataset with grep, but I am not sure this is a sensible approach. Automatic assemblers such as SPAdes don't work because my genome is a large eukaryotic genome. Thank you!

Assembly • 788 views

Sorry to say, but manual assembly is probably not feasible. My advice is to use an assembler such as Velvet, MIRA, or Trinity and see how far you can get with these tools.

5.0 years ago
h.mon 35k

If you are working on a large eukaryotic genome, the centromere is most likely composed of repetitive sequences and spans anywhere from several thousand to a few million base pairs. With current sequencing technology (even long reads such as PacBio or Nanopore), assembling centromeres is a very, very hard task. Having a starting point does not help much, because the extension will very quickly run into repetitive sequences, which are virtually impossible to assemble correctly.

You can see how far you can get with Tadpole, a genome assembler that is part of the BBTools package. See the thread "Extending ends of sequences with the help of reads?" for details.
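To make the idea of seed extension concrete, here is a minimal Python sketch of greedy k-mer extension. It is not Tadpole's actual algorithm, only an illustration under simplifying assumptions: error-free reads held in memory, extension to the right only, and hypothetical names (`count_kmers`, `extend_right`, the toy `genome`). It also shows why repeats are the problem: as soon as two different bases can follow the current end, the extension has to stop.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k):
    """Count every k-mer in the reads, on both strands."""
    counts = Counter()
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    return counts


def extend_right(seed, counts, k, min_count=2, max_len=10_000):
    """Extend seed one base at a time while exactly one base is supported.

    Returns (contig, reason), where reason is why the extension stopped.
    """
    if k < 2 or len(seed) < k:
        raise ValueError("need k >= 2 and a seed of at least k bases")
    contig = seed
    seen = {contig[i:i + k] for i in range(len(contig) - k + 1)}
    while len(contig) < max_len:
        suffix = contig[-(k - 1):]
        candidates = [b for b in "ACGT" if counts[suffix + b] >= min_count]
        if not candidates:
            return contig, "dead end"   # no read supports any next base
        if len(candidates) > 1:
            return contig, "branch"     # ambiguous: typical at a repeat
        kmer = suffix + candidates[0]
        if kmer in seen:
            return contig, "loop"       # walking in circles (tandem repeat)
        seen.add(kmer)
        contig += candidates[0]
    return contig, "max length"


# Toy data: a 21 bp "genome" tiled by 12 bp reads.
genome = "ATGGCGTACCTTGAGCATCGA"
reads = [genome[i:i + 12] for i in range(0, 10, 3)]
k = 8

# Unique sequence: the seed is extended to the end of the toy genome.
print(extend_right(genome[:k], count_kmers(reads, k), k, min_count=1))

# A read from a second locus shares 7 bases (ACCTTGA) but continues with T
# instead of G, so the extension halts at the ambiguity.
repeat_read = "ACCTTGAT"
print(extend_right(genome[:k], count_kmers(reads + [repeat_read], k), k, min_count=1))
```

On real data the k-mer table for a large eukaryotic genome will not fit comfortably in a Python `Counter`, which is one reason to use a dedicated tool. The stopping behaviour is the same in principle, though: in a centromere you would expect to hit a "branch" or "loop" almost immediately.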


Thank you very much for your advice, h.mon. I will look into that.
