How can I fix frameshift errors in an assembled genome?
4.1 years ago
star715 ▴ 40

I have a genome assembled from PacBio reads. However, I have noticed many frameshift mutations in the assembly, and because of them some genes are not annotated even though they are present in the genome. How can I fix this?

Assembly genome polishing pacbio • 1.2k views

You can use Quiver (if you have old RS reads) or Arrow (Sequel reads) to polish your assembly. You can also give Pilon a try. The newest version supports long reads.

4.1 years ago
predeus ★ 1.9k

What kind of a genome is it, and what kind of assembler have you used? Normally, PacBio-only assembly should not contain many errors, since assemblers polish the assembly before it's finalized.

At any rate, you can polish the assembly using Racon (https://github.com/isovic/racon), or the built-in polisher from the Flye assembler (https://github.com/fenderglass/Flye).
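
For what it's worth, here is a minimal sketch of one long-read-only Racon round, written in Python so it is easy to script. It assumes `minimap2` and `racon` are on your PATH and that the reads are already in FASTA/FASTQ format; the file names and thread count are placeholders, and `build_polish_commands` / `run_polish_round` are hypothetical helpers, not part of either tool. The reads are mapped back to the draft with minimap2's `map-pb` preset, and Racon is then given the reads, the overlaps and the draft, in that order.

```python
import subprocess


def build_polish_commands(reads, draft, paf, threads=4):
    """Return the two command lines for one Racon polishing round.

    All arguments are placeholder paths chosen for illustration.
    """
    # Map the raw PacBio reads back to the draft assembly (PAF output).
    map_cmd = ["minimap2", "-x", "map-pb", "-t", str(threads),
               "-o", paf, draft, reads]
    # Racon expects: <sequences> <overlaps> <target sequences>.
    polish_cmd = ["racon", "-t", str(threads), reads, paf, draft]
    return map_cmd, polish_cmd


def run_polish_round(reads, draft, paf, polished, threads=4):
    """Run one round; Racon writes the polished FASTA to stdout."""
    map_cmd, polish_cmd = build_polish_commands(reads, draft, paf, threads)
    subprocess.run(map_cmd, check=True)
    with open(polished, "w") as out:
        subprocess.run(polish_cmd, stdout=out, check=True)
```

To run several rounds, call `run_polish_round` again with the previous polished FASTA as the new `draft`. How many rounds are worthwhile depends on the data, so compare the frameshift count after each one.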


It is PacBio RSII data and I have been using HGAP3/HGAP4 to assemble it. I tried Flye, but it gave more frameshift errors. Can Pilon/Racon work with only PacBio data and no short reads?


Yes, both work fine with long reads only.

Why are you so sure these are in fact errors? Did you look at the alignments in a genome browser, e.g. IGV? Visual inspection can often be misleading: you need a pileup and some sort of summary (a BAM file with its BAI index loaded in IGV helps with this nicely; you'll see variants in the coverage track right away).
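
If you want the "some sort of summary" part as numbers rather than a picture, below is a rough sketch in plain Python that counts how many aligned reads support each indel relative to the assembly. It assumes the reads have been aligned back to the assembly and exported as SAM text (for example with `samtools view -h`); the function names are made up for this example, and it only parses the CIGAR field, so treat it as a starting point rather than a variant caller.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def indels_from_alignment(pos, cigar):
    """Yield (ref_pos, op, length) for each indel in one alignment.

    pos is the 1-based leftmost reference position (SAM POS field).
    ref_pos is the next reference base at the point where the indel occurs.
    """
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID":
            yield ref_pos, op, length
        if op in REF_CONSUMING:
            ref_pos += length


def summarize_indels(sam_lines):
    """Count supporting reads per indel, keyed by (contig, pos, op, length)."""
    support = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        contig, pos, cigar = fields[2], int(fields[3]), fields[5]
        if cigar == "*":
            continue  # unmapped read, no alignment to inspect
        for ref_pos, op, length in indels_from_alignment(pos, cigar):
            support[(contig, ref_pos, op, length)] += 1
    return support
```

Raw PacBio reads carry many random indels, so most entries will have low support. The ones worth checking in IGV are indels that most of the reads covering a position agree on and whose length is not a multiple of three, since those are the likeliest candidates for a real consensus error in the assembly.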
