Consensus length longer than the longest assembled contig
5.1 years ago
DanielC ▴ 170

Dear Friends,

I assembled sequencing data (DNA) from a phage. The assembly was done with SPAdes, and the longest contig was 88,315 bp. To get a consensus from the assembled contigs (46 contigs in total, of about 88,000 bp), I used Gap4. When I saved the consensus, I saw that its length was 1,260,000 bp! Could you please comment on how the consensus can be this long, and longer than the longest contig? Is it a good approach to use this consensus for gene prediction and annotation, or should I use the longest contig for prediction and annotation instead?

Thanks, DK

5.1 years ago
h.mon 35k

For starters, use the SPAdes assembly for gene prediction and annotation, but you may want to filter out short contigs and contigs with very low or very high coverage.
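A minimal sketch of that filtering step, assuming the contig headers follow the usual SPAdes naming scheme (`>NODE_<n>_length_<len>_cov_<cov>`). The function names `read_fasta` and `filter_contigs` and the default thresholds are hypothetical placeholders chosen for illustration, not recommended values; suitable cut-offs depend on your data.

```python
import re

# Assumption: headers look like ">NODE_1_length_88315_cov_25.4" (SPAdes style).
COV_RE = re.compile(r"_cov_([0-9.]+)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_len=1000, min_cov=5.0, max_cov=None):
    """Keep contigs that are long enough and whose coverage is within bounds.

    The default thresholds are hypothetical placeholders, not recommendations.
    """
    kept = []
    for header, seq in records:
        match = COV_RE.search(header)
        if match is None:
            continue  # header is not in SPAdes format; skip it
        cov = float(match.group(1))
        if len(seq) < min_len or cov < min_cov:
            continue  # too short or too little coverage
        if max_cov is not None and cov > max_cov:
            continue  # suspiciously high coverage
        kept.append((header, seq))
    return kept


if __name__ == "__main__":
    example = """>NODE_1_length_12_cov_30.0
ACGTACGTACGT
>NODE_2_length_4_cov_1.2
ACGT
"""
    kept = filter_contigs(read_fasta(example.splitlines()), min_len=10)
    print([header for header, _ in kept])  # ['NODE_1_length_12_cov_30.0']
```

Length is taken from the sequence itself rather than the header, so the filter still works if the header length field is ever out of sync with the sequence.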

While I don't really understand your problem (more on this later), using Gap4 to assemble the output of SPAdes makes no sense: both SPAdes and Gap4 are genome assemblers, and both output a "consensus" FASTA representation of the assembly.

SPAdes had all the reads at its disposal to assemble the genome, and could use their information to break contigs at uncertain regions, for example repeat regions.

Gap4 is also a genome assembler (developed to assemble Sanger sequencing reads), but if you use it to assemble the SPAdes assembly, you will get either no improvement or, even worse, misassemblies, as Gap4 may join across the repeat regions that SPAdes broke.

What I don't understand from your description is how the longest contig (88,315 bp) can be longer than the total assembly (46 contigs of about 88,000 bp).
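The longest contig can never exceed the summed length of all contigs, which is why the numbers in the question look contradictory. A minimal sketch for summarising an assembly, assuming the contig lengths have already been extracted (for example from the sequences in the SPAdes FASTA). The function name `assembly_stats` is hypothetical, and the example lengths are illustrative values loosely based on this thread, not real results.

```python
def assembly_stats(lengths):
    """Summarise contig lengths: count, total length, longest contig and N50."""
    if not lengths:
        return {"contigs": 0, "total": 0, "longest": 0, "n50": 0}
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[0]
    for length in ordered:
        running += length
        # N50: length of the contig at which the cumulative sum of the
        # longest-first contigs first reaches half of the total length.
        if 2 * running >= total:
            n50 = length
            break
    return {"contigs": len(ordered), "total": total,
            "longest": ordered[0], "n50": n50}


if __name__ == "__main__":
    # Hypothetical lengths, loosely modelled on the numbers in this thread.
    stats = assembly_stats([88315, 88000, 88000])
    print(stats)
    # {'contigs': 3, 'total': 264315, 'longest': 88315, 'n50': 88000}
```

If the reported total is smaller than the longest contig, the description of the assembly (rather than the assembly itself) is probably what needs rechecking.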


Thanks for the info, h.mon! I meant that the longest contig I got from SPAdes is 88,315 bp; SPAdes also assembled 46 other contigs, each about 88,000 bp long. What I am trying to do is get a consensus by aligning these contigs with Gap4, but as you said, I think Gap4 is joining the repeat regions and making the consensus much longer, 1,260,000 bp in my case. So my question is: how can I generate a consensus from these SPAdes contigs by aligning them? Is there any software or program for this? I would really appreciate your help. Please let me know if anything is not clear.

Thanks, DK
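Before trying to merge contigs into one consensus, it may help to check how much sequence they actually share. A minimal sketch, assuming that shared k-mer content (Jaccard index of canonical k-mer sets, with k = 21 as an arbitrary choice) is an acceptable rough proxy for redundancy. This is a swapped-in screening technique, not a consensus caller and not something proposed in the thread; the function names are hypothetical. Dedicated aligners such as MUMmer, or sketching tools such as Mash, scale much better for 46 contigs of ~88 kb.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k=21):
    """Return the set of canonical k-mers (the lexicographic minimum of a
    k-mer and its reverse complement), so contigs in opposite orientations
    still share k-mers."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rev_comp = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rev_comp))
    return kmers


def jaccard(a, b):
    """Jaccard index of two sets (defined as 1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(contigs, k=21):
    """Map each pair of contig names to the Jaccard index of their k-mer sets."""
    sets = {name: canonical_kmers(seq, k) for name, seq in contigs.items()}
    return {(x, y): jaccard(sets[x], sets[y])
            for x, y in combinations(sorted(sets), 2)}


if __name__ == "__main__":
    # Toy sequences; real input would be the SPAdes contigs.
    toy = {
        "contig_A": "ACGTACGTTTGACCAGT",
        "contig_B": "ACTGGTCAAACGTACGT",  # reverse complement of contig_A
        "contig_C": "GGGGCCCCAAAATTTTG",
    }
    for pair, score in pairwise_similarity(toy, k=5).items():
        print(pair, round(score, 2))
```

Pairs scoring close to 1.0 share most of their k-mers and would be candidates for collapsing into one sequence rather than concatenating, which is one way an overlap-based merge could inflate a consensus far beyond the genome size.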

