Biostar Beta. Not for public use.
Consensus length longer than the longest assembled contig
13 months ago
DanielC • 80
Canada

Dear Friends,

I assembled phage sequencing data (DNA) using SPAdes. The longest contig is 88315 bp. To get a consensus from the assembled contigs (46 in total, of size about 88000 bp), I used Gap4. When I saved the consensus, I saw that it is 1260000 bp long! How can the consensus be this long, and longer than the longest contig? Is it a good approach to use this consensus for gene prediction and annotation, or should I use the longest contig instead?

Thanks, DK

14 months ago
h.mon 25k
Brazil

For starters, use the SPAdes assembly for gene prediction and annotation, but you may want to filter out short contigs and very low / very high coverage contigs.
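A minimal sketch of that filtering step, assuming the contig headers follow the usual SPAdes pattern `NODE_<n>_length_<len>_cov_<cov>`. The function names and the thresholds (`min_len`, and coverage bounds expressed as multiples of the median coverage) are illustrative assumptions, not recommended values; choose cutoffs that suit your own data.

```python
import re
from statistics import median

# Assumed SPAdes header pattern, e.g. NODE_1_length_88315_cov_35.27
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_len=1000, cov_low=0.2, cov_high=5.0):
    """Keep contigs that are long enough and whose coverage lies within
    [cov_low, cov_high] times the median coverage of the long contigs.
    All three thresholds are hypothetical defaults."""
    records = []
    for header, seq in read_fasta(lines):
        match = HEADER_RE.search(header)
        if match is None:
            continue  # header is not in the assumed SPAdes format
        records.append((header, seq, float(match.group(2))))

    long_enough = [r for r in records if len(r[1]) >= min_len]
    if not long_enough:
        return []

    med = median(cov for _, _, cov in long_enough)
    return [
        (header, seq)
        for header, seq, cov in long_enough
        if cov_low * med <= cov <= cov_high * med
    ]
```

To use it on a file, something like `with open("contigs.fasta") as fh: kept = filter_contigs(fh)` should work, where `contigs.fasta` stands for your SPAdes output.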

While I really don't understand your problem (more on this later), using Gap4 to assemble the output of SPAdes makes no sense. Both SPAdes and Gap4 are genome assemblers, and each outputs a "consensus" FASTA representation of its assembly.

SPAdes had at its disposal all the reads to assemble the genome, and could use their information to break contigs at uncertain regions - for example, repeat regions.

Gap4 is also a genome assembler (developed to assemble Sanger sequencing reads), but if you try to reassemble the SPAdes contigs with it, you will get either no improvement or, even worse, misassemblies, because Gap4 may join the repeat regions that SPAdes left broken.

What I don't understand from your description: how can the longest contig (88315 bp) be longer than the total assembly (46 contigs of about 88000 bp)?
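Numbers like these are easy to check directly from the contigs file. The sketch below counts the contigs and reports the longest one and the total assembly length; the function name `assembly_summary` is a hypothetical helper, and the only assumption about the input is that it is ordinary FASTA.

```python
def assembly_summary(lines):
    """Return contig count, longest contig length, and total length
    for an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new contig
        elif line and lengths:
            lengths[-1] += len(line)   # add sequence to the current contig
    return {
        "n_contigs": len(lengths),
        "longest": max(lengths, default=0),
        "total": sum(lengths),
    }
```

Comparing `total` with the length of the Gap4 consensus shows whether the consensus is longer than everything SPAdes assembled or only longer than the longest single contig.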


Thanks for the info, h.mon! I meant that the longest contig I got from SPAdes is 88315 bp; SPAdes also assembled 46 other contigs of about 88000 bp each. What I am trying to do is get a consensus by aligning these contigs with Gap4 but, as you said, I think Gap4 is joining the repeat regions and making the consensus much longer (1260000 bp in my case). So my question is: how can I generate a consensus from these SPAdes contigs by aligning them? Is there any software or program for this? I would really appreciate your help. Please let me know if anything is unclear.

Thanks, DK
