Question: Assembly for a single cDNA
4 months ago
karpet34 • 0

Dear all,

I performed an RT-PCR that gave me a 1500 bp product, and I sequenced it with Illumina technology (2x250 paired-end reads). For several weeks now I have been trying, without success, to assemble the reads into the full-length sequence. I have tried many of the classic assemblers (cap3, ssake, arapan, minimus2, ...), but all of them produce multiple contigs, some of which are longer than 5 kb!

I checked that all the reads map well to the reference.

I am looking for an assembler that can do the job. Does anyone have an idea?

Thank you!


What organism are you working on? You mention both aligning to a reference and assembling. Do you wish to do both, and if so, why?

4 months ago
RamRS • 21k

You probably have far more coverage than you need, and that is likely causing the assembly problems, so consider downsampling the data. You can use bbnorm.sh from the BBMap suite (guide here). You can also take a look at tadpole.sh (guide here), an alternative k-mer-based assembler.
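To make the downsampling idea concrete, here is a minimal Python sketch. Note that it does plain random subsampling of read pairs, which is a simpler technique than the k-mer-depth normalization bbnorm.sh performs. The function names, the 100x target, and the assumption of uncompressed 4-line FASTQ files with mates in the same order are all illustrative choices, not something from the thread.

```python
import random


def pairs_for_target_coverage(amplicon_len, read_len, target_cov):
    """Read pairs needed for roughly target_cov-fold coverage of the amplicon."""
    bases_per_pair = 2 * read_len
    # Integer ceiling division: round up so we never fall short of the target.
    return (amplicon_len * target_cov + bases_per_pair - 1) // bases_per_pair


def read_fastq(handle):
    """Yield records as 4-tuples of lines (assumes strict 4-line FASTQ)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # empty string means end of file
            return
        yield record


def reservoir_sample(items, k, seed=0):
    """Uniform random sample of up to k items from an iterable, in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, n_pairs, seed=0):
    """Write a random subset of n_pairs read pairs, keeping mates together."""
    with open(r1_path) as r1, open(r2_path) as r2:
        # zip() walks both files in step, so each R1 record stays with its R2 mate.
        kept = reservoir_sample(zip(read_fastq(r1), read_fastq(r2)), n_pairs, seed)
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in kept:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(kept)
```

For the 1500 bp product and 2x250 reads from the question, `pairs_for_target_coverage(1500, 250, 100)` gives 300 pairs for about 100x coverage, which shows how small a subset is enough for an amplicon of this size. The file paths passed to `subsample_pairs` would be your own R1/R2 files.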

4 months ago
genomax • 68k
4 months ago
h.mon 25k
Brazil

Illumina data is noisy: because of the sheer volume of data generated, even low-frequency errors and contaminants end up represented by a good number of reads. When assembling an amplicon, you will want to filter your contigs by coverage, as your "true" amplicon will have much higher coverage than the errors and contaminants. There are also several pipelines specialized for targeted-sequencing assembly, and you may try one (or several) of them. Two I remember are ARC and HybPiper.

By the way, did you remove Illumina adapters before assembling?
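As a rough illustration of the coverage filter suggested above, the following Python sketch keeps only contigs whose mean depth is within a chosen fraction of the deepest contig. It assumes you already have a mean depth per contig (for example from mapping the reads back to the contigs); the function name, the 10% cutoff, and the contig names and depths in the usage note are hypothetical.

```python
def filter_contigs_by_coverage(coverage, min_fraction=0.1):
    """Return contig names whose depth is at least min_fraction of the maximum.

    coverage: dict mapping contig name -> mean read depth.
    The result is sorted from highest to lowest depth.
    """
    if not coverage:
        return []
    # The true amplicon is expected to be among the deepest contigs,
    # so the threshold is set relative to the maximum observed depth.
    threshold = max(coverage.values()) * min_fraction
    kept = [name for name, depth in coverage.items() if depth >= threshold]
    return sorted(kept, key=lambda name: coverage[name], reverse=True)
```

With made-up depths such as `{"contig_1": 4200.0, "contig_2": 35.0, "contig_3": 12.5, "contig_4": 3900.0}`, the call returns `["contig_1", "contig_4"]`, dropping the two low-depth contigs that are more likely errors or contaminants. The right cutoff depends on your data, so treat 0.1 as a starting point only.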

