Question

Missing sequence from a cosmid de novo assembly

5.4 years ago

kspata ▴ 80

Hi All,

I performed a de novo assembly of a cosmid sequenced on NextSeq PE 300 using SPAdes. The pipeline I used is as follows:

1. Trim the sequence to remove low-quality bases.
2. Extract a subset of reads.
3. Perform SPAdes de novo assembly.

The expected length of the cosmid was 50 kb, but the assembled sequence is only about 47.5 kb. This cosmid contains a region that overlaps another cosmid, and the overlapping sequence was PCR-amplified and sequenced, confirming its presence.

The overlapping sequence is 990 bp long, and it is not present in the assembled sequence.

I have looked through the contigs.fasta file from the SPAdes output, and this sequence is not present in any of the other contigs either.

What approach should I use to search for this missing sequence in the raw data or the assembled data? How can I justify the absence of this sequence from the assembled genome?

Thanks!!

assembly spades de novo sequencing • 1.3k views

ADD COMMENT • link updated 5.4 years ago by harold.smith.tarheel ★ 4.9k • written 5.4 years ago by kspata ▴ 80

score 0 · Answer 1 · 2018-11-30

5.4 years ago

harold.smith.tarheel ★ 4.9k

Two easily testable possibilities:

1) SPAdes failed to assemble the reads for this segment.
2) Reads for this segment are not present in your sample/data.

You can discriminate by aligning your data to the sequence in question.

ADD COMMENT • link 5.4 years ago by harold.smith.tarheel ★ 4.9k

Thank you for replying.

I performed further troubleshooting by searching the contigs FASTA file for substrings of the missing sequence, but did not find any match for substrings of length 50 bp, 80 bp, or 100 bp.
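
A substring search of this kind can silently miss a hit if the contig carries the region on the opposite strand or if the FASTA sequence is wrapped across lines. The following is a minimal sketch of such a check, not the exact procedure used above; the function names (`read_fasta`, `find_windows`) and the window and step sizes are illustrative assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def read_fasta(path):
    """Yield (name, sequence) pairs, joining line-wrapped sequences."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].strip(), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_windows(target, contigs, window=50, step=25):
    """Search each contig for windows of `target` on both strands.

    `contigs` must be a list of (name, sequence) pairs, because it is
    scanned once per window. Returns (window_start, contig_name, strand).
    """
    target = target.upper()
    contigs = [(name, seq.upper()) for name, seq in contigs]
    hits = []
    for start in range(0, len(target) - window + 1, step):
        probe = target[start:start + window]
        probe_rc = reverse_complement(probe)
        for name, seq in contigs:
            if probe in seq:
                hits.append((start, name, "+"))
            if probe_rc in seq:
                hits.append((start, name, "-"))
    return hits
```

For example, `find_windows(missing_seq, list(read_fasta("contigs.fasta")))` would return an empty list if no 50 bp window of the 990 bp sequence occurs in any contig on either strand (here `missing_seq` is a placeholder for the PCR-confirmed sequence). Exact matching does not tolerate mismatches, so an aligner remains the more sensitive test.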

What other assembly tools or strategies can I use to troubleshoot this?
Should I try merging the paired-end reads and performing the assembly with SPAdes on the merged data, treating them as single-end reads?
Will sequencing with PacBio help? Can I use Canu/Pilon or a hybrid assembly approach to get the complete de novo assembled sequence of the cosmid (50 kb)?

Please advise.

ADD REPLY • link 5.3 years ago by kspata ▴ 80

Why would you search for the missing sequence in the assembled contigs, when you've already said that it's missing? I recommended aligning your data (i.e., your reads) to the missing sequence. Or, you can parse that data for substrings.
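
As a rough illustration of parsing the reads for substrings, the sketch below counts how many reads contain each k-mer of the missing sequence on either strand. It is one possible way to do this under stated assumptions, not a replacement for a proper aligner: the function names, the FASTQ file name in the usage note, and `k=31` are all hypothetical choices, and only exact matches are counted.

```python
import gzip


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def fastq_sequences(path):
    """Yield read sequences from a 4-line-per-record FASTQ file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield line.strip().upper()


def kmer_support(target, reads, k=31):
    """For each k-mer start position in `target`, count reads containing
    that k-mer in forward or reverse-complement orientation."""
    target = target.upper()
    n_positions = max(len(target) - k + 1, 0)
    index = {}
    for pos in range(n_positions):
        kmer = target[pos:pos + k]
        index.setdefault(kmer, []).append(pos)
        index.setdefault(reverse_complement(kmer), []).append(pos)
    support = [0] * n_positions
    for read in reads:
        seen = set()  # count each read at most once per position
        for i in range(len(read) - k + 1):
            seen.update(index.get(read[i:i + k], ()))
        for pos in seen:
            support[pos] += 1
    return support
```

Calling `kmer_support(missing_seq, fastq_sequences("reads_R1.fastq.gz"))` gives one count per position. Counts of zero along the whole 990 bp sequence would point to possibility 2 (no reads for the segment), whereas solid support with no matching contig would point to possibility 1 (an assembly failure).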

ADD REPLY • link 5.3 years ago by harold.smith.tarheel ★ 4.9k

It is present in the sample cosmid DNA, as confirmed by PCR and sequencing. But I guess it was either not sequenced or SPAdes failed to assemble it.

1. An Illumina sequencing failure can be confirmed by mapping the forward and reverse reads to the missing DNA sequence; this resulted in a 0% mapping rate.

ADD REPLY • link 5.3 years ago by kspata ▴ 80