Question

Extracted mapped shotgun metagenomic reads to reference genome. SPAdes or metaSPAdes for de-novo assembly?

5.5 years ago

O.rka ▴ 710

I have a reference strain and mapped all of my shotgun metagenomic reads to the reference strain using BBMap.

I extracted the mapped reads and want to create an assembly from this.

I usually use metaSPAdes for this but would SPAdes be better suited for this task? My PI prefers using SPAdes and metaSPAdes but I'm wondering which one would be better for this task in particular since it's only a (semi-)supervised subset of a metagenome.

[Bonus] if there is another assembler that is better suited for this exact task please let me know.

assembly metagenomics de-novo • 1.8k views

updated 5.5 years ago by h.mon 35k • written 5.5 years ago by O.rka ▴ 710

score 2 · Accepted Answer · 2018-10-15

To answer your question directly: I think you should use SPAdes, and then probably filter out contigs whose coverage diverges too much from that of the longest contigs. This assumes the longest contigs belong to the strain of interest, which they should, since the reads used for assembly have been enriched for this strain. Remember too that contigs containing rRNA reads generally have abnormally high coverage. However, if the strain of interest is rare in your metagenomic sample, the resulting assembly will be very fragmented due to low coverage.
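As a rough illustration of the coverage filter, here is a minimal Python sketch. It assumes the contigs keep the default SPAdes header style (`NODE_<id>_length_<len>_cov_<cov>`, where `cov` is, as far as I know, k-mer coverage rather than read coverage). The function names, the `n_longest` and `max_fold` parameters and the example headers are all made up for illustration; sensible thresholds will depend on your data.

```python
import re
from statistics import median

# Default SPAdes contig names look like: NODE_1_length_52340_cov_18.7321
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def parse_header(header):
    """Return (length, coverage) parsed from a SPAdes-style contig header."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(match.group(1)), float(match.group(2))


def filter_by_coverage(headers, n_longest=10, max_fold=2.0):
    """Keep contigs whose coverage is within max_fold of a reference coverage.

    The reference coverage is the median coverage of the n_longest contigs.
    Both defaults are illustrative assumptions, not recommended values.
    """
    parsed = [(h, *parse_header(h)) for h in headers]
    # Sort by contig length, longest first, and take the top n_longest.
    longest = sorted(parsed, key=lambda rec: rec[1], reverse=True)[:n_longest]
    ref_cov = median(cov for _, _, cov in longest)
    low, high = ref_cov / max_fold, ref_cov * max_fold
    return [h for h, _, cov in parsed if low <= cov <= high]


# Hypothetical headers, invented for this example.
example_headers = [
    "NODE_1_length_52340_cov_18.7",
    "NODE_2_length_41007_cov_21.3",
    "NODE_3_length_30512_cov_19.9",
    "NODE_4_length_1530_cov_410.2",  # e.g. a contig containing rRNA
    "NODE_5_length_980_cov_2.1",     # e.g. a contig from a rarer genome
]
kept = filter_by_coverage(example_headers, n_longest=3)
print(kept)
```

With these made-up headers, only the three long contigs with similar coverage are kept. Using the median of several long contigs rather than the single longest one makes the reference coverage less sensitive to one odd contig.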

More philosophically:

Looking at some of your recent threads, it seems you have been struggling with the same issue for some days now. From older to more recent:

How to extract reads that match k-mer profiles from a collection of sequences?

How to interpret sam file generated from BBMap?

This thread

Can you assemble with merged paired end reads and unmatched reads as "single ended" reads?

So it seems you have shotgun metagenomic sequencing data, but are interested in only one particular strain. It would be helpful if you described the problem in more detail, along with your motivation for taking this approach. This would help us evaluate whether your approach is sound, or whether a completely different approach would be better.

The approach you have chosen seems to be mapping to a reference strain (a published genome?), and then assembling the genome using just the mapped reads. I wonder if just mapping to the reference strain and examining the differences (calling SNPs / indels and structural variants) would be good enough for your purposes? Or assembling the whole metagenome, and then recovering the contigs belonging to the strain of interest?