I have read several reviews of current metatranscriptomics methods, but I don't understand how mRNA-seq reads are mapped to species, or how much confidence we have in those assignments (same issue as in metagenomics). Most pipeline discussions in these reviews seem to gloss over this.
Basically, I understand that total mRNA is extracted and sequenced much like in regular transcriptomics (with or without cDNA synthesis, but essentially the same). It's what happens after the reads are obtained that gets fuzzy.
There exist two options:
Mapping to existing genomes: this makes sense; basically you have a list of candidates to try. But for, say, an environmental sample, aren't there far too few reference genomes available for much certainty? I know curated databases exist for some well-studied environments (e.g., gut and oral microbiomes).
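To make my mental model of the reference-based option concrete: my understanding is that tools in this space (e.g., Kraken2-style classifiers) index references by k-mers and assign each read to the reference it shares the most k-mers with. Here is a toy sketch of that idea; the species names and sequences are made up, and real tools are vastly more sophisticated (LCA over a taxonomy, minimizers, etc.):

```python
# Toy k-mer-based read classification (the rough idea behind tools like
# Kraken2). Reference names and sequences below are hypothetical.

def kmers(seq, k=5):
    """All k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical reference "genomes" (real ones are megabases long).
references = {
    "species_A": "ATGGCGTACGTTAGCATGCATGGA",
    "species_B": "TTGACCGGATACCGATGGCCATTA",
}

# Pre-index each reference by its k-mer set.
index = {name: kmers(seq) for name, seq in references.items()}

def classify(read, k=5):
    """Assign a read to the reference sharing the most k-mers,
    or 'unclassified' if nothing matches."""
    scores = {name: len(kmers(read, k) & km) for name, km in index.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify("GCGTACGTTAGC"))   # read drawn from species_A
print(classify("CCGGATACCGAT"))   # read drawn from species_B
print(classify("AAAAAAAAAAAA"))   # matches nothing -> unclassified
```

If this is roughly right, my question about certainty becomes: what happens when a read shares k-mers with several closely related references?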
De novo assembly: if you don't get a hit on an existing genome, you assemble the reads against each other, test how they fit together, and see how the resulting contigs bin. I have never done de novo assembly myself, but it mostly makes sense for a single species. When we're talking about a meta system, though, how sure are we when we see, say, two genes from two very similar organisms, or ones that might be orthologs? It seems certainty would drop off precipitously.
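For what it's worth, here is how I picture the assembly step, as a toy greedy overlap merge: repeatedly join the pair of reads with the longest suffix-prefix overlap until nothing overlaps any more. I know real metatranscriptome assemblers (e.g., rnaSPAdes, Trinity, MEGAHIT) use de Bruijn graphs instead, so this is only an illustration of the "contigs stitched from overlapping reads" idea, with made-up reads:

```python
# Toy greedy overlap assembly. Real assemblers use de Bruijn graphs;
# this just illustrates stitching contigs from overlapping reads.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    reads = list(reads)
    while len(reads) > 1:
        best = None  # (overlap_length, i, j)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if best is None or n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; remaining reads stay separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

contigs = greedy_assemble(["ATGGCGTAC", "CGTACGTTAG", "GTTAGCATGC"])
print(contigs)  # -> ['ATGGCGTACGTTAGCATGC']
```

My worry above in code terms: if two near-identical species contribute reads, their overlaps look just as good as within-species overlaps, so the "contig" could be a chimera of both.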
So how is certainty/confidence measured? What level of species-level confidence do we actually get? What happens when reads don't map to any known organism, or when no reference genome exists at all? And how computationally intensive are these processes?
Does anybody have good references they could point me to, or could anyone explain some of this? Sorry for the long question!