Hello there!
I am doing small RNA-seq analysis and have been using featureCounts to assign reads. A high percentage of reads were left unassigned as ambiguous, and when I looked into these further I could see that many of them could be annotated either as tRNA and piRNA or as snoRNA and piRNA (using the -O flag in featureCounts to allow overlap).
I have looked at a few of the small RNA annotation tools that are available, and they usually annotate reads in a certain order (e.g. miRNA - rRNA - tRNA - snoRNA - piRNA). Within the tools' documentation, I haven't been able to find out why they annotate reads in that order. Would anyone be able to explain this or suggest anything online I could use to understand it?
Any help will be greatly appreciated!
Katie
What exactly do you mean by that? That in the case of ambiguous mapping, they assign the reads to miRNA first, then rRNA, etc.?
If so, it is probably a matter of probability: in an sRNA-seq experiment, a read that can be assigned to both a miRNA sequence and something else is more likely to come from the miRNA (because of size selection). rRNAs then come second because they are super abundant, and so on.
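To make the idea concrete, here is a minimal Python sketch of priority-based assignment. The order used is only the example order from the question, and the names `PRIORITY`, `assign_by_priority` and the toy reads are made up for illustration; this is not how any particular tool implements it.

```python
from collections import Counter

# Assumed priority order, copied from the example in the question.
# It is an illustration, not a recommendation.
PRIORITY = ["miRNA", "rRNA", "tRNA", "snoRNA", "piRNA"]


def assign_by_priority(overlaps, priority=PRIORITY):
    """Return the highest-priority biotype among those a read overlaps.

    `overlaps` is any collection of biotype names for one read.
    Returns None when the read overlaps no biotype in the priority list.
    """
    rank = {biotype: i for i, biotype in enumerate(priority)}
    known = [biotype for biotype in overlaps if biotype in rank]
    if not known:
        return None
    return min(known, key=rank.__getitem__)


# Hypothetical reads and the biotypes each one overlaps.
reads = {
    "read1": {"tRNA", "piRNA"},
    "read2": {"snoRNA", "piRNA"},
    "read3": {"miRNA"},
    "read4": set(),
}

# One count per read, given to the winning biotype (or "unassigned").
counts = Counter(
    assign_by_priority(overlaps) or "unassigned" for overlaps in reads.values()
)
```

With this order, piRNA only receives a read when nothing else overlaps it, which shows how strongly the result depends on the chosen ranking.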
Hi Carlo,
Thank you for your reply. Perhaps I worded it badly, but yes, it is the order of priority that I'm unsure about. I'm not sure which of tRNA, snoRNA or piRNA would come first.
I think that you need to take a step back here. Your main issue is ambiguous mapping, caused by the short read length and the repetitive nature of the "sRNA-ome". Prioritization with iterative mapping is only one way to solve this issue. Actually, your question highlights one of the reasons why the prioritization method is suboptimal: the assumptions you can make about your data are limited and not always transferable (is an sRNA read more likely to come from a piRNA locus or a snoRNA gene? I have no idea.)
For starters, I suggest that you read this recent paper, which summarizes the issue of multimapping in sRNA-seq well and proposes a different solution from prioritization (rescue). Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data
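To show what "no assumption about the order" can look like in practice, here is a minimal Python sketch that splits each ambiguous read equally among the biotypes it overlaps. To be clear, this is plain fractional counting, not the rescue approach from the Manatee paper, and the function name `fractional_counts` and the toy reads are made up for illustration.

```python
from collections import defaultdict


def fractional_counts(reads):
    """Split each read equally among the biotypes it overlaps.

    `reads` maps a read name to the set of biotypes it overlaps.
    A read overlapping n biotypes adds 1/n to each of them, so no
    biotype is favoured. Reads with no overlap count as "unassigned".
    """
    counts = defaultdict(float)
    for biotypes in reads.values():
        if not biotypes:
            counts["unassigned"] += 1.0
            continue
        share = 1.0 / len(biotypes)
        for biotype in biotypes:
            counts[biotype] += share
    return dict(counts)


# Hypothetical reads and the biotypes each one overlaps.
toy_reads = {
    "read1": {"tRNA", "piRNA"},
    "read2": {"snoRNA", "piRNA"},
    "read3": {"miRNA"},
}

toy_counts = fractional_counts(toy_reads)
```

featureCounts can do something comparable at the feature level: when `--fraction` is used together with -O, each overlapping feature receives 1/y of the read, where y is the number of features the read overlaps.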