rRNA quantification: which is the best way?
7.8 years ago
A. Domingues ★ 2.7k

My goal was to quantify how many ribosomal reads (5.8S) there were in my library, and use these to normalize gene expression. It makes sense in my experiment. I took 2 approaches:

1 - read count

After mapping with TopHat, with no multi-mappers allowed, I used featureCounts to count reads mapping to features, and tallied up the rRNA reads as those mapping to "LSU-rRNA_Hsa".

Using this method, there are between 10^5 and 10^6 rRNA reads for each sample.
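For illustration, the tally described above could be sketched roughly as follows. This assumes the standard featureCounts tab-separated output (a `#` comment line, then a header with six annotation columns followed by one column per sample) and that the feature identifier in the first column is literally `LSU-rRNA_Hsa`. The function name `tally_feature` and the example table are hypothetical, not part of the original analysis.

```python
import csv


def tally_feature(counts_text, feature_id="LSU-rRNA_Hsa"):
    """Sum per-sample counts for rows whose first column equals feature_id.

    Assumes featureCounts-style output: optional '#' comment lines, a header
    row (Geneid, Chr, Start, End, Strand, Length, then one column per sample),
    and one row per feature.
    """
    lines = [ln for ln in counts_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[6:]  # sample columns follow the six annotation columns
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        if row[0] == feature_id:
            for name, value in zip(samples, row[6:]):
                totals[name] += int(value)
    return totals


# Hypothetical example table (made-up numbers, for illustration only).
example = "\n".join([
    "# Program:featureCounts (hypothetical example)",
    "\t".join(["Geneid", "Chr", "Start", "End", "Strand", "Length", "s1.bam", "s2.bam"]),
    "\t".join(["LSU-rRNA_Hsa", "chr1", "100", "200", "+", "101", "100", "250"]),
    "\t".join(["GENE_A", "chr1", "500", "900", "+", "401", "7", "9"]),
    "\t".join(["LSU-rRNA_Hsa", "chr2", "300", "350", "-", "51", "50", "50"]),
])

print(tally_feature(example))
```

Summing over all rows with the same identifier matters here because a repeat annotation typically lists many separate copies under one name.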

2 - mapping to rRNA

Following these instructions, I mapped reads directly to the rRNA sequences provided in the iGenomes bundle (AbundantSequences/hum5SrDNA) using bowtie.

Using this method, there are between 10^3 and 10^4 mapped rRNA reads for each sample. This is quite a difference!
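As a rough sketch of how mapped reads might be counted for this second approach, assuming the aligner output is available as SAM text: a read is counted when its FLAG lacks the unmapped bit (0x4), and secondary (0x100) and supplementary (0x800) records are skipped so that each read is counted once. The function name `count_mapped_reads` and the example records are hypothetical.

```python
def count_mapped_reads(sam_text):
    """Count primary mapped records in SAM-formatted text.

    Header lines (starting with '@') are skipped. The FLAG is the second
    tab-separated field; bit 0x4 marks an unmapped read, 0x100 a secondary
    alignment and 0x800 a supplementary alignment.
    """
    mapped = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:      # unmapped
            continue
        if flag & 0x900:    # secondary or supplementary record
            continue
        mapped += 1
    return mapped


# Hypothetical SAM records (made-up reads, for illustration only).
example_sam = "\n".join([
    "@HD\tVN:1.0",
    "r1\t0\thum5SrDNA\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\thum5SrDNA\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t272\thum5SrDNA\t9\t255\t4M\t*\t0\t0\tACGT\tIIII",
])

print(count_mapped_reads(example_sam))
```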


Questions:

  1. Why such a difference between methods? I actually expected to get more reads when mapping directly to sequence since the multi-mapping issue is avoided.

  2. Is either of these methods adequate for the purpose?

  3. Which alternative method would you suggest?

I might not actually use this data, for a number of reasons, but I am curious about what happened here.

rRNA mapping featureCounts • 3.7k views

I would say that the simple and straightforward answer is that the two reference sequences that you are aligning and counting against are not similar to one another.

7.8 years ago
Charles Plessy ★ 2.9k

I usually filter out rRNA reads using the tool TagDust 2 with its -ref option, to which I give a FASTA file containing either the rRNA sequences or the whole rDNA locus (like U13369 for humans). To my knowledge, the human genome assembly _hg38_ does not contain the rDNA loci, which are challenging to assemble.

LSU-rRNA_Hsa is the name of an rRNA-derived repeated element. While rRNA reads will tend to align there if they are not filtered out first, I think that counting alignments to these regions is not a good way to quantify rRNA reads.

Conversely, the name _hum5SrDNA_ suggests that it does not contain all the rRNA sequences, but only that of the 5S rRNA...
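One way to check what a reference actually contains, rather than inferring it from the file name, is to list its FASTA records and their lengths. The following is a minimal sketch; the function name `fasta_records` and the example sequences are hypothetical and do not reflect the real contents of the iGenomes file.

```python
def fasta_records(fasta_text):
    """Return a list of (record_name, sequence_length) from FASTA text.

    The record name is the first whitespace-delimited token after '>'.
    """
    records = []
    name, length = None, 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


# Hypothetical FASTA content (made-up names and sequences).
example_fasta = ">seqA some description\nACGTACGT\nACGT\n>seqB\nGGCC\n"

for record_name, record_length in fasta_records(example_fasta):
    print(record_name, record_length)
```

If only a single short record turns up, the reference covers just one rRNA species, and reads from the others have nothing to align to.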
