
Question

rRNA quantification: which is the best way?

0


7.8 years ago

A. Domingues ★ 2.7k

My goal was to quantify how many ribosomal reads (5.8S) there were in my library, and use these to normalize gene expression. It makes sense in my experiment. I took 2 approaches:

1 - read count

After mapping with TopHat, with no multi-mappers allowed, I used featureCounts to count reads mapping to features, and tallied up the rRNA reads as those mapping to "LSU-rRNA_Hsa".

Using this method, there are between 10^5 and 10^6 rRNA reads for each sample.
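For reference, the tally described above could look roughly like the following. This is only a minimal sketch: it assumes the standard featureCounts output layout (a `#` comment line, then a tab-separated header with `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one column per sample), that the `Geneid` column holds the repeat name, and that counts are integers. The function name, sample names and counts are hypothetical.

```python
import csv
import io


def tally_feature_counts(table_text, feature_name):
    """Sum the counts per sample for rows whose Geneid equals feature_name."""
    # Drop blank lines and the leading '#' comment line written by featureCounts.
    lines = [ln for ln in table_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    header = next(reader)
    # Sample columns follow Geneid, Chr, Start, End, Strand, Length.
    samples = header[6:]
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        # Summing also covers feature-level output, where a name may repeat.
        if row[0] == feature_name:
            for sample, value in zip(samples, row[6:]):
                totals[sample] += int(value)
    return totals


# Hypothetical table, for illustration only (not real data).
EXAMPLE_TABLE = "\n".join([
    "# Program:featureCounts (example comment line)",
    "\t".join(["Geneid", "Chr", "Start", "End", "Strand", "Length",
               "sampleA.bam", "sampleB.bam"]),
    "\t".join(["LSU-rRNA_Hsa", "chr1;chr2", "100;50", "200;150", "+;+", "202",
               "150", "350"]),
    "\t".join(["L1PA2", "chr1", "500", "900", "-", "401", "7", "9"]),
])

rrna_totals = tally_feature_counts(EXAMPLE_TABLE, "LSU-rRNA_Hsa")
print(rrna_totals)
```

Note that this only counts reads assigned to the annotated feature, so it inherits whatever the annotation and the multi-mapper settings let through.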

2 - mapping to rRNA

Following these instructions, I mapped reads directly to the rRNA sequences provided in the iGenomes bundle (AbundantSequences/hum5SrDNA) using bowtie.

Using this method, there are between 10^3 and 10^4 mapped rRNA reads for each sample. This is quite a difference!
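To make "mapped rRNA reads" concrete, here is a minimal sketch of how they could be counted, assuming the aligner output is in SAM format. It relies only on the SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name, reference name and reads are hypothetical.

```python
def count_mapped_reads(sam_lines):
    """Count primary alignments of mapped reads in an iterable of SAM lines."""
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        mapped += 1
    return mapped


# Hypothetical SAM records, for illustration only (not real data).
EXAMPLE_SAM = [
    "@SQ\tSN:rRNA_ref\tLN:121",
    "r1\t0\trRNA_ref\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",    # mapped, forward
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",              # unmapped
    "r3\t16\trRNA_ref\t10\t255\t4M\t*\t0\t0\tGGCC\tIIII",  # mapped, reverse
    "r4\t256\trRNA_ref\t20\t255\t4M\t*\t0\t0\tACGT\tIIII", # secondary
]

n_mapped = count_mapped_reads(EXAMPLE_SAM)
print(n_mapped)
```

For paired-end data this counts each mate separately, so the number would need to be interpreted accordingly.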

Questions:

Why such a difference between methods? I actually expected to get more reads when mapping directly to sequence since the multi-mapping issue is avoided.
Is either of these methods adequate for this purpose?
Which alternative method would you suggest?

I might not actually use this data, for a number of reasons, but I am curious about what happened here.

rRNA mapping featureCounts • 3.7k views

updated 7.8 years ago by Charles Plessy ★ 2.9k • written 7.8 years ago by A. Domingues ★ 2.7k

1


I would say that the simple and straightforward answer is that the two reference sequences that you are aligning and counting against are not similar to one another.

7.8 years ago by Istvan Albert 100k

score 1 · Answer 1 · 2016-07-12

I usually filter out rRNA reads using TagDust 2 and its -ref option, to which I give a FASTA file containing either the rRNA sequences or the whole rDNA locus (like U13369 for humans). To my knowledge, the human genome assembly _hg38_ does not contain the rDNA loci, which are challenging to assemble.

LSU-rRNA_Hsa is the name of an rRNA-derived repeat element. While rRNA reads will tend to align there if they are not filtered out first, I think that counting alignments to these regions is not a good way to quantify rRNA reads.

Conversely, the name _hum5SrDNA_ suggests that it does not contain all the rRNA sequences, but only that of the 5S rRNA...
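Rather than guessing from the file name, one could list the records in the reference FASTA and check which sequences it actually contains. The following is a minimal sketch in plain Python, assuming an ordinary FASTA layout (`>` header lines followed by sequence lines); the function name and the example records are hypothetical.

```python
def fasta_summary(fasta_text):
    """Return a list of (record_name, sequence_length) tuples from FASTA text."""
    records = []
    name, length = None, 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, length))
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header
            length = 0
        else:
            length += len(line)
    # Close the last record.
    if name is not None:
        records.append((name, length))
    return records


# Hypothetical FASTA content, for illustration only (not a real reference).
EXAMPLE_FASTA = """>seq1 hypothetical first record
ACGT
AC
>seq2
GGG
"""

for record_name, record_length in fasta_summary(EXAMPLE_FASTA):
    print(record_name, record_length)
```

If the reference turns out to hold a single short record, it cannot capture reads from the other rRNA species, which would be consistent with the lower counts.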