Question

Quantile normalisation: raw rpkm or log2rpkm?

1

7.5 years ago

BioinfGuru ★ 1.7k

Hi all,

I'm really stuck with this one, so any help would be appreciated.

I have expression data for 28 different tissues. The matrix I create will look something like the one below, but with 28 tissues and around 50k rows:

       tissue1    tissue2     tissue3 .....etc....
gene1
gene2
gene3
etc..

I'm going to use limma's normalizeBetweenArrays() function to quantile normalise the data. I can't figure out whether I should fill the matrix with the raw RPKM values or the log2-transformed RPKM values before passing it to the limma function. Which one should it be?

EDIT: I do understand how quantile normalisation works; I just don't know whether it is correct to apply it to log2 values. I have read some resources on this, but none of them is clear about what the input should be.

Thanks,

Kenneth

rnaseq quantile normalisation • 5.7k views

Updated 7.5 years ago by Michael 54k • written 7.5 years ago by BioinfGuru ★ 1.7k

0

See the Wikipedia article:

https://en.wikipedia.org/wiki/Quantile_normalization

For a practical example without the theory, see this recent paper:

Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/

Also see this post and the articles mentioned at the bottom of it:

Normalization Of Gene Expression Using Rnaseq Rpkm Values

7.5 years ago by natasha.sernova ★ 4.0k

2

OP didn't ask for an explanation of quantile normalisation... sharing links can be helpful, but these don't appear to be specifically about this question. If the answer to his question is somewhere on those pages, why don't you give the answer and refer to the pages for further explanation?

7.5 years ago by WouterDeCoster 47k

Ram · Answer 1 · 2016-10-09

2

7.5 years ago

Michael 54k

If you are analyzing RNA-seq data using limma you should use the voom transformation on the raw counts as described in the user guide, chapter 15. Using RPKM has no place in this analysis, whether log transformed or not.
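A rough sketch of what that workflow might look like is given here. It is not copied from the user guide: the count matrix is simulated, `counts` and `design` are placeholder names, and the intercept-only design is an assumption made only so the example runs. A real analysis would encode the actual tissues or conditions. Passing `normalize.method = "quantile"` asks voom to quantile normalise the log2-CPM values between samples.

```r
library(limma)

# Simulated raw counts (assumption: real input would be a genes x samples
# matrix of unnormalised read counts, not RPKM).
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 2),
                 nrow = 1000, ncol = 6,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("tissue", 1:6)))

# Placeholder intercept-only design; replace with the real experimental design.
design <- matrix(1, nrow = ncol(counts), ncol = 1,
                 dimnames = list(colnames(counts), "Intercept"))

# voom converts counts to log2-CPM, optionally normalises between samples,
# and estimates precision weights for use with lmFit().
v <- voom(counts, design, normalize.method = "quantile")

dim(v$E)        # normalised log2-CPM values, same shape as counts
dim(v$weights)  # observation-level precision weights
```

The resulting `v` object can then be passed to `lmFit()` in place of a normalised expression matrix.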

7.5 years ago by Michael 54k

0

I have quantile normalised the log2 data using another method ... and it returned the same results.

7.5 years ago by BioinfGuru ★ 1.7k

0

I am sorry but this is hardly an intelligible statement.

7.5 years ago by Michael 54k

0

Apologies for the typo.

I have quantile normalised the log2 data using another R package:

library("preprocessCore")
x <- normalize.quantiles(my_data_matrix)

This method returns the same results as:

library("limma")
y <- normalizeBetweenArrays(my_data_matrix)

Updated 5.2 years ago by Ram 43k • written 7.5 years ago by BioinfGuru ★ 1.7k

score 0 · Answer 2 · 2016-10-09

I am not sure why normalizeBetweenArrays is being used for NGS data (the OP mentioned RPKM data, and I am assuming the RPKMs come from NGS data).

As for the values, either one should work, as I understand from the following line in the manual:

Normalizes expression intensities so that the intensities or log-ratios have similar distributions across a set of arrays.

"Intensities" above means raw values, and "log-ratios" are on the log scale (as per my understanding).
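Whichever input the function accepts, the two orders of operation are not interchangeable. The sketch below uses made-up RPKM values and a hand-written base R helper, `quantile_normalise()`, which is a hypothetical illustration with no tie handling and not limma's implementation. It suggests that normalising raw values and then taking log2 preserves the same gene ranks as taking log2 first, but gives different numbers. It is also worth checking the help page of your installed limma version: as far as I know, it states that a plain matrix passed to normalizeBetweenArrays is assumed to hold log-transformed single-channel data.

```r
# Minimal quantile normalisation in base R (no tie handling); illustration only.
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # per-column ranks
  target <- unname(rowMeans(apply(x, 2, sort)))       # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to targets
  dimnames(out) <- dimnames(x)
  out
}

# Made-up RPKM values for 4 genes x 3 tissues (filled column by column).
rpkm <- matrix(c(5, 2,  3,  40,
                 4, 1, 14,  20,
                 3, 8,  6, 100),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("tissue", 1:3)))

qn_then_log <- log2(quantile_normalise(rpkm))  # normalise raw values, then log2
log_then_qn <- quantile_normalise(log2(rpkm))  # log2 first, then normalise

# Within-column gene ranks agree, because log2 is monotone ...
identical(apply(qn_then_log, 2, rank), apply(log_then_qn, 2, rank))  # TRUE

# ... but the values differ: log2 of a mean is not the mean of log2 values.
isTRUE(all.equal(qn_then_log, log_then_qn))  # FALSE
```

So the choice of scale changes the normalised values even though both routes run without error.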