Question

Normalisation before or after log2 transformation in microarray gene expression data?

6.0 years ago

J. Smith ▴ 80

Hi friends.

I have a question about the order of steps applied to microarray gene expression data and RNA-seq data.

1) Should we apply normalisation techniques such as quantile or lowess normalisation to microarray gene expression data first and then perform the log2 transformation, or is the correct order the other way round? I have found both orders in different sources. Which one is correct?

2) And what is the correct order of steps for RNA-seq data?

Thanks in advance.

RNA-Seq

score 8 · Answer 1 · 2018-04-30

For [Affymetrix] microarrays, the broadly accepted normalisation method is Robust Multiarray Average (RMA), which consists of:

background correction
quantile normalisation
probe summarisation (i.e. combining the probes of each probe set into one value per transcript)
log (base 2) transformation
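Steps 2 and 4 can be sketched in plain Python to show the order: quantile normalisation acts on the raw (unlogged) intensities, and log2 is taken afterwards. This is a minimal illustration on made-up numbers, assuming one equal-length list of already background-corrected probe intensities per array; background correction and summarisation are left out, ties are broken by position, and the function names are hypothetical. In practice one would use Bioconductor's affy::rma(), which runs the whole pipeline.

```python
import math

def quantile_normalise(arrays):
    """Quantile-normalise equal-length arrays (one list of intensities per array).

    Each value is replaced by the mean, across arrays, of the values sharing its
    rank, so every array ends up with the same distribution. Assumption: ties are
    broken by position, a simplification of what real implementations do.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Indices that would sort each array (rank -> original position).
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
                 for k in range(n)]
    result = []
    for order in orders:
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = reference[rank]
        result.append(normalised)
    return result

def normalise_then_log2(arrays):
    """RMA-like order (hypothetical helper): normalise raw intensities, then log2."""
    return [[math.log2(v) for v in a] for a in quantile_normalise(arrays)]
```

For example, quantile_normalise([[100, 400, 200], [300, 600, 900]]) gives [[200.0, 650.0, 400.0], [200.0, 400.0, 650.0]]: both arrays now share the same set of values, and each array keeps its own ranking of probes.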

Extra notes:

An alternative, GC-RMA, additionally adjusts for probe GC content and its effect on probe-binding affinity.
Other normalisation methods can be used in step 2, e.g. Qspline, LOESS, and VSN (variance stabilising normalisation).
Step 3 is usually a 'median polish'; in RMA, the median polish is fitted to log2-transformed, normalised intensities, so in practice the log2 transformation is applied just before summarisation rather than after it.
The details of each step differ between microarray platforms.
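The median polish in step 3 can be illustrated on a single probe set. This is a sketch, assuming the input is already background-corrected, quantile-normalised and log2-transformed, with probes as rows and arrays as columns; it mirrors the alternating row/column median sweeps and the default stopping rule (eps = 0.01, maxiter = 10) of R's stats::medpolish(), but it is not the affy implementation, and the function name is hypothetical.

```python
import statistics

def median_polish(matrix, max_iter=10, tol=0.01):
    """Summarise one probe set with Tukey's median polish.

    matrix: rows are probes, columns are arrays, values are log2 intensities.
    Returns one summary per array: overall effect + that array's column effect,
    i.e. a log2-scale expression value per array.
    """
    resid = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    overall = 0.0
    prev_total = float("inf")
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        for i, row in enumerate(resid):
            m = statistics.median(row)
            resid[i] = [v - m for v in row]
            row_eff[i] += m
        m = statistics.median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column (array) medians out of the residuals.
        for j in range(n_cols):
            m = statistics.median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
        m = statistics.median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once the total absolute residual stops changing.
        total = sum(abs(v) for row in resid for v in row)
        if total == 0 or abs(prev_total - total) <= tol * total:
            break
        prev_total = total
    return [overall + c for c in col_eff]
```

On a purely additive toy matrix (probe effects 0, 1, 2; array effects 5, 6), median_polish([[5, 6], [6, 7], [7, 8]]) returns [6.0, 7.0], i.e. each array's effect plus the median probe effect. Because medians rather than means are used, a single aberrant probe shifts the summary much less than averaging would.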

In RMA, the log transformation is not performed prior to (quantile) normalisation.
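For quantile normalisation the order is more than a convention, because the two orders average different quantities. Writing x_{(k)}^{(j)} for the k-th smallest (positive) intensity on array j of J arrays, the common log2-scale value assigned at rank k is, in each order:

```latex
\begin{align*}
\text{normalise, then log:}\quad & \log_2\!\Big(\frac{1}{J}\sum_{j=1}^{J} x_{(k)}^{(j)}\Big) \\
\text{log, then normalise:}\quad & \frac{1}{J}\sum_{j=1}^{J}\log_2 x_{(k)}^{(j)}
  = \log_2\!\Big(\prod_{j=1}^{J} x_{(k)}^{(j)}\Big)^{1/J}
\end{align*}
```

The first uses the arithmetic mean and the second the geometric mean, so by the AM-GM inequality the first is never smaller, with equality only when all arrays already agree at rank k. The ranks themselves are identical in both orders, since log2 is monotone.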

For more, read the really great review by Professor Quackenbush: Microarray data normalization and transformation.

--------------------------

Current RNA-seq normalisation methods and expression measures differ quite a bit from one another. They include:

FPKM
RPKM
FPKM-UQ
RSEM
TPM
CPM
TMM
Median-of-ratios normalisation (DESeq2)

NB (added November 6th, 2019): some of these are not normalisation procedures per se; they are better described as count measures / abundance measures / expression units produced by otherwise unnamed normalisation procedures, e.g. FPKM.
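Three of the per-sample expression units in the list (CPM, RPKM/FPKM and TPM) can be written in a few lines, which makes the difference between them concrete. This is a sketch for a single sample, assuming raw counts per gene and transcript lengths in base pairs; the function names are hypothetical, and real quantifiers typically use effective transcript lengths rather than annotated ones.

```python
def cpm(counts):
    """Counts per million: scale each gene's count by the library size."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads (or fragments, for FPKM) per kilobase of transcript per million reads."""
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to sum to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]
```

With counts [10, 20, 70] and lengths [1000, 2000, 500], RPKM gives roughly [100000, 100000, 1400000] while TPM gives roughly [62500, 62500, 875000]. The design difference is the order of the two divisions: TPM normalises for length before scaling, so TPM values always sum to one million per sample, whereas RPKM/FPKM totals vary between samples.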

A log transformation is not typically involved in the normalisation process for RNA-seq. Statistical comparisons are performed on the unlogged counts (tools such as DESeq2 and edgeR fit their models to the raw counts and include the normalisation factors in the model), because these counts do not follow a normal distribution. RNA-seq count data, in fact, follow a negative binomial distribution, which is akin to a Poisson distribution but allows the variance to exceed the mean (overdispersion). However, one can later log-transform the normalised counts, e.g. for plotting, in order to bring them closer to a normal distribution. DESeq2, for example, implements a regularised log transformation (rlog).
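The RNA-seq order (normalise on the count scale, and only log afterwards for visualisation) can be sketched with the median-of-ratios idea behind DESeq2's size factors. This is a simplified illustration, not DESeq2's code: genes with a zero in any sample are skipped, as DESeq2 also does because their geometric mean is zero, and the plain log2(x + 1) at the end is a stand-in for plotting only, not DESeq2's rlog or vst. All function names are hypothetical.

```python
import math
import statistics

def size_factors(count_matrix):
    """Median-of-ratios size factors, one per sample.

    count_matrix: one list per gene, each holding raw counts (one per sample).
    """
    n_samples = len(count_matrix[0])
    usable, log_geo_means = [], []
    for gene in count_matrix:
        if all(c > 0 for c in gene):  # a zero would make the geometric mean zero
            usable.append(gene)
            log_geo_means.append(sum(math.log(c) for c in gene) / n_samples)
    if not usable:
        raise ValueError("every gene has at least one zero count")
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean across samples (log scale).
        log_ratios = [math.log(g[j]) - lgm for g, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

def normalised_counts(count_matrix, factors):
    """Divide each sample's counts by its size factor (still unlogged)."""
    return [[c / s for c, s in zip(gene, factors)] for gene in count_matrix]

def log2_for_plotting(norm_matrix, pseudocount=1.0):
    """Log only after normalisation; the pseudocount avoids log2(0)."""
    return [[math.log2(v + pseudocount) for v in gene] for gene in norm_matrix]
```

If the second sample was simply sequenced twice as deeply, e.g. counts [[10, 20], [100, 200], [5, 10], [0, 3]], the two size factors differ by a factor of 2 and the normalised counts of the first three genes become equal across samples. The statistical testing itself would still use the raw counts with these factors, as described above.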

For more, read: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis.

Kevin