Normalisation before log2 transformation or after in Microarray Gene expression data?
6.0 years ago
J. Smith ▴ 80

Hi friends.

I have a question about the order of steps performed on microarray gene expression data / RNA-seq data.

1) Should we apply normalisation techniques such as quantile or LOWESS to microarray gene expression data first and then perform the log2 transformation, or is the correct order the other way round? I have found both orders in different sources. Which one is correct?

2) And what is the order of steps for RNA-seq data?

Thanks in advance.

6.0 years ago

For Affymetrix microarrays, the broadly accepted normalisation method is Robust Multi-array Average (RMA):

  1. background correction
  2. quantile normalisation
  3. log (base 2) transformation
  4. probe summarisation (i.e. combining the probes of each probe set into one value per transcript)

Extra notes:

  • An alternative, called GC-RMA, also models how probe GC content affects probe-binding affinity (this enters at the background-correction step).
  • Other types of normalisation (step 2) exist, namely: Qspline; LOESS; VSN (variance stabilising normalisation); et cetera
  • Step 4 is usually a 'median polish', fitted on the log2-transformed values
  • The details of each step differ between microarray platforms

Log transformation is not performed prior to normalisation.
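The RMA order can be sketched in numpy as follows. This is a simplified illustration, not the affy/oligo code: it assumes the perfect-match (PM) intensities are already background-corrected and strictly positive (RMA's own convolution-model background step is omitted), breaks rank ties naively during quantile normalisation, and uses hypothetical function names. Normalisation happens on the unlogged intensities, and the median polish summary is then fitted on the log2 scale, as in RMA.

```python
import numpy as np


def quantile_normalise(x):
    """Quantile-normalise the columns (arrays) of a probes x arrays matrix.

    Each value is replaced by the mean, across arrays, of the values that share
    its rank, so every array ends up with the same distribution. Ties are broken
    by order of appearance, which is a simplification.
    """
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)  # mean value at each rank
    return reference[ranks]


def median_polish(y, max_iter=10, tol=1e-6):
    """Tukey's median polish on one probe set (probes x arrays, log2 scale).

    Fits y = overall + probe (row) effect + array (column) effect + residual
    and returns overall + column effect, i.e. one summary value per array.
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        r = np.median(resid, axis=1)  # sweep out probe (row) medians
        resid -= r[:, None]
        row += r
        m = np.median(col)
        col -= m
        overall += m
        c = np.median(resid, axis=0)  # sweep out array (column) medians
        resid -= c[None, :]
        col += c
        m = np.median(row)
        row -= m
        overall += m
        if np.abs(r).sum() + np.abs(c).sum() < tol:
            break
    return overall + col


def rma_like(pm, probe_sets):
    """Normalise, then log2, then summarise background-corrected PM intensities."""
    normalised = quantile_normalise(pm)  # normalisation on the unlogged scale
    logged = np.log2(normalised)  # log2 afterwards; needs positive intensities
    labels = np.asarray(probe_sets)
    return {  # one median polish per probe set, on the log2 values
        ps: median_polish(logged[labels == ps])
        for ps in dict.fromkeys(probe_sets)
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pm = rng.uniform(50, 5000, size=(6, 3))  # toy data: 6 probes x 3 arrays
    labels = ["ps1"] * 3 + ["ps2"] * 3  # hypothetical probe-set IDs
    for ps, values in rma_like(pm, labels).items():
        print(ps, np.round(values, 2))
```

For real Affymetrix data, Bioconductor's affy::rma() or oligo::rma() perform the full procedure; the sketch only shows where the log2 step sits relative to normalisation.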

For more, read the really great review by Professor Quackenbush: Microarray data normalization and transformation.

--------------------------

Current RNA-seq normalisation methods / expression measures differ quite a bit from each other. They include:

  • FPKM
  • RPKM
  • FPKM-UQ
  • RSEM
  • TPM
  • CPM
  • TMM
  • Median-of-ratios normalisation (DESeq2)

NB (added November 6th, 2019): some of these are not normalisation procedures per se; they are better described as count measures / abundance measures / expression units that are produced by otherwise unnamed normalisation procedures, e.g., FPKM.
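The library-size and length scalings behind CPM, RPKM/FPKM and TPM can be written in a few lines. This is a sketch of the textbook formulas, assuming a genes x samples matrix of raw counts and gene lengths in base pairs; real quantifiers such as RSEM work with estimated counts and effective lengths, and the function names here are hypothetical.

```python
import numpy as np


def cpm(counts):
    """Counts per million: divide each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0) * 1e6


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads (FPKM if counts are fragments)."""
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    return cpm(counts) / lengths_kb[:, None]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    rate = np.asarray(counts, dtype=float) / lengths_kb[:, None]
    return rate / rate.sum(axis=0) * 1e6


counts = np.array([[10, 20], [30, 60], [60, 120]])  # 3 genes x 2 samples
lengths = np.array([1000, 2000, 500])  # hypothetical gene lengths (bp)
print(tpm(counts, lengths).sum(axis=0))  # each TPM column sums to 1e6 (up to rounding)
```

The order of operations is the key difference: TPM applies the length correction before the per-sample scaling, so every TPM column sums to the same total, whereas RPKM/FPKM columns generally do not.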

A log transformation is not typically part of the normalisation process for RNA-seq. Statistical comparisons are performed on the unlogged counts (tools such as DESeq2 and edgeR model the raw counts and include the normalisation factors in the model), and these counts generally do not follow a normal distribution. RNA-seq count data instead follow a negative binomial distribution, which can be thought of as an over-dispersed Poisson. However, one can later log the normalised counts, e.g. for plotting or clustering, to bring them closer to a normal distribution. DESeq2, for example, implements a regularised log transformation (rlog).
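DESeq2-style median-of-ratios size factors, followed by a log transform for plotting, can be sketched in numpy. This re-implements the published idea (each sample's size factor is the median ratio of its counts to a per-gene geometric-mean reference, using only genes with no zero counts); it is not DESeq2's code, and the function names are hypothetical. In R you would use DESeq2's estimateSizeFactors() and counts(dds, normalized = TRUE), and rlog() or vst() rather than the plain log2(x + 1) used here as a stand-in.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    usable = (counts > 0).all(axis=1)  # genes with no zero counts
    log_counts = np.log(counts[usable])
    log_geo_mean = log_counts.mean(axis=1)  # per-gene pseudo-reference
    return np.exp(np.median(log_counts - log_geo_mean[:, None], axis=0))


def normalised_counts(counts):
    """Divide each sample's counts by its size factor (still unlogged)."""
    return np.asarray(counts, dtype=float) / size_factors(counts)


def log2_for_plotting(norm_counts, pseudocount=1.0):
    """Simple log2(x + pseudocount); a stand-in for DESeq2's rlog/vst."""
    return np.log2(norm_counts + pseudocount)


counts = np.array([[10, 20], [100, 200], [50, 100]])  # sample 2 sequenced twice as deeply
print(size_factors(counts))  # approximately [0.707, 1.414]
print(log2_for_plotting(normalised_counts(counts)))
```

Only the unlogged counts (in DESeq2/edgeR, the raw counts plus these factors) go into the statistical model; the logged values are for visualisation.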

For more, read A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

Kevin

Dear Kevin, Many many thanks for your detailed response to my query.

No problem - best of luck. Please do also read the mentioned publications.

Thanks... I will surely go through those publications...