within array vs between array normalization (methylation)
5.3 years ago
rorio ▴ 20

Hello all,

This may seem to be a silly question, but what is the difference between "within array" and "between array" normalization, specifically in the context of Illumina methylation microarrays? I understand the concept of normalization, but not really this distinction, and the terms seem to be used inconsistently. What, exactly, is an "array"? Does it refer to a single sample on a chip? If so, I understand that "within array" normalization could be used to correct for the Infinium I / Infinium II probe-type shift, and "between array" normalization would then ensure that each sample has the same distribution of intensities.

Or does an array refer to a chip with 8 / 16 samples on it? If so, why is there a distinction between normalization within an array (i.e., normalizing the 8 / 16 samples to each other) and normalizing this array to other arrays?

Thank you for your help!

normalization methylation microarrays illumina • 3.5k views
5.2 years ago

My answer will be brief:

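In most contexts, 'chip', 'array', and 'microarray' refer to the same thing, with a single sample per chip / array / microarray. Illumina methylation BeadChips differ in physical layout only: one BeadChip carries several samples (12 on the 450K, 8 on the EPIC), each in its own section, and for normalisation purposes each per-sample section is treated as an 'array'.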

Within-array normalisation treats each array (thus, each sample) as its own entity and applies a normalisation procedure to each one separately; information from the other arrays / samples in your dataset is not taken into account. The method differs across platforms. For example, on 2-colour microarrays, your sample's DNA/cDNA is co-hybridised with a control DNA/cDNA on the same chip, so within-array normalisation is performed per array using the signal from the probes that have hybridised to the control sample. To a certain degree, within-array normalisation is possible on every chip because of replicate probes (multiple probes with the same target) and / or positive and negative controls. On Illumina methylation arrays, correcting the Infinium I / Infinium II shift that you mention is a within-array step in this sense.
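To make the "each array on its own" idea concrete, here is a minimal Python sketch. It assumes a very simple within-array step, centring each array's log-ratios on that array's own median; real pipelines typically use something more elaborate, and the function name and the numbers are made up for illustration.

```python
import numpy as np

def median_center_within_array(log_ratios):
    """Subtract each array's own median log-ratio from that array.

    log_ratios: 2-D array, rows = probes, columns = arrays (samples).
    Each column is processed using only its own values.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    return log_ratios - np.median(log_ratios, axis=0, keepdims=True)

# Made-up log2(sample / control) ratios: 5 probes x 3 arrays.
demo = np.array([
    [ 0.2, 1.1, -0.4],
    [-0.1, 0.9, -0.9],
    [ 0.5, 1.6, -0.2],
    [ 0.0, 1.0, -0.6],
    [ 0.3, 1.3, -0.5],
])
centered = median_center_within_array(demo)

# Normalising the first array on its own gives the same values,
# because no information from the other arrays is used.
first_alone = median_center_within_array(demo[:, [0]])
```

The point of the last line is that the result for one array does not change when other arrays are added to or removed from the dataset.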

Between-array normalisation, then, looks at information from all of your samples and uses it to put them all on a common scale. This is 'necessary' for the purposes of differential expression analysis (or differential methylation analysis, in your case). One of the most widely used between-array methods is quantile normalisation, which is part of the well-known RMA (robust multi-array average) method. In a nutshell, quantile normalisation matches the data distribution across all of your samples. This is why, after quantile normalisation, the distributions of your samples look nearly identical in a box-and-whisker plot.
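As a rough illustration of quantile normalisation, the following Python sketch replaces each sample's k-th smallest value with the mean of the k-th smallest values across all samples. It is a bare-bones version under stated assumptions (no missing values, ties broken arbitrarily), not what RMA or any particular package does internally; the function name and simulated data are hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Basic quantile normalisation.

    x: 2-D array, rows = probes, columns = samples (arrays).
    Assumes no missing values; tied values are broken arbitrarily.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                    # per-sample probe ordering
    sorted_x = np.take_along_axis(x, order, axis=0)  # each column sorted
    reference = sorted_x.mean(axis=1)                # mean of each quantile across samples
    out = np.empty_like(x)
    # Write the reference distribution back in each sample's original probe order.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

# Simulated intensities: three samples with different location and spread.
rng = np.random.default_rng(0)
data = rng.normal(loc=[0.0, 1.0, 2.0], scale=[1.0, 2.0, 0.5], size=(1000, 3))
normalized = quantile_normalize(data)
```

After this step every column contains the same set of values, only in a different probe order, which is why the boxplots line up. Note that the reference distribution depends on every sample in the dataset, so adding or removing a sample changes the result for all the others.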

If you perform only within-array normalisation, there is a risk that your samples are not comparable with each other afterward, i.e., for the purposes of differential expression analysis. Biases / technical artifacts may exist as a result of, for example, how different samples were prepared and then loaded on the chips. A useful parallel is the RPKM and FPKM expression units in RNA-seq: RPKM/FPKM are produced by a normalisation strategy that is within-sample only. This is why RPKM/FPKM are often not recommended for differential expression analysis.
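The RPKM parallel can be sketched in a few lines of Python. This uses the usual definition, reads per kilobase of transcript per million mapped reads, and assumes (for the sake of a small example) that the column sum of the count matrix stands in for the total number of mapped reads; the function name and counts are made up.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """RPKM = counts * 1e9 / (gene length in bp * total reads in that sample).

    counts: 2-D array, rows = genes, columns = samples.
    Each sample is scaled only by its own total, so this is within-sample.
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(gene_lengths_bp, dtype=float)
    per_sample_total = counts.sum(axis=0, keepdims=True)
    return counts * 1e9 / (lengths[:, None] * per_sample_total)

# Two genes x two samples; the second sample was simply sequenced twice as deeply.
counts = np.array([[10, 20],
                   [30, 60]])
lengths = np.array([1000, 2000])
values = rpkm(counts, lengths)
```

Each column is computed without looking at any other column, which corrects for depth and gene length but does nothing to align the samples' distributions with each other.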

Trust this helps.

Kevin


Hi Kevin,

Thank you so much for the answer! Actually, that clears it up completely. Thanks again, this is really helpful!

All the best,

Rory
