Question

How to normalize two microarray datasets from different platforms

1

6.3 years ago

salvatore.raieli2 ▴ 90

Hi everyone,

I want to compare microarray datasets from different platforms. Which method and package are best for this?

Thank you very much for your help.

R • 5.0k views

ADD COMMENT • link updated 6.3 years ago by Kevin Blighe 87k • written 6.3 years ago by salvatore.raieli2 ▴ 90

1

So you want to compare condition 1 on platform A with condition 2 on platform B? I'd say you can't.

ADD REPLY • link 6.3 years ago by WouterDeCoster 47k

0

Yes, the exact arrangement of the samples/conditions and the array types are important.

ADD REPLY • link 6.3 years ago by Kevin Blighe 87k

0

This question has already been discussed. Did you check?

ADD REPLY • link 6.3 years ago by arta ▴ 670

0

I saw some questions on the same topic, but many are from 3-4 years ago. As someone suggested, I checked the inSilicoDb and virtualArray packages and saw that they have been discontinued in the latest Bioconductor version (which I am using). Someone suggested sva (I read the vignette and it is not the best fit for my work), so I asked again to find out which methods and packages are the best available now.

ADD REPLY • link 6.3 years ago by salvatore.raieli2 ▴ 90

1

This paper is quite new; maybe you can read it. It includes a review of cross-platform normalization and could be useful.

ADD REPLY • link 6.3 years ago by arta ▴ 670

0

Thank you, I read this article and it is quite interesting, but some of the packages it uses are no longer available.

ADD REPLY • link 6.3 years ago by salvatore.raieli2 ▴ 90

score 3 · Answer 1 · 2017-12-19

3

6.3 years ago

Kevin Blighe 87k

NB - Major update to answer: December 16, 2019

----------------------

It would help to state the specific arrays that you have.

There are different ideas on how best to do this, as a web search will show. I would not look for a package but would instead begin to think critically about how it could work and what needs to be done.

1, 'Z-score' merge

Firstly, if you are interested in processing each dataset independently, then take my approach here: A: How to integrate multiple data sets from microarray platform prior meta-analysis

In this way, each dataset is processed and normalised independently. Each dataset is then filtered so that only genes present in all datasets remain, and is transformed to Z-scores (again independently for each dataset). Once the datasets are on the 'standard score distribution' (Z-scale), you can conduct statistical analyses on the merged dataset, and it is advisable to include ArrayVersion as a covariate in your models.

This is best for downstream analyses based on correlation analyses, like network analysis.
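The Z-score merge can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the code from the linked post: the gene names and expression values are made up, the function names `zscore_rows` and `zscore_merge` are hypothetical, and it assumes the Z-score is computed per gene across the samples of each dataset (the answer does not say whether scaling is per gene or per array).

```python
from statistics import mean, stdev

def zscore_rows(dataset):
    """Z-score each gene across the samples of a single dataset."""
    scaled = {}
    for gene, values in dataset.items():
        mu, sd = mean(values), stdev(values)
        # A gene with no variation carries no signal; map it to zeros.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def zscore_merge(datasets):
    """Match genes across datasets, Z-score each dataset alone, then join samples."""
    shared = sorted(set.intersection(*(set(d) for d in datasets.values())))
    if not shared:
        raise ValueError("the datasets have no genes in common")
    merged = {gene: [] for gene in shared}
    platform = []  # one label per merged sample, for use as a covariate later
    for name, data in datasets.items():
        scaled = zscore_rows({gene: data[gene] for gene in shared})
        for gene in shared:
            merged[gene].extend(scaled[gene])
        platform.extend([name] * len(data[shared[0]]))
    return merged, platform

# Hypothetical, already-normalised data: gene -> one value per sample.
platform_a = {"TP53": [8.0, 9.0, 10.0], "MYC": [5.0, 7.0, 9.0], "GAPDH": [12.0, 12.5, 13.0]}
platform_b = {"TP53": [100.0, 200.0, 300.0], "MYC": [50.0, 60.0, 70.0], "EGFR": [20.0, 25.0, 30.0]}

merged, platform = zscore_merge({"A": platform_a, "B": platform_b})
```

The `platform` list records which array each merged sample came from, which is the ArrayVersion covariate to carry into downstream models.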

2, 'direct' merge

Attempt a 'direct' merge, as per the approach here: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE6nnn/GSE6344/suppl/GSE6344_MethodsDetails.PDF

Here, datasets are again normalised independently and then genes are filtered so that they match. Then, a scaling factor is applied to one array type so that the arrays can be merged together. There is further info here: Regarding Microarray Platforms
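As a rough sketch of the 'direct' merge idea: the scaling factor below is assumed to be the ratio of the median intensities over the shared genes, which is an illustrative choice and not necessarily the factor used in the GSE6344 methods document. The function name `direct_merge` and all values are hypothetical.

```python
from statistics import median

def direct_merge(reference, other):
    """Rescale `other` onto the level of `reference` and join the samples.

    Assumption: the scaling factor is the ratio of median intensities
    computed over the genes that both arrays measure.
    """
    shared = sorted(set(reference) & set(other))
    ref_values = [v for gene in shared for v in reference[gene]]
    other_values = [v for gene in shared for v in other[gene]]
    factor = median(ref_values) / median(other_values)
    merged = {
        gene: reference[gene] + [v * factor for v in other[gene]]
        for gene in shared
    }
    return merged, factor

# Hypothetical normalised intensities: gene -> one value per sample.
array_a = {"TP53": [200.0, 220.0], "MYC": [400.0, 380.0], "GAPDH": [900.0, 950.0]}
array_b = {"TP53": [100.0, 110.0], "MYC": [200.0, 190.0], "EGFR": [60.0, 70.0]}

merged_direct, scale_factor = direct_merge(array_a, array_b)
```

A single global factor only aligns the overall intensity level; it does not remove probe-specific differences between platforms, so including the array type as a covariate remains sensible.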

--------------------------------------

Further posts on this topic that I have made:

ADD COMMENT • link 4.4 years ago by Kevin Blighe 87k

0

Thank you very much for your reply. I want to normalize each dataset independently, and afterwards I was trying to find a way to normalize between the datasets. I was thinking about Z-scores or one of the methods I read about on the internet, but I am a bit afraid that a suboptimal normalization could affect the biological signal, which might then be removed as a batch effect.

ADD REPLY • link 6.3 years ago by salvatore.raieli2 ▴ 90

1

I think that, provided you include the array type as a covariate in modelling / statistical tests, it should not be a major issue. The regression models should be able to adjust for the platform differences, because those differences will be consistent across all transcripts.
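To illustrate what including the array type as a covariate does, here is a minimal sketch for a single gene. It fits expression ~ condition + platform by ordinary least squares using the normal equations; the helper names `solve` and `fit_ols` and the noise-free toy values are hypothetical and chosen so that the true condition effect is 2 and the platform shift is 3. In practice a modelling package would be used instead of a hand-written solver.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Toy data for one gene; treated samples are mostly on platform B (unbalanced).
condition = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = control, 1 = treated
platform = [0, 0, 0, 1, 0, 1, 1, 1]    # 0 = array A, 1 = array B
expression = [5.0, 5.0, 5.0, 8.0, 7.0, 10.0, 10.0, 10.0]

design = [[1.0, c, p] for c, p in zip(condition, platform)]
intercept, condition_effect, platform_effect = fit_ols(design, expression)

# Naive comparison that ignores the platform entirely.
treated = [e for e, c in zip(expression, condition) if c == 1]
control = [e for e, c in zip(expression, condition) if c == 0]
naive_difference = sum(treated) / len(treated) - sum(control) / len(control)
```

In this toy example the naive treated-minus-control difference is 3.5, while the model with the platform covariate recovers the condition effect of 2. This only works when condition and platform are not completely confounded; if every treated sample sits on one array and every control on the other, the two effects cannot be separated.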

Also, if you're doing correlation-based analysis like WGCNA and other network analyses, then you do not actually need to worry about adjustments across arrays. Getting the data to the Z-score stage would be beneficial though (i.e., same distribution).

ADD REPLY • link 6.3 years ago by Kevin Blighe 87k

0

Thank you for your help. This was my question too, but I don't know why you recommended calculating Z-scores?

ADD REPLY • link 4.6 years ago by samane. • 0

0

Z-scores represent 'standardised scores', and the Z distribution is often referred to as the 'standard normal distribution'. A Z-score of 1 is equivalent to 1 standard deviation above the mean, whereas, e.g., -2.5 is 2.5 standard deviations below the mean. Z = 1.96 is equivalent to a two-tailed p = 0.05 (roughly).

So, bringing both datasets to a standard distribution allows for a more 'harmonious' merge; however, batch effects may still lurk, so, batch ought to still be included as a covariate in the design formula.
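The rough correspondence between Z-scores and p-values can be checked with Python's standard library. This is a small sketch; the function name `two_sided_p` is just an illustrative choice.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a Z-score under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_at_196 = two_sided_p(1.96)  # very close to 0.05
p_at_1 = two_sided_p(1.0)     # roughly 0.32, i.e. about 68% of values lie within 1 SD
```

The two-sided p-value crosses 0.05 close to |Z| = 1.96, and the sign of Z does not matter because the distribution is symmetric.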

ADD REPLY • link 4.6 years ago by Kevin Blighe 87k