Question

microarray data processing

9.1 years ago

ewre ▴ 250

Hi, all

I have a question on microarray data processing. Here is what I have done:

We used the Illumina HumanHT-12 microarray to profile gene expression changes in ~100 samples. After normalizing with the lumi R package (background adjusted, variance stabilized, and normalized with "ssn"), I randomly selected two sets of genes (about 100 genes per set) from the data matrix (15301 genes; 141 samples), took the median of each gene set within every sample, and plotted the values of one set against the other. To my surprise, I found a correlation between the two randomly selected gene sets. Could anyone explain this?

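# dat is the expression matrix (genes in rows, samples in columns)

## generate random row indices for the two gene sets
set1.index <- sample(1:nrow(dat), 100)
set2.index <- sample(1:nrow(dat), 100)

set1.dat <- dat[set1.index, ]
set2.dat <- dat[set2.index, ]

## for each sample (column), take the median over the genes in the set
set1.aggr <- aggregate(set1.dat, by = list(set = rep(1, nrow(set1.dat))), FUN = median)
set2.aggr <- aggregate(set2.dat, by = list(set = rep(1, nrow(set2.dat))), FUN = median)

## reshape for plotting: row 1 = set 1 medians, row 2 = set 2 medians
medi.dat <- rbind(set1.aggr[, -1], set2.aggr[, -1])

## plot it (unlist() turns each one-row data frame into a numeric vector)
plot(unlist(medi.dat[1, ]), unlist(medi.dat[2, ]))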

With many thanks.

Ram · Answer 1 · 2015-03-21

9.1 years ago

Devon Ryan 104k

You're comparing sample medians against themselves, so of course they correlate (otherwise, statistics would break). Presumably you meant to take the median of each gene across samples and compare those:

set1.agr <- apply(set1.dat, 1, median)
set2.agr <- apply(set2.dat, 1, median)
plot(set1.agr, set2.agr)
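
A small simulation can illustrate why per-sample medians of two random gene sets tend to track each other. This is only a sketch on made-up data, not the poster's matrix: it assumes every sample carries a shared offset (for example, intensity or batch differences left over after normalization) that is added to all genes. The function name `simulate_set_correlation` and its parameters are hypothetical.

```r
# Illustrative simulation on synthetic data (not the HumanHT-12 matrix).
# Assumption: expression = gene baseline + sample-wide offset + noise.
simulate_set_correlation <- function(n_genes = 2000, n_samples = 141,
                                     offset_sd = 0.5, noise_sd = 0.5,
                                     set_size = 100) {
  gene_baseline <- rnorm(n_genes, mean = 8, sd = 1)
  sample_offset <- rnorm(n_samples, mean = 0, sd = offset_sd)
  noise <- matrix(rnorm(n_genes * n_samples, sd = noise_sd), nrow = n_genes)

  # genes in rows, samples in columns
  dat <- outer(gene_baseline, sample_offset, "+") + noise

  # two independent random gene sets
  set1 <- sample(n_genes, set_size)
  set2 <- sample(n_genes, set_size)

  # per-sample median over the genes in each set
  med1 <- apply(dat[set1, ], 2, median)
  med2 <- apply(dat[set2, ], 2, median)
  cor(med1, med2)
}

set.seed(1)
r_shared <- simulate_set_correlation(offset_sd = 0.5)  # samples differ globally
r_none   <- simulate_set_correlation(offset_sd = 0)    # no sample-wide shift
```

Under these assumptions, `r_shared` should come out high and `r_none` close to zero. That would be consistent with the effect appearing in some data sets and not in others, depending on how much sample-wide variation remains after normalization.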

Thanks for your reply, Devon Ryan. I think you mean that the 100 randomly selected genes can represent all ~20000 genes. That is reasonable, but it is not always the case. I have tried the code in this post on other independent data sets, and in some cases it shows no correlation at all.

Actually, this question was raised by an interesting hypothesis in my research. We observed that oxidative phosphorylation (OXPHOS) function was disturbed in our case samples, so we hypothesized that the expression profile of the OXPHOS genes must differ from the 'overall expression profile' (we use randomly selected genes to represent this overall profile, which is in accordance with your reply ~_~). To my surprise, in my data there is always a high correlation between the expression of the OXPHOS genes and that of the randomly selected genes. So I checked this hypothesis in other independent data sets: in some of them the phenomenon holds, while in others it does not.
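
One way to put a single random draw into context is to repeat it many times and look at the whole distribution of correlations. The sketch below is not a method from this thread. It assumes `dat` is a numeric genes-by-samples matrix and that the rows of the gene set of interest are known (the post's OXPHOS genes would take the place of `toy.set`). The helper name `set_vs_random_cor` and the toy data are hypothetical.

```r
# Hypothetical helper: correlate one gene set's per-sample median with the
# per-sample medians of many random gene sets of the same size.
set_vs_random_cor <- function(dat, set.index, n_draws = 1000) {
  set.med <- apply(dat[set.index, , drop = FALSE], 2, median)
  pool <- setdiff(seq_len(nrow(dat)), set.index)  # genes outside the set
  replicate(n_draws, {
    rand.index <- sample(pool, length(set.index))
    rand.med <- apply(dat[rand.index, , drop = FALSE], 2, median)
    cor(set.med, rand.med)
  })
}

# Toy data so the example runs on its own (not real expression values)
set.seed(42)
toy <- matrix(rnorm(500 * 30), nrow = 500)  # 500 genes, 30 samples
toy.set <- 1:20                             # stand-in for the OXPHOS rows
cors <- set_vs_random_cor(toy, toy.set, n_draws = 200)
summary(cors)
```

Running the same helper with a random gene set as `set.index` gives a baseline for how strongly random sets correlate with each other in a given data set. If the OXPHOS distribution looks like that baseline, the correlation probably reflects sample-wide variation rather than anything specific to OXPHOS.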

Reply written 8.8 years ago by ewre ▴ 250 • updated 16 months ago by Ram 43k