How to generate a microarray data set with paired samples?
8.6 years ago
Pas ▴ 30

Hi all,

I need to simulate a microarray data set with paired samples. Specifically, I'd like to generate a matrix with 1000 genes (rows) and 20 samples (columns) from 10 patients, in which the first 100 genes are differentially expressed consistently in all patients. By "consistently" I mean that these genes must be up- or down-regulated with the same fold change in every patient. In other words, if the first 10 columns are the "normal" samples and the last 10 columns are the "tumor" samples from the same 10 patients, I want the first 50 genes to be up-regulated 3-fold in the tumor of every patient, and the next 50 genes to be down-regulated 3-fold in the tumor of every patient.

Can anyone help me?


What kind of data are you hoping to end up with?

When you talk about microarray data, do you mean fluorescence intensities or A/B ratios? There are quite a few different kinds of microarray.

If you want expression data, these could be random numbers in the range 0-20. R has many easy-to-use random number generators, and you can build the matrix with those. Look into the Gamma family of distributions and choose parameters that suit your expectations.
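Following that suggestion, here is a minimal sketch of a 1000 x 20 baseline matrix drawn from a Gamma distribution. The values shape = 25 and rate = 2.5 are assumptions. They give a mean of 10 and an sd of 2, so the values sit inside the 0-20 range and are never negative. Tune them to your platform.

```r
set.seed(1)  # reproducible draws

n_genes   <- 1000
n_samples <- 20

# Assumed parameters: Gamma(shape = 25, rate = 2.5) has mean 25/2.5 = 10
# and sd sqrt(25)/2.5 = 2, so almost all values fall inside 0-20 and none are negative
x <- matrix(rgamma(n_genes * n_samples, shape = 25, rate = 2.5),
            nrow = n_genes, ncol = n_samples)

summary(as.vector(x))
```

Unlike `rnorm()`, a Gamma distribution is strictly positive and right-skewed. That makes it a reasonable stand-in for raw intensities.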


Hi again, this is the code I wrote:

# create a matrix
x <- matrix(rnorm(1000*20, mean=10, sd=2), ncol=20)

#create a FC condition
conditionsFC <- rep(c(3,1/3),c(50,50))

# modify the first 100 genes: tumor columns 11:20 = matching normal columns 1:10 times the FC
x[1:100,11:20] <- x[1:100,1:10]*conditionsFC

Do you think it is correct?


Well, it does appear to do what you asked for, so in that sense it works. However, the perfect correlation and the perfect 3-fold change you have constructed are completely unrealistic. The simulation looks nothing like real data, so any experiments you run on it will be meaningless.
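One way to address that criticism is to simulate on the log2 scale with separate sources of variation. This is a sketch under assumptions, not a validated model. All parameter values are hypothetical:

- a per-gene baseline (6-12 on log2),
- a per-patient offset shared by that patient's two samples (sd 0.5),
- per-patient scatter of the true ±log2(3) fold change (sd 0.2),
- independent measurement noise on every array (sd 0.3).

Together these keep the 3-fold change consistent on average in every patient without making normal and tumor perfectly correlated.

```r
set.seed(42)

n_genes    <- 1000
n_patients <- 10
n_up   <- 50
n_down <- 50

# Hypothetical per-gene baseline log2 intensity
gene_mean <- runif(n_genes, min = 6, max = 12)

# Hypothetical per-patient offset, shared by the patient's normal and tumor sample (the pairing)
patient_effect <- rnorm(n_patients, mean = 0, sd = 0.5)

# True log2 fold change: +log2(3) for genes 1-50, -log2(3) for genes 51-100, 0 otherwise
true_lfc <- c(rep(log2(3), n_up), rep(-log2(3), n_down),
              rep(0, n_genes - n_up - n_down))

# Each patient's realised fold change scatters slightly around the true value
patient_lfc <- true_lfc + matrix(rnorm(n_genes * n_patients, sd = 0.2),
                                 nrow = n_genes, ncol = n_patients)

# Expected signal: element [i, j] = gene_mean[i] + patient_effect[j]
base <- outer(gene_mean, patient_effect, "+")

# Independent measurement noise on every array
noise_sd <- 0.3
normal <- base + matrix(rnorm(n_genes * n_patients, sd = noise_sd), nrow = n_genes)
tumor  <- base + patient_lfc +
  matrix(rnorm(n_genes * n_patients, sd = noise_sd), nrow = n_genes)

# Columns 1-10: normal samples of patients 1-10; columns 11-20: matching tumors
log_expr <- cbind(normal, tumor)
colnames(log_expr) <- c(paste0("N", 1:n_patients), paste0("T", 1:n_patients))
```

If you need linear-scale intensities, use `2^log_expr`.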


Hi,

Thank you, Karl.

I added a bit of noise:

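x <- matrix(rnorm(1000*20, mean=10, sd=2), ncol=20)
conditionsFC <- rep(c(2,1/2),c(50,50))
# jitter() perturbs each gene's FC slightly, but that value is still applied identically to all 10 patients
conditionsFC_noise<- jitter(conditionsFC)
x[1:100,11:20] <- x[1:100,1:10]*conditionsFC_noise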
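`jitter(conditionsFC)` adds noise per gene only, so every patient still gets exactly the same tumor/normal ratio for a given gene. A small variation on the code above gives each gene x patient cell its own multiplicative noise instead. This is a sketch. The log-normal noise with sdlog = 0.1 is an assumed value.

```r
set.seed(7)
x <- matrix(rnorm(1000*20, mean=10, sd=2), ncol=20)
conditionsFC <- rep(c(2, 1/2), c(50, 50))

# Assumed noise model: each gene x patient fold change = gene FC * exp(N(0, 0.1)).
# conditionsFC (length 100) is recycled down the columns, so row i is scaled by conditionsFC[i]
fc_matrix <- conditionsFC *
  matrix(rlnorm(100 * 10, meanlog = 0, sdlog = 0.1), nrow = 100, ncol = 10)

# Tumor columns 11:20 = matching normal columns 1:10 times a patient-specific FC
x[1:100, 11:20] <- x[1:100, 1:10] * fc_matrix
```

The direction and approximate size of the change stay the same in every patient, but the ratios now vary slightly from patient to patient.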

Hi Karl and Michael,

Thank you very much. Karl, yes, I was referring to expression data such as those generated with one-color microarray technology.

Michael, I don't want to see a difference between means; I want differences that are consistent across all patients. For example, a differentially expressed gene should be one whose expression increases (or decreases) by ~3-fold in every patient. If you only consider the mean, you can also pick up genes whose mean differs between normal and tumor just because the gene changes in a subset of patients.
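The difference between a mean-based change and a change that is consistent in every patient can be shown on a toy example. The sketch below uses a hypothetical helper, `consistent_de()`, which is not from any package. It calls a gene "up" or "down" only if the per-patient tumor/normal ratio passes `min_fc` in the same direction in every patient. The threshold of 2 is an assumption.

```r
# Hypothetical helper: per-patient paired log2 ratios, then require the same
# direction and at least min_fc in every patient (expr must be positive, linear scale)
consistent_de <- function(expr, normal_cols, tumor_cols, min_fc = 2) {
  lr <- log2(expr[, tumor_cols]) - log2(expr[, normal_cols])
  n  <- length(tumor_cols)
  up   <- rowSums(lr >=  log2(min_fc)) == n
  down <- rowSums(lr <= -log2(min_fc)) == n
  ifelse(up, "up", ifelse(down, "down", "none"))
}

set.seed(3)
expr <- matrix(rgamma(5 * 20, shape = 25, rate = 2.5), nrow = 5, ncol = 20)
expr[1, 11:20] <- expr[1, 1:10] * 3   # gene 1: 3-fold up in every patient
expr[2, 11:14] <- expr[2, 1:4] * 10   # gene 2: 10-fold up in patients 1-4 only

# Mean-based fold change vs. the per-patient consistency call
mean_lfc <- log2(rowMeans(expr[, 11:20]) / rowMeans(expr[, 1:10]))
calls    <- consistent_de(expr, normal_cols = 1:10, tumor_cols = 11:20)
print(data.frame(mean_log2FC = round(mean_lfc, 2), call = calls))
```

Gene 2 shows a larger mean fold change than gene 1, yet only gene 1 is called consistently up.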

Thank you again

Pas

8.6 years ago
michael.ante ★ 3.8k

One way to tackle your problem is to use R and its random functions. With these, you can build your matrix gene-wise:

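mdata <- matrix(rnorm(1000*20, mean=10, sd=5), nrow=1000, ncol=20)  # non-DE genes: same mean in normal and tumor
# DE genes, built gene-wise: 10 "normal" values around 10, then 10 "tumor" values around 30
for (i in 1:100) mdata[i,] <- c(rnorm(n=10,mean=10,sd=5), rnorm(n=10,mean=30,sd=5))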

You could also use sample() or runif() to vary the mean from gene to gene.
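Building on that suggestion, here is a sketch that draws a per-gene baseline mean with `runif()` and then builds each row gene-wise as above. The following values are all hypothetical choices:

- baseline means between 5 and 15,
- a 3-fold up/down change for genes 1-50 and 51-100,
- sd proportional to the mean (20% coefficient of variation).

`sample(5:15, n_genes, replace = TRUE)` would work the same way if you prefer integer means.

```r
set.seed(11)
n_genes <- 1000

# Hypothetical per-gene baseline means, drawn uniformly between 5 and 15
gene_mean <- runif(n_genes, min = 5, max = 15)

# Tumor means: 3x for genes 1-50, 1/3 for genes 51-100, unchanged for the rest
tumor_mean <- gene_mean * c(rep(3, 50), rep(1/3, 50), rep(1, n_genes - 100))

# Gene-wise: 10 normal values, then 10 tumor values, with an assumed
# sd of 20% of the mean; sapply() returns 20 x 1000, so transpose to 1000 x 20
mdata <- t(sapply(seq_len(n_genes), function(i) {
  c(rnorm(n = 10, mean = gene_mean[i],  sd = 0.2 * gene_mean[i]),
    rnorm(n = 10, mean = tumor_mean[i], sd = 0.2 * tumor_mean[i]))
}))
```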

But as Karl wrote, it is hard to reproduce the complexity of real microarray data.

Cheers,
Michael
