Question

Problem with limma in microarray analysis

2

Entering edit mode

2.9 years ago

reza ▴ 300

hi every one

I have a problem with limma, I annotated microarray data and then I read it by following command in R

GSE23881_Sal <- read.delim("GSE23881_by_gene.txt", header = T, row.names = 1)

> head(GSE23881_Sal)
       GSM589082 GSM589083 GSM589084 GSM589085 GSM589086 GSM589087 GSM589088 GSM589089 GSM589090
A1CF      3.8789   3.82150   3.84570   4.05050   3.66320    3.6776   3.99520   3.81040    3.7640
A2ML2     3.9025   3.70645   3.88135   3.94815   3.79510    3.7922   3.89240   3.79505    3.9316
A2ML4     4.8197   4.53920   4.82955   4.74460   4.61515    4.6105   4.79785   4.72325    4.6393
A4GALT    4.8698   4.84260   4.87800   5.30540   5.25380    4.9064   5.05990   4.94090    5.0059
A4GNT     5.5963   5.47640   5.71940   5.51420   5.55970    5.4114   5.78610   5.71030    5.5237
AAAS      9.1472   8.87970   8.95340   9.03555   8.98115    8.9938   8.79250   9.06600    8.9517
       GSM589091 GSM589092 GSM589093 GSM589094 GSM589095 GSM589096 GSM589097 GSM589098 GSM589099
A1CF     3.69040   3.72630   3.78830    3.9579   3.71840    3.8525   3.61950   3.89360   3.73620
A2ML2    3.94755   3.69815   3.99395    3.6007   3.84380    4.0757   3.81345   3.96205   3.95620
A2ML4    4.64975   4.59095   4.76360    4.5900   4.79340    4.8722   4.74945   4.79270   4.86235
A4GALT   4.95290   4.89030   4.98270    4.7433   4.74330    5.0175   4.89530   5.03110   5.04090
A4GNT    5.68750   5.72410   5.45030    5.5001   5.51130    5.8292   5.63270   5.56560   5.52350
AAAS     9.11110   9.06480   9.05920    8.9122   9.11045    8.9326   9.07685   9.08885   8.97400
       GSM589100 GSM589101
A1CF     3.79720   3.93840
A2ML2    3.93245   4.10320
A2ML4    4.80005   5.04395
A4GALT   5.20450   5.18370
A4GNT    5.86780   5.67800
AAAS     9.03500   9.08680

when I go to find DEGs by following way, I got an error

data <- GSE23881_Sal - rowMeans(GSE23881_Sal)
data$Symbols <- factor(Samples)
design <- model.matrix(~ Symbols + 0, as.data.frame(data))
colnames(design) <- levels(factor(Samples))

fit <- lmFit(as.matrix(GSE23881_Sal), design)
Error in lmFit(as.matrix(GSE23881_Sal), design) : 
  row dimension of design doesn't match column dimension of data object

while

  > dim(GSE23881_Sal)
[1] 13780    20

> dim(design)
[1] 13780     2

Which part of my path is wrong?

limma microarray R • 1.3k views

ADD COMMENT • link 2.9 years ago by reza ▴ 300

score 0 · Answer 1 · 2021-05-31

0

Entering edit mode

2.9 years ago

Kevin Blighe 87k

Hi, why are you doing this?

data <- GSE23881_Sal - rowMeans(GSE23881_Sal)

What is in Samples? How did you obtain (retrieve or download) the data?

ADD COMMENT • link 2.9 years ago by Kevin Blighe 87k

0

Entering edit mode

I downloaded GSE23881 from GEO dataset as CEL files and after reading and normalizing of data, annotation of them was done (adding gene symbols instead of probe names).

Samples defined as below

Samples <- c(rep("Case", 1), rep("Control", 3), rep("Case", 2), rep("Control", 1), rep("Case", 3), rep("Control", 1), rep("Case", 1), rep("Control", 3), rep("Case", 5))

Regarding the first question, I have to say that it was a friend's suggestion to improve the results of the PCA. Isn't it needed?

ADD REPLY • link 2.9 years ago by reza ▴ 300

1

Entering edit mode

I see, regarding the line with rowMeans(), this will center your data around 0, which is 'yes' okay for PCA. This is the same as doing scale(x, center = TRUE, scale = FALSE). It is not necessary for differential expression analysis, though.

One problem is that you add your 'Samples' vector to your main expression data, and then later force this to numerical via as.matrix()

data <- GSE23881_Sal - rowMeans(GSE23881_Sal)
metadata <- data.frame(Samples = factor(Samples))
design <- model.matrix(~ 0 + Samples, metadata)
design
colnames(design)
colnames(design) <- levels(factor(Samples))

fit <- lmFit(t(GSE23881_Sal), design)
# then run `eBayes()` and perform your comparisons

Another key thing to check is that the order of the variables in Samples is equal to the order of columns in GSE23881_Sal

ADD REPLY • link 2.9 years ago by Kevin Blighe 87k

0

Entering edit mode

Unfortunately, i got same error.

> fit <- lmFit(t(GSE23881_Sal), design)
Error in lmFit(t(GSE23881_Sal), design) : 
  row dimension of design doesn't match column dimension of data object
> dim(design)
[1] 20  2
> dim(GSE23881_Sal)
[1] 13780    20

in my main question dim of "design" and "GSE23881" was ok but now the situation is different. Anyway, thank you very much for trying to help me.

ADD REPLY • link 2.9 years ago by reza ▴ 300

1

Entering edit mode

Please try:

fit <- lmFit(GSE23881_Sal, design)