Biostar Test Site

This is site is used for testing only. Visit: https://www.biostars.org to ask a question.

Problem with limma in microarray analysis
1
1
Entering edit mode
13 days ago
reza ▴ 260

hi every one

I have a problem with limma, I annotated microarray data and then I read it by following command in R

GSE23881_Sal <- read.delim("GSE23881_by_gene.txt", header = T, row.names = 1)

GSM589082 GSM589083 GSM589084 GSM589085 GSM589086 GSM589087 GSM589088 GSM589089 GSM589090
A1CF      3.8789   3.82150   3.84570   4.05050   3.66320    3.6776   3.99520   3.81040    3.7640
A2ML2     3.9025   3.70645   3.88135   3.94815   3.79510    3.7922   3.89240   3.79505    3.9316
A2ML4     4.8197   4.53920   4.82955   4.74460   4.61515    4.6105   4.79785   4.72325    4.6393
A4GALT    4.8698   4.84260   4.87800   5.30540   5.25380    4.9064   5.05990   4.94090    5.0059
A4GNT     5.5963   5.47640   5.71940   5.51420   5.55970    5.4114   5.78610   5.71030    5.5237
AAAS      9.1472   8.87970   8.95340   9.03555   8.98115    8.9938   8.79250   9.06600    8.9517
GSM589091 GSM589092 GSM589093 GSM589094 GSM589095 GSM589096 GSM589097 GSM589098 GSM589099
A1CF     3.69040   3.72630   3.78830    3.9579   3.71840    3.8525   3.61950   3.89360   3.73620
A2ML2    3.94755   3.69815   3.99395    3.6007   3.84380    4.0757   3.81345   3.96205   3.95620
A2ML4    4.64975   4.59095   4.76360    4.5900   4.79340    4.8722   4.74945   4.79270   4.86235
A4GALT   4.95290   4.89030   4.98270    4.7433   4.74330    5.0175   4.89530   5.03110   5.04090
A4GNT    5.68750   5.72410   5.45030    5.5001   5.51130    5.8292   5.63270   5.56560   5.52350
AAAS     9.11110   9.06480   9.05920    8.9122   9.11045    8.9326   9.07685   9.08885   8.97400
GSM589100 GSM589101
A1CF     3.79720   3.93840
A2ML2    3.93245   4.10320
A2ML4    4.80005   5.04395
A4GALT   5.20450   5.18370
A4GNT    5.86780   5.67800
AAAS     9.03500   9.08680


when I go to find DEGs by following way, I got an error

data <- GSE23881_Sal - rowMeans(GSE23881_Sal)
data\$Symbols <- factor(Samples)
design <- model.matrix(~ Symbols + 0, as.data.frame(data))
colnames(design) <- levels(factor(Samples))

fit <- lmFit(as.matrix(GSE23881_Sal), design)
Error in lmFit(as.matrix(GSE23881_Sal), design) :
row dimension of design doesn't match column dimension of data object


while

  > dim(GSE23881_Sal)
[1] 13780    20

> dim(design)
[1] 13780     2


Which part of my path is wrong?

limma microarray R • 135 views
0
Entering edit mode
13 days ago

Hi, why are you doing this?

data <- GSE23881_Sal - rowMeans(GSE23881_Sal)


What is in Samples? How did you obtain (retrieve or download) the data?

0
Entering edit mode

I downloaded GSE23881 from GEO dataset as CEL files and after reading and normalizing of data, annotation of them was done (adding gene symbols instead of probe names).

Samples defined as below

Samples <- c(rep("Case", 1), rep("Control", 3), rep("Case", 2), rep("Control", 1), rep("Case", 3), rep("Control", 1), rep("Case", 1), rep("Control", 3), rep("Case", 5))


Regarding the first question, I have to say that it was a friend's suggestion to improve the results of the PCA. Isn't it needed?

0
Entering edit mode

I see, regarding the line with rowMeans(), this will center your data around 0, which is 'yes' okay for PCA. This is the same as doing scale(x, center = TRUE, scale = FALSE). It is not necessary for differential expression analysis, though.

One problem is that you add your 'Samples' vector to your main expression data, and then later force this to numerical via as.matrix()

data <- GSE23881_Sal - rowMeans(GSE23881_Sal)
metadata <- data.frame(Samples = factor(Samples))
design <- model.matrix(~ 0 + Samples, metadata)
design
colnames(design)
colnames(design) <- levels(factor(Samples))

fit <- lmFit(t(GSE23881_Sal), design)
# then run eBayes() and perform your comparisons


Another key thing to check is that the order of the variables in Samples is equal to the order of columns in GSE23881_Sal