Biostar Beta. Not for public use.
Question: edger model matrix design of linear model
2
Entering edit mode

Hi,

I am trying to do differential expression based on RNA-seq data. I only have one variable, which is the timepoint. I want to analyze gene expression relative to the 0 h time point. I already set up the contrasts. Now, I am trying to set up the design matrix. What is the difference between

design <- model.matrix(~timepoint)

and

design <- model.matrix(~0 + timepoint).

There is a big difference in the results.

Entering edit mode
0

I don't see a difference. I could be wrong.

Why not try both and compare the results?

ADD REPLYlink 20 months ago
goodez
• 460
Entering edit mode
0

Hi, there is a big difference in the results. The results using the design <- model.matrix(~timepoint) makes much more biological sense.

ADD REPLYlink 20 months ago
teabonng
• 20
Entering edit mode
0

Yes I don't think model.matrix(~0 + timepoint) makes much sense. How can you use zero as a term in your model to explain differences?

ADD REPLYlink 20 months ago
goodez
• 460
4
Entering edit mode

The clue here is to look at the coefficients, or columns derived in the design matrix. Your non intercept model design, will create a coefficient that describes the difference between the two levels of your timepoint variable. Your non-intercept model will create a coefficient for each level. Here's an example using your non-intercept model:

library(tidyverse)
library(limma)
# Make a Phenotype Table
foo       <- data.frame(SampleType = paste0(rep(c("A","B"), each = 3)), 
                        Reps       = rep(1:3, 2)) %>%
             mutate(Names = paste0(SampleType,Reps))
# Make some fake gene expression data
genes.foo <- matrix(rnorm(6*100),ncol = 6) %>% 
             `rownames<-`(paste0("Gene_",1:100)) %>% 
             `colnames<-`(foo$Names)

#non-intercept model
model.foo <- model.matrix(~0 + SampleType, data = foo) %>% 
             `colnames<-`(gsub("SampleType","",colnames(.)))

model.foo
            A           B
1           1           0
2           1           0
3           1           0
4           0           1
5           0           1
6           0           1


conts.foo      <- c("A_Vs_B" = "A-B")
contmap.foo    <- makeContrasts(contrasts = conts.foo, levels = colnames(model.foo)) %>% 
                  `colnames<-`(names(conts.foo))
fit.foo     <- lmFit(genes.foo, model.foo) %>% 
               contrasts.fit(contmap.foo) %>% 
               eBayes


> fit.foo$coefficients %>% head
        Contrasts
             A_Vs_B
  Gene_1 -0.6722254
  Gene_2 -1.1571439
  Gene_3 -0.1021150
  Gene_4  0.8687436
  Gene_5 -0.9600460
  Gene_6  1.3139228

topTable(fit.foo, coef = "A_Vs_B", number = Inf, p.value = 0.05, lfc = log2(1.5))

Alternatively, here's your intercept model

model.foo <- model.matrix(~SampleType, data = foo)

fit.foo     <- lmFit(genes.foo, model.foo) %>% 
               eBayes

> model.foo
  (Intercept) SampleTypeB
1           1           0
2           1           0
3           1           0
4           1           1
5           1           1
6           1           1

# Here SampleTypeB is actually the difference relative to the first level (SampleTypeA)
> fit.foo$coefficients %>% head
       (Intercept) SampleTypeB
Gene_1  -0.2920748   0.6722254
Gene_2  -1.0356549   1.1571439
Gene_3   0.2477702   0.1021150
Gene_4   0.5016585  -0.8687436
Gene_5   0.3153554   0.9600460
Gene_6   0.2836491  -1.3139228

topTable(fit.foo, coef = "SampleTypeB", number = Inf, p.value = 0.05, lfc = log2(1.5))

edit: Changed the data frame initialisation.

ADD COMMENTlink 20 months ago andrew.j.skelton73 5.7k
Entering edit mode
0

in line 4, it should be: foo <- data.frame(SampleType = paste0(rep(c("A","B"), each = 3)), Reps = rep(1:3, 2), Names = paste0(rep(c("A","B"), each=3), rep(1:3, 2) ) )

set.seed() before generating the data will be helpful.

ADD REPLYlink 20 months ago
Zhilong Jia
♦ 1.4k
Entering edit mode
0

You're right, the data frame initialisation was wrong. I mostly wrote this as pseudocode and ran bits in R...I probably should have written that as a disclaimer. Setting the seed has benefits in reproducibility, but as it's proof of concept, and dummy data, it doesn't really matter here.

ADD REPLYlink 20 months ago
andrew.j.skelton73
5.7k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0