Question

Multiple group model matrix in rna-seq DE analysis

0

Entering edit mode

6.8 years ago

andrea.rubiop88 • 0

I have RNA-seq data of four different tissues and I need to make differential expression analysis of each against the rest of them. I am using R package pachterlab/sleuth to make the differential expression tests (Wald test). The sample to covariates table is something like this:

 sample    tissue  
 S10_Lu_A      Lung 
 S11_Lu_B      Lung 
 S13_SP_A    Spleen 
 S14_SP_B    Spleen 
 S15_SP_C    Spleen 
  S4_BM_A        BM  
  S5_BM_B        BM  
  S6_BM_C        BM  
  S7_In_A Intestine  
  S8_In_B Intestine  
  S9_In_C Intestine

My first approach was assigning a dummy factor to produce the model matrix I wanted, so if for example I wanted to test BM against the rest I got something like this:

  sample    tissue contrast
S10_Lu_A      Lung        A
S11_Lu_B      Lung        A
S13_SP_A    Spleen        A
S14_SP_B    Spleen        A
S15_SP_C    Spleen        A
 S7_In_A Intestine        A
 S8_In_B Intestine        A
 S9_In_C Intestine        A
 S4_BM_A        BM        B
 S5_BM_B        BM        B
 S6_BM_C        BM        B

And then I would simply use model.matrix(~ contrast) and perform the test on contrastB. However, I have been told this is incorrect as I need to take into account the tissue each sample comes from instead of obtaining a global mean of A.

My problem is that if I use ~ tissue to keep the individual tissues I lose the BM group in the intercept, and of course I can't add the contrast dummy as a covariate because it wouldn't be independent of the tissue. Maybe this is stupid, but I can't figure out how to model the contrast: BM vs. (lung + intestine + spleen) now.

I thought of using model.matrix( ~ tissue - 1) so I conserve all the individual groups, but then I don't get statistical significance of the contrasts.

I'm aware the simplest solution would be to rename BM to something else, as the reason it is being chosen as "default" is because it is the first alphabetical group, but I don't know if that is correct either or if I am missing some important statistical point. Can anyone shed some light on this please?

RNA-Seq model-matrix differential-expression • 2.9k views

ADD COMMENT • link updated 6.8 years ago by Devon Ryan 104k • written 6.8 years ago by andrea.rubiop88 • 0

1

Entering edit mode

I've not used sleuth before, but for limma the following design would work:

modelFormula <- formula(~ 0 + tissue, data = design)
modelMatrix <- model.matrix(modelFormula, data = design)
colnames(modelMatrix) <- c("BM", "Intestine", "Lung", "Spleen")
makeContrasts(BM-Intestine, BM-Lung, BM-Spleen, levels = c("BM", "Intestine", "Lung", "Spleen"))

ADD REPLY • link 6.8 years ago by James Ashmore ★ 3.4k

score 1 · Answer 1 · 2017-07-18

You can normally specify that you don't want an intercept:

model.matrix(~0 + tissue)

You could then specify a contrast vector of:

c(1, -1/3, -1/3, -1/3)

That will correspond to BM - (Intestine + Lung + Spleen)/3, which seems to be what you want. I don't know the exact command for that in Sleuth, since I don't use it, but in general that will be your strategy. Normally you fit the model and specify contrasts in separate commands.