I'm a little confused by some of the documentation for R package GSVA.
Here's the sample analysis provided in the documentation:
library(GSVA)
library(limma)
p <- 10 ## number of genes
n <- 30 ## number of samples
nGrp1 <- 15 ## number of samples in group 1
nGrp2 <- n - nGrp1 ## number of samples in group 2
## consider three disjoint gene sets
geneSets <- list(set1=paste("g", 1:3, sep=""),
set2=paste("g", 4:6, sep=""),
set3=paste("g", 7:10, sep=""))
## sample data from a normal distribution with mean 0 and st.dev. 1
y <- matrix(rnorm(n*p), nrow=p, ncol=n,
dimnames=list(paste("g", 1:p, sep="") , paste("s", 1:n, sep="")))
## genes in set1 are expressed at higher levels in the last 10 samples
y[geneSets$set1, (nGrp1+1):n] <- y[geneSets$set1, (nGrp1+1):n] + 2
## build design matrix
design <- cbind(sampleGroup1=1, sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2)))
## fit linear model
fit <- lmFit(y, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## genes in set1 are differentially expressed
topTable(fit, coef="sampleGroup2vs1")
## estimate GSVA enrichment scores for the three sets
gsva_es <- gsva(y, geneSets, mx.diff=1)$es.obs
## fit the same linear model now to the GSVA enrichment scores
fit <- lmFit(gsva_es, design)
## estimate moderated t-statistics
fit <- eBayes(fit)
## set1 is differentially expressed
topTable(fit, coef="sampleGroup2vs1")
I'm somewhat confused by the design variable:
>design <- cbind(sampleGroup1=1, sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2)))
> design
sampleGroup1 sampleGroup2vs1
[1,] 1 0
[2,] 1 0
[3,] 1 0
[4,] 1 0
[5,] 1 0
[6,] 1 0
[7,] 1 0
[8,] 1 0
[9,] 1 0
[10,] 1 0
[11,] 1 0
[12,] 1 0
[13,] 1 0
[14,] 1 0
[15,] 1 0
[16,] 1 1
[17,] 1 1
[18,] 1 1
[19,] 1 1
[20,] 1 1
[21,] 1 1
[22,] 1 1
[23,] 1 1
[24,] 1 1
[25,] 1 1
[26,] 1 1
[27,] 1 1
[28,] 1 1
[29,] 1 1
[30,] 1 1
If I understand the documentation correctly, then the sample group 1 and 2 should be mutually exclusive. However, sample group 1 includes all samples. Is this supposed to be the intercept?
Also, another question concerning multiple comparisons. Say I wanted to add a third exclusive group, and perform all pairwise comparisons where 1vs2, 1vs3, 2vs3.
1) How should I set up the design for this? Do I just add another grouping category to the design?:
design <- cbind(sampleGroup1=1, sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2), rep(2,nGrp3)))
But then how do I specify result for all possible post-hoc comparisons?
2) When I report the enriched pathways for a particular coefficient (e.g., 1vs2), are the reported adjusted-pvalues corrected test or group-wise? In theory, should I aggregate all of the pvalues for all comparisons, then do FDR or Bonferroni multiple testing corrections?