Hello everyone,
I am working on a RNA-seq project that includes samples from normal tissue, adenomas, adenocarcinomas, liver metastases and was done in five batches. As I have previously read, it is best if batch effect is included in the design matrix for linear model fitting- shown in limma guide and (http://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html). However, in the latter article, when they write contrasts, the lanes are not included, not accounted for if i understand correctly. A similar story is shown in limma guide chapter 17.1.5, where the DEGs are for the final coefficient.
I performed the analysis in two different ways, the first as such:
`group2<- factor(c("N","A","N","A","N","A","N","A","N","A",
"C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N",
"C","L","N","C","L","N","C","L","N","C","C","C", "N","L","N","C","L","N","C","L","N","C","L","N","C","L","N","L","N","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","N","C","L",
"N","A","N","A","N","A","N","A","N","A","N","A","N","A","N","A"),levels = c("N","A","C","L"))
batch2 <- factor(c(1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
3,3,3,3,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5))
design <- model.matrix(~0+group2+batch2)
cont.matrix <- makeContrasts(NvsA=N-A,
NvsC=N-C,
NvsL=N-L,
AvsC=A-C,
AvsL=A-L,
CvsL=C-L,
levels=colnames(design))`
The contrast matrix is then
Contrasts
Levels NvsA NvsC NvsL AvsC AvsL CvsL
N 1 1 1 0 0 0
A -1 0 0 1 1 0
C 0 -1 0 -1 0 1
L 0 0 -1 0 -1 -1
L2 0 0 0 0 0 0
L3 0 0 0 0 0 0
L4 0 0 0 0 0 0
L5 0 0 0 0 0 0
and the results include DEGs for NvsA NvsC NvsL AvsC AvsL CvsL. If I understand correctly, this does not account for batch effect and is similar to what the rnaseq-123 article from the link above did?
The other option is similar to the answer to this question https://support.bioconductor.org/p/69374/. I used an interaction type design and averaged the cases from which batch each sample came from and only included the interactions that had a sample inside (otherwise linear model did not work as coefficients were not calculable):
designy1 <- model.matrix(~0+group2:batch2)
colnames(designy1)
colnames(designy1) <- c("N1","A1","C1","L1","N2","A2","C2","L2","N3","A3","C3","L3","N4","A4","C4","L4","N5","A5","C5","L5")
designy2 <- designy1[,c(1:2,5,7,9,11:13,15:18)]
The contrast matrix then looked like this
contr.matrixy <- makeContrasts(NC= (N1+N2+N3+N4+N5)/5-(C2+C3+C4)/3,
nVSa= (N1+N2+N3+N4+N5)/5-(A1+A5)/2,
AVSC= (A1+A5)/2-(C2+C3+C4)/3,
NVL= (N1+N2+N3+N4+N5)/5-(L3+L4)/2,
CVL= (C2+C3+C4)/3-(L3+L4)/2,
AVL=(A1+A5)/2-(L3+L4)/2,
levels=colnames(designy2))
I then as standard used voom normalisation, linear model fitting, etc to get to DEGs.
vy <- voom(countsbindlist4I,design = designy2, plot=TRUE)
fity <- lmFit(vy, designy2)
fit2y <- contrasts.fit(fity, contr.matrixy)
efity <- eBayes(fit2y)
dt<- decideTests(efity)
summary(dt)
Is any of the solutions I used okay, or did I miss the mark with the second one? Thank you all for your help and insights.
Best regards,