I wish to raise a question regarding the detection of differentially expressed genes from combined datasets. So as to merge two expression microarray datasets, following the combination of series matrices and their corresponding annotation files, I used the ComBat function in sva package to remove the batch effect. However, as a beginner bioinformatician, I have almost no idea how to take advantage of the retrieved output of ComBat function of sva package in the Limma package in order to find differentially expressed genes. Therefore, some errors occur usually while I am trying to do so. Accordingly, I pasted an excerpt from my codes in the following to gain your precious help in providing me easy-to-do R codes for approaching my purpose.
If you have time to guide me I will be immensely grateful.
I am looking forward to your response.
Sincerely yours,
Amir Mehrgou
An excerpt from my code:
library(sva)
library(limma)
allc <- ComBat(as.matrix(all), batch = batch)
allm <- allc - rowMeans(allc)
allm$description <- factor(gr)
design <- model.matrix(~ description + 0, as.data.frame(allm))
colnames(design) <- levels(factor(gr))
fit <- lmFit(as.matrix(allm), design)
cont.matrix <- makeContrasts(Case-Control, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=Inf)
Regardless of the tool used to remove batch effect, I believe the general recommendation is to incorporate the batch as a covariate in the model, and only batch-adjust your matrix for data plotting/exploration.
Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.I was not aware of this method of code writing. Thank you for your guidance.
Why are you using ComBat when limma has a removeBatchEffect function?
As you suggested I utilized removeBatchEffect function in limma package. However, at the time of finding differentially expressed genes I encountered the same errors as I had observed when sva package was used. My codes and errors are as follow:
Additionally, I made a comparison between removing batch effect by limma (Left side) and sva (Right side) packages the plot of which is attached below.
Which one is more powerful do you think?
Sincerely yours,
Amir
Like Papyrus says, account for batch as a covariate. Removing batch effects is not a good idea, especially if the experiment design is unbalanced.