Correcting for batch effect in RNA-seq data
Asked 4.8 years ago by Rimma

I used DESeq2 to process RNA-seq data from different sources, and I found a strong batch effect when I plotted the PCA (the different point shapes represent the 3 batches; for example, ctr and PH.7d samples from different batches cluster apart):

[Figure: PCA plot of the samples before batch correction]

I tried to remove it using the limma package as described here:

colData
      sample   condition batch
1         100       PH.7d     1
..........
7          75         ctr     1
8  SRR5035380 hblast.10.5     2
..........
25 SRR5035397 hblast.18.5     2
26 SRR8437299         ctr     3
..........
37 SRR8437324       PH.7d     3

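library(DESeq2)   # vst(), plotPCA()
library(ggplot2)  # qplot()
vsd <- vst(dds)
# regress the batch effect out of the variance-stabilized matrix
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$data1)
data2 <- plotPCA(vsd, intgroup = c('condition', 'batch'), returnData = TRUE)
data2 <- as.data.frame(data2)
percentVar <- round(100 * attr(data2, 'percentVar'))
plot2 <- qplot(PC1, PC2, color = condition, shape = batch, data = data2)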

However, nothing changes when I plot the results:

[Figure: PCA plot after removeBatchEffect, essentially unchanged]

What am I doing wrong?

I also tried to account for the batch effect through the design formula in DESeq2:

ddsB=DESeqDataSetFromMatrix(countData = countData,colData = colData, design = ~batch+condition)

I'm getting this error:

Error in checkFullRank(modelMatrix) : 
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

Can somebody help me solve it?

Thanks in advance!

Tags: RNA-Seq, batch-effect

Are you sure that vsd$data1 corresponds to the vector encoding the batch variable? It seems to me it should be vsd$batch.
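To see why the batch argument matters, here is a minimal base-R sketch of the general approach behind limma::removeBatchEffect(): batch is fitted jointly with a design matrix of effects to keep, and only the fitted batch component is subtracted. The helper name remove_batch_sketch, the sum-to-zero batch coding, and the simulated data are assumptions of this sketch, not limma's actual code.

```r
# Hypothetical helper illustrating the idea behind limma::removeBatchEffect():
# fit batch (sum-to-zero coded) together with a design of effects to keep,
# then subtract only the fitted batch component.
remove_batch_sketch <- function(x, batch, design = matrix(1, ncol(x), 1)) {
  batch <- factor(batch)
  contrasts(batch) <- contr.sum(levels(batch))
  X_batch <- model.matrix(~ batch)[, -1, drop = FALSE]  # drop intercept column
  X <- cbind(design, X_batch)
  fit <- lm.fit(X, t(x))  # samples in rows, one response column per gene
  beta <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
  beta[is.na(beta)] <- 0  # non-estimable batch terms are left uncorrected
  x - t(X_batch %*% beta)
}

# Simulated log-scale data: 50 genes x 12 samples, 3 batches of 4 samples,
# each batch containing both conditions (all values are made up).
set.seed(1)
n_genes <- 50
condition <- factor(rep(c("ctr", "treat"), times = 6))
batch <- factor(rep(c("A", "B", "C"), each = 4))
batch_offset <- c(0, 3, -1)[as.integer(batch)]       # per-sample batch shift
treat_effect <- ifelse(condition == "treat", 2, 0)   # true biological effect
x <- outer(rep(1, n_genes), treat_effect + batch_offset) +
  matrix(rnorm(n_genes * 12, sd = 0.1), nrow = n_genes)

# Keep the condition effect, remove batch
design <- model.matrix(~ condition)
corrected <- remove_batch_sketch(x, batch, design)

batch_means <- function(m) tapply(colMeans(m), batch, mean)
round(batch_means(x), 2)          # batch shifts clearly visible
round(batch_means(corrected), 2)  # batch means now equal
```

With the real packages, the corresponding call would presumably be `limma::removeBatchEffect(assay(vsd), batch = vsd$batch, design = model.matrix(~ condition, data = as.data.frame(colData(vsd))))`, passing the condition design so the biological groups are not regressed out along with batch. Note that in the sketch, a batch term that cannot be estimated because that batch shares no condition with the others is set to zero and therefore left uncorrected.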


It looks like batch 2 doesn't contain any of the groups in batches 1 and 3, so it is not possible to correct for that batch. Are you sure batch 2 contains at least one group that is also found in batches 1 and 3?
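One way to check this before fitting anything is to cross-tabulate condition against batch and test whether the ~batch + condition model matrix has full column rank. The sketch below uses base R only and a hypothetical toy table that mirrors the conditions listed in the question; two samples per condition per batch is an assumption, since the full colData is elided above.

```r
# Hypothetical toy sample table: conditions per batch follow the question,
# but two samples per condition per batch is an assumption.
coldata_toy <- data.frame(
  batch = factor(rep(c("1", "2", "3"), each = 4)),
  condition = factor(c("ctr", "ctr", "PH.7d", "PH.7d",
                       "hblast.10.5", "hblast.10.5", "hblast.18.5", "hblast.18.5",
                       "ctr", "ctr", "PH.7d", "PH.7d"))
)

# Which conditions appear in which batch?
print(table(condition = coldata_toy$condition, batch = coldata_toy$batch))

# Same design formula as in the DESeqDataSetFromMatrix() call in the question
mm <- model.matrix(~ batch + condition, data = coldata_toy)

# Full rank means rank == number of columns
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), "rank:", rank_mm, "\n")

# Conditions that occur in only one batch cannot be separated from that batch
batches_per_condition <- tapply(coldata_toy$batch, coldata_toy$condition,
                                function(b) length(unique(b)))
print(batches_per_condition[batches_per_condition == 1])
```

On this toy table the model matrix has 6 columns but rank 5, which matches the "not full rank" error: the batch 2 indicator column equals the sum of the hblast.10.5 and hblast.18.5 indicator columns.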

Answered 4.7 years ago by ATpoint

From what I've seen, RNA-seq data are strongly confounded by the kit and library preparation method, and this effect can dominate any kind of biological difference. See for example this PCA that I made from five independent data sources, processed identically on the computational side.

Edit: Check whether using batch removal correctly, as Benn says below, can limit the confounding effect.

[Figure: PCA of samples from five independent data sources]


But you can correct for batch when there are overlapping groups. However, I suspect that OP's batch 2 doesn't share any group with the other batches...


True, but to what extent? Do you have experience with how well this works? I mean "mild" batch effects, such as different culture conditions in the lab, samples taken on different days, or different sequencing protocols, might be correctable, but can you really "regress" out the effect of different kits and laboratories?


I only have experience with removeBatchEffect() from edgeR/limma, and it works fine, especially for visualization. Clearly OP's limma::removeBatchEffect call did not work properly, as Friederike already suspects.
