ComBat normalization returns negative values
8.2 years ago
lahat.albert ▴ 60

I am trying to correct a batch effect using ComBat.

About 40% of my genes end up with at least one negative value. If I just drop those genes, the PCA plot of the normalized data clusters neatly (but I lose 40% of the genes).

I've tried setting the negative values to zero, but that gives really bad PCA clustering (especially along PC1).

Is there a way to avoid losing almost half the data without distorting it too much?

SVA RNA-Seq batch-correction R ComBat • 8.4k views

Hi Lahat,

It would be useful to know a little background on your samples and how you are using ComBat for batch normalization, and to see boxplots of each sample before and after correction.

Regards,
Mamun


Hi. The samples are mouse RNA-seq data from several treatments. The sequencing was done in three batches: batches 1 and 2 used the TruSeq method, and batch 3 used the gencore method.

Without normalization there is a very strong batch effect between the two methodologies.

Here is the code:

library(sva)  # provides ComBat; rowVars is assumed to come from genefilter, which sva attaches

dat = read.table('200genes/counts.count',header=TRUE,row.names=1)
sif = read.table('200genes/batches',header=TRUE,sep='\t')
batch = as.character(unlist(sif['Batch']))
# intercept-only model with one row per sample (not used below: mod=NULL means the same model)
modcombat = model.matrix(~1, data = sif)
dat = as.matrix(dat)

keep = rowVars(dat) > 0  # TRUE for genes with non-zero variance across samples
dat_filtered = dat[keep,]  # removes zero-variance genes
print('these are the top genes removed (they should be zero)')
head(sort(rowMeans(dat[!keep,,drop=FALSE]),decreasing=TRUE))
combat_edata = ComBat(dat=dat_filtered,batch=batch,mod=NULL,par.prior=TRUE,prior.plots=FALSE)
combat_edata = ifelse(combat_edata<0,0,combat_edata) # sets negative corrected values to 0
write.table(combat_edata,'200genes/counts.NORM.count')
8.2 years ago

Hi There,

In RNA-seq data some transcripts may have a count (or FPKM/RPKM) of 0. When you are using ComBat, have you already converted the data to log10 scale? Remember to add 1 before log-transforming the data, so that zeros do not become -Inf, and to subtract it again when transforming back.
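
The pseudocount-and-log round trip can be sketched roughly as follows. This is only an illustration on a made-up toy matrix, not your data, and the ComBat step is left as a comment so the snippet runs in base R without any packages.

```r
# Toy count matrix (made-up values): 3 genes x 4 samples, including zeros
counts <- matrix(c(0, 5, 120,
                   2, 0, 300,
                   1, 7, 95,
                   0, 9, 410),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Add a pseudocount of 1 so zeros map to log10(1) = 0 instead of -Inf
log_counts <- log10(counts + 1)

# Batch correction would go here, on the log scale, e.g.
#   corrected <- sva::ComBat(dat = log_counts, batch = batch)
# This sketch skips that step and simply carries the values through.
corrected <- log_counts

# Back-transform: undo the log10, then remove the pseudocount
back <- 10^corrected - 1

# Corrected log values below 0 would still back-transform to slightly
# negative numbers (between -1 and 0), so clamp those at 0
back <- pmax(back, 0)
```

On the log scale a negative corrected value is not a problem in itself; it only means the back-transformed value falls below 1 (or below 0 once the pseudocount is removed), which is a much smaller distortion than clamping large negative values on the raw count scale.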

I'm not sure ComBat is the best way to remove batch effects from RNA-seq data; svaseq might be a better option.

Another approach: instead of correcting for the batch effect, why not include the batch label as a factor in the design matrix of a multi-factor analysis in DESeq or edgeR?
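
As a minimal sketch of what that design could look like, assuming a hypothetical sample sheet with `batch` and `treatment` columns (the layout below is made up, and the DESeq2/edgeR calls are only indicated in comments):

```r
# Hypothetical sample sheet: 6 samples, 3 batches, 2 treatments (made-up layout)
samples <- data.frame(
  batch     = factor(c("1", "1", "2", "2", "3", "3")),
  treatment = factor(c("ctrl", "drug", "ctrl", "drug", "ctrl", "drug"))
)

# Batch enters the model as a blocking factor next to the treatment of interest
design <- model.matrix(~ batch + treatment, data = samples)
print(colnames(design))

# The same formula can be handed to the DE tools together with the raw counts, e.g.
#   DESeq2 (the successor to DESeq):
#     DESeq2::DESeqDataSetFromMatrix(countData, colData = samples,
#                                    design = ~ batch + treatment)
#   edgeR: fit the GLM with this 'design' matrix and test the treatment coefficient
```

This only helps when treatments are spread across batches. If a treatment appears in only one batch, its effect cannot be separated from the batch effect.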

There is a detailed discussion of this issue in this thread.

Hope this helps.

Mamun
