Best way to address different batches of RNA-seq
0
0
Entering edit mode
7.0 years ago
tud55122 ▴ 20

I did two batches of RNA-seq upon same drug treatment but at different time points on the same cell line. For each time point, I have corresponding DMSO controls. All my Fold Change values are calculated over the corresponding DMSO control. I'm trying to trace the gene expression dynamics upon my drug treatment. I found that if I simply merge two datasets into one master table and plot the RPKM and Fold Change values, the numbers do not reflect the real dynamics I saw by q-PCR. For example: I knew one gene (and a few more) that can be induced highest upon 4-day treatment but the RPKM values and Fold Changes do not show the same trend. So my question is:

What's the best way to address this issue?

Appreciate your help.

Hang

sequence RNA-Seq RPKM Fold-Change batch-effect • 3.5k views
ADD COMMENT
1
Entering edit mode

Below are the scripts I used.

samples=read.table("Samples.txt",sep="\t",header=T)
counts = readDGE(samples$countf)$counts
noint = rownames(counts) %in% c("__no_feature","__ambiguous","__too_low_aQual"                       ,"__not_aligned","__alignment_not_unique","no_feature","ambiguous","too_low_aQua l","not_aligned","alignment_not_unique")
counts = counts[!noint,]
dim(counts)
cpms = cpm(counts)
keep = rowSums(cpms>0)>=1
counts = counts[keep,]
dim(counts)
d = DGEList(counts=counts, group=samples$condition)
d = calcNormFactors(d)
design <- model.matrix(~0+samples$condition)
d <- estimateGLMCommonDisp(d,design)
d <- estimateGLMTrendedDisp(d,design) Loading required package: splines
d <- estimateGLMTagwiseDisp(d,design)
fit <- glmFit(d,design)
et <- glmLRT(fit, contrast=c(-1,1,0,0,0,0,0,0,0,0))
tt = topTags(et, n=nrow(d),sort.by="none")
write.table(tt,"EdgeR/test.txt",sep="\t",row.names=T, quote=F)
ADD REPLY
0
Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLY
0
Entering edit mode

Ok, then perhaps you could try to add the batch into the model/design. Although if you processed the DMSO/drug samples at the same time, it shouldn't be very impactful.

ADD REPLY
0
Entering edit mode

Is your fold change also computed based on RPKM values ? RPKM is not a proper between sample normalization metric so I suggest that you try a different method such as edgeR/DESEQ.

ADD REPLY
0
Entering edit mode

Thanks for the reply. My fold change is computed based on counts values. I used edgeR to calculate counts and generated RPKM values.

ADD REPLY
0
Entering edit mode

What was different between the two batches? In addition, as said above you shouldn't use RPKM, use statistics as in DESeq2

ADD REPLY
0
Entering edit mode

As mentioned by @tud55122 and @Asaf you shouldn't use RPKM values. I have used DESeq2 and there you have option to use batch+ condition information to make comparisons. I am not aware is edgeR ihas similar options. If you have counts data then you can easily follow DESeq2 vignette to see how to do that. I use the 'vst' based normalization which is better than RPKM based normalization. Hope it helps.

ADD REPLY
0
Entering edit mode

Seems you have two timepoints? Two conditions? and then a number of replicates per condition/timepoint - could you describe exactly which samples were sequenced together?

ADD REPLY

Login before adding your answer.

Traffic: 2048 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6