Correcting batch effects between RPKM datasets?
8.8 years ago
JacobS ▴ 980

I have two datasets, one with ~250 samples and another with 7 samples. Both contain RPKM values computed from human RNA-Seq. I don't have access to the raw read files.

Is there a good way to batch-correct these datasets so that I can combine them and scan for expression signatures? To compare samples, I currently use an algorithm that takes the geometric mean of the RPKM values for the genes belonging to a given signature, but the RPKM values in the ~250-sample dataset are on average much higher than those in the 7-sample dataset.
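A minimal sketch of that kind of signature score, in plain Python. The function name, the dict-of-lists layout, and the pseudocount are illustrative assumptions, not the algorithm actually used here:

```python
import math

def signature_score(rpkm_by_gene, signature, pseudocount=1.0):
    """Per-sample geometric mean of (RPKM + pseudocount) over signature genes.

    rpkm_by_gene: dict mapping gene name -> list of RPKM values, one per sample
                  (hypothetical layout chosen for this sketch).
    signature:    iterable of gene names; genes missing from the table are skipped.
    """
    rows = [rpkm_by_gene[g] for g in signature if g in rpkm_by_gene]
    if not rows:
        raise ValueError("none of the signature genes are in the expression table")
    n_samples = len(rows[0])
    scores = []
    for j in range(n_samples):
        # Geometric mean = exp(mean(log(x))); the pseudocount avoids log(0).
        mean_log = sum(math.log(row[j] + pseudocount) for row in rows) / len(rows)
        scores.append(math.exp(mean_log))
    return scores
```

Because the score is a mean on the log scale, a dataset whose RPKM values are systematically higher shifts every signature score upward, which is why the two datasets cannot be compared directly without some adjustment.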

I've used ComBat in the past for the same predicament but with microarray expression data, and it worked perfectly. I'm looking for something analogous for RPKM expression data.

DGE RNA-Seq batch RPKM

What are you going to do with the combined data? If you are going to do differential expression analysis, what are the groups?

8.7 years ago
Ying W ★ 4.2k

Have a look here.

ComBat should still work for RNA-seq; it can be found in the sva package.
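ComBat's model assumes roughly normally distributed values, so RPKM data are usually log-transformed before correction. To illustrate the basic idea, here is a simplified stand-in in plain Python: log2(RPKM + 1) followed by per-gene mean-centering within each batch. This is not ComBat (no empirical Bayes shrinkage, no scale adjustment, no covariates), and the function names and data layout are assumptions made for the sketch:

```python
import math

def log2_transform(rpkm_by_gene, pseudocount=1.0):
    """Return log2(RPKM + pseudocount) for every gene and sample."""
    return {
        gene: [math.log2(v + pseudocount) for v in values]
        for gene, values in rpkm_by_gene.items()
    }

def center_by_batch(log_expr_by_gene, batches):
    """Remove per-gene batch mean shifts from log-scale expression.

    log_expr_by_gene: dict gene -> list of log-expression values, one per sample.
    batches:          list of batch labels, one per sample, in the same order.
    Each batch is shifted so its per-gene mean equals the gene's overall mean.
    """
    adjusted = {}
    for gene, values in log_expr_by_gene.items():
        grand_mean = sum(values) / len(values)
        batch_means = {}
        for b in set(batches):
            in_batch = [v for v, lab in zip(values, batches) if lab == b]
            batch_means[b] = sum(in_batch) / len(in_batch)
        adjusted[gene] = [
            v - batch_means[lab] + grand_mean
            for v, lab in zip(values, batches)
        ]
    return adjusted
```

Centering like this removes any real biological difference between the batches along with the technical one, so it only makes sense when the two datasets are expected to have similar composition.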

You could also have a look at the following packages:


Just note that batch effect correction is not always compatible with the experimental design and the questions being asked.


Hi,

Thanks, that helps. But I was wondering whether the input to ComBat should be the RPKM values themselves or log2 of the RPKM values?
