Edger Logcpm Calculation
2
1
Entering edit mode
10.2 years ago
koen.vdberge ▴ 30

Dear all,

I am currently researching the differences in RNA-Seq data analysis, comparing the two well known EdgeR and Voom methods. However, there is one thing I can not manage to reproduce, namely the logCPM value in the output of the LRT table of EdgeR, after analyzing a certain contrast or coefficient. I understand from the manual and helpfile, that this logCPM value is a log2 counts per million, normalized for library sizes. I also understand that it is not a simple mean that is being used, but rather the mglmOneGroup() function. However, when I try to recalculate this myself, I can not obtain the same value. My workflow in doing so looks like this:

#Calculate through the table
counts <- read.delim("file.txt")
Treat <- factor(rep(c(rep(c("Cont","DPN","OHT"),4)),each=2))[-19] #Delete removed sample 19 by authors

y.s <- DGEList(counts=counts.filtered.smart,group=Treat)
y.s <- calcNormFactors(y.s)
y.common <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y.trended <- estimateGLMTrendedDisp(y.common, design)
y.tagwise <- estimateGLMTagwiseDisp(y.trended, design,prior.df=22)
lrt <- glmLRT(fit,coef= c(5,6,9,8))
lrt$table$logCPM

#Calculate manually
cpmstest <- cpm(y, normalized.lib.sizes = TRUE)
LogCpmstest <- log(cpmstest + 0.5, base = 2)
mglmOneGroup(LogCpmstest) #not identical

mglmOneGroup(y.tagwise) #also non-identical

Has anyone got any recommendations in what should be changed in this manual calculation workflow? What am I doing wrong? Any help would be greatly appreciated.

Sincerely, Koen.

edger r • 12k views
ADD COMMENT
1
Entering edit mode
10.0 years ago

Yeah, I've noticed that the log ratio values in the result file can be quite different (including sign changes that would go beyond changes due to rounding values, using RPKM instead of CPM, etc). In general, I've noticed that edgeR can give funky results sometimes, and I suspect it is doing some over-fitting at some step. This is why I would prefer DESeq over edgeR (but I've also heard that limma-zoom is also a good option).

If it helps, here are some benchmarks comparing some popular differential expression algorithms:

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

ADD COMMENT
1
0
Entering edit mode

Thank you for adding this response to the user's question!

ADD REPLY

Login before adding your answer.

Traffic: 1500 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6