Question

Edger Logcpm Calculation

1

Entering edit mode

10.2 years ago

koen.vdberge ▴ 30

Dear all,

I am currently researching the differences in RNA-Seq data analysis, comparing the two well known EdgeR and Voom methods. However, there is one thing I can not manage to reproduce, namely the logCPM value in the output of the LRT table of EdgeR, after analyzing a certain contrast or coefficient. I understand from the manual and helpfile, that this logCPM value is a log2 counts per million, normalized for library sizes. I also understand that it is not a simple mean that is being used, but rather the mglmOneGroup() function. However, when I try to recalculate this myself, I can not obtain the same value. My workflow in doing so looks like this:

#Calculate through the table
counts <- read.delim("file.txt")
Treat <- factor(rep(c(rep(c("Cont","DPN","OHT"),4)),each=2))[-19] #Delete removed sample 19 by authors

y.s <- DGEList(counts=counts.filtered.smart,group=Treat)
y.s <- calcNormFactors(y.s)
y.common <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y.trended <- estimateGLMTrendedDisp(y.common, design)
y.tagwise <- estimateGLMTagwiseDisp(y.trended, design,prior.df=22)
lrt <- glmLRT(fit,coef= c(5,6,9,8))
lrt$table$logCPM

#Calculate manually
cpmstest <- cpm(y, normalized.lib.sizes = TRUE)
LogCpmstest <- log(cpmstest + 0.5, base = 2)
mglmOneGroup(LogCpmstest) #not identical

mglmOneGroup(y.tagwise) #also non-identical

Has anyone got any recommendations in what should be changed in this manual calculation workflow? What am I doing wrong? Any help would be greatly appreciated.

Sincerely, Koen.

edger r • 12k views

ADD COMMENT • link updated 5.1 years ago by Gordon Smyth ★ 7.0k • written 10.2 years ago by koen.vdberge ▴ 30

Ram · Answer 1 · 2014-05-02

Yeah, I've noticed that the log ratio values in the result file can be quite different (including sign changes that would go beyond changes due to rounding values, using RPKM instead of CPM, etc). In general, I've noticed that edgeR can give funky results sometimes, and I suspect it is doing some over-fitting at some step. This is why I would prefer DESeq over edgeR (but I've also heard that limma-zoom is also a good option).

If it helps, here are some benchmarks comparing some popular differential expression algorithms:

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

score 1 · Answer 2 · 2019-03-24

1

Entering edit mode

5.1 years ago

Gordon Smyth ★ 7.0k

https://hypatia.math.ethz.ch/pipermail/bioconductor/2014-February/057449.html

ADD COMMENT • link 5.1 years ago by Gordon Smyth ★ 7.0k

0

Entering edit mode

Thank you for adding this response to the user's question!

ADD REPLY • link 5.1 years ago by Charles Warden 8.2k