Shrink logFC values after edgeR
12 weeks ago
arvind.1 ▴ 10

Good morning,

I am new to RNA-seq data analysis. After running edgeR, I got logFC values ranging from -20 to 20. Can I manually shrink the log fold change (logFC) values by a factor of 0.5 (or any other factor) in the output obtained from edgeR? If not, can you point me to functions and packages that can solve this problem? I look forward to hearing from you. Thank you for your time and consideration.

logFC edgeR

Please check out the post here on Bioconductor. It explains very nicely what you could do and what you should not do.


Thank you for the reply.


Thank you for your reply.

12 weeks ago
Gordon Smyth ★ 7.0k

Why is getting logFC=20 causing a "problem" for you? You would only get such a large logFC from edgeR if the counts are zero in one group and in the hundreds of thousands in the other. Getting a large logFC seems a fair representation in such cases.

Anyway, you can shrink the log-fold-changes by as much or as little as you want in edgeR. Just use a larger value of prior.count when you run glmFit (or glmQLFit). Or you can use the predFC function, which will give the same result as glmFit.

By default, the model fitting functions in edgeR do light shrinkage of the logFCs by setting prior.count = 0.125. If you increase prior.count to 5, that will do quite heavy shrinkage. Note that shrinkage is not the same as simply multiplying the logFC by a scaling factor. Increasing prior.count has much more effect on the logFCs for low-count genes than on the logFCs for highly expressed genes, because the logFCs for low-count genes are less certain. Changing prior.count only affects the estimated logFCs. All the other output, including test statistics and p-values, remains the same.

The argument is called prior.count because the shrinkage corresponds to a Bayesian model in which the size of the counts in a prior dataset encapsulates the strength of the prior belief that the logFCs should be close to zero. Long ago (about 15 years ago) we toyed around with the idea of estimating an optimal value for the prior.count automatically from the data (it's a chapter in Belinda Phipson's thesis) but eventually decided it would be safer and better to use preset values in the public edgeR package.

To illustrate the use of prior.count, here is a toy example where edgeR does return logFC=20:

> counts <- matrix(c(0,1000,135000,2010),2,2)
> colnames(counts) <- c("A","B")
> rownames(counts) <- c("Gene1","Gene2")
> counts
         A      B
Gene1    0 135000
Gene2 1000   2010
> group <- c("A","B")
> design <- model.matrix(~group)
> fit <- glmFit(counts,design,dispersion=0,lib.size=rep(10e6,2))
> lrt <- glmLRT(fit)
> topTags(lrt)
Coefficient:  groupB 
        logFC  logCPM        LR     PValue        FDR
Gene1 20.0426 12.7207 187149.74 0.0000e+00 0.0000e+00
Gene2  1.0071  7.2355    345.57 3.9103e-77 3.9103e-77

Changing the default setting to prior.count=5 shrinks the first logFC considerably while leaving the second almost unaffected:

> fit <- glmFit(counts,design,dispersion=0,prior.count=5,lib.size=rep(10e6,2))
> lrt <- glmLRT(fit)
> topTags(lrt)
Coefficient:  groupB 
        logFC  logCPM        LR     PValue        FDR
Gene1 14.7207 12.7207 187149.74 0.0000e+00 0.0000e+00
Gene2  1.0036  7.2355    345.57 3.9103e-77 3.9103e-77
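
For completeness, here is a minimal sketch of the predFC route mentioned above, reusing the same toy counts; the offset supplied here is taken to be the natural log of the library sizes used in the glmFit call:

library(edgeR)
# predFC() applies the same prior.count shrinkage as glmFit() and
# returns the shrunk log2 coefficients directly, without any testing
counts <- matrix(c(0,1000,135000,2010),2,2)
colnames(counts) <- c("A","B")
rownames(counts) <- c("Gene1","Gene2")
group <- c("A","B")
design <- model.matrix(~group)
lfc <- predFC(counts, design, prior.count=5, dispersion=0,
              offset=log(rep(10e6,2)))
lfc[,"groupB"]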

"we toyed around with the idea of estimating an optimal value for the prior.count automatically"

Hi Gordon, DESeq2 has various options for automatic shrinkage. I particularly like the ashr option since it is compatible with contrast matrices and requires only the estimates (logFCs) and their standard errors. What's your opinion about this approach? Would you consider providing standard errors in the output of edgeR so that users can apply ashr shrinkage?

I've come across the post Confidence intervals on edgeR logFC from a few years ago and I wonder if there's any update on that.
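
For context, here is a rough sketch of the DESeq2 route I mean, using the package's toy data generator just so the calls are concrete (the factor name condition and its levels A/B come from that toy object):

library(DESeq2)
# toy dataset purely so the sketch runs end to end
dds <- makeExampleDESeqDataSet(n=1000, m=6)
dds <- DESeq(dds)
# unshrunk results for an arbitrary contrast
res <- results(dds, contrast=c("condition","B","A"))
# type="ashr" needs only the estimates and their standard errors,
# so it accepts contrast= rather than requiring coef=
res_shrunk <- lfcShrink(dds, contrast=c("condition","B","A"),
                        res=res, type="ashr")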


No, I am not going to provide standard errors (SEs) from edgeR, for the same reasons that I explained in the previous post that you link to. SEs fail completely in the very situation when shrinkage is most needed.

If the counts are all zero in one of the groups being compared, then both the logFC and the SE are infinite. How does that help with shrinkage? We showed nearly 20 years ago (Robinson & Smyth, 2008) that SEs from negative binomial glms are not even consistent. Imagine two genes, one for which the counts are 0 vs 10 and another for which the counts are 0 vs 10 million. If you use SEs from standard glm code, then the logFC for the second gene will be shrunk to a smaller absolute value than that for the first, even though it has much greater evidence for a large FC.

limma provides logFCs and SEs, so you could use ashr with limma. Matthew Stephens' work is always interesting and sophisticated, but I would ask you: what do you view the shrunk logFCs from ashr as representing, and for what scientific purposes will you use them? Personally, I prefer to keep statistical significance and logFCs as separate concepts, whereas ashr tries to combine them.

If I wanted to do adaptive shrinkage of logFCs in edgeR, then I would use the method from Belinda Phipson's thesis, which I think is far superior to methods based on SEs.

Our work on logFC shrinkage showed us that optimal shrinkage depends very much on the model assumptions that you make and it is hard to verify these assumptions robustly from the data. Packages like ashr have no choice but to assume that the SEs are independent of the underlying true logFC, i.e., it seems to me that there is an implicit assumption of independent priors for the logFC and variance, but that assumption is not really tenable for RNA-seq data in my opinion. The example above with zero counts shows that larger true logFCs also tend to be estimated with less precision.


Matthew Stephens has himself input limma's logFCs and standard errors into ash: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906610/
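
For what it's worth, a minimal sketch of that limma-to-ashr route, using simulated data and a placeholder coefficient name "groupB":

library(limma)
library(ashr)
# simulated log-expression matrix so the sketch runs end to end
set.seed(1)
y <- matrix(rnorm(1000*6), 1000, 6)
group <- factor(c("A","A","A","B","B","B"))
design <- model.matrix(~group)
fit <- eBayes(lmFit(y, design))
# limma's logFCs and their moderated standard errors
betahat <- fit$coefficients[,"groupB"]
sebetahat <- fit$stdev.unscaled[,"groupB"] * sqrt(fit$s2.post)
# adaptive shrinkage: posterior mean (shrunk) logFCs
fit_ash <- ash(betahat, sebetahat)
shrunk_lfc <- get_pm(fit_ash)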
