Question

How to use log2FoldChange in RNA-seq DE analysis



6.9 years ago

statfa ▴ 760

I apologize if my question is simple or may have been answered on this site. I tried to find some info but I couldn't find what I need.

1- I know how log2FC is calculated. What I don't know is why, when I use different tools to obtain DEGs, such as edgeR or DESeq2, I'm given different FCs for the same genes. If the method of calculating log2FC is the same, what makes the tools show different results? Is it due to the different normalization methods they use, which result in different normalized read counts?

2- Now, suppose you have 4 different conditions and therefore 3 log2FCs. How do you use them to filter genes further after you have selected a number of genes based on their p-values? I know you can use criteria like |log2FC| > 1, but that approach is used when you have two conditions and hence one log2FC. What if you have three log2FCs? Do you analyze each of them separately and, if at least one of them meets the criterion, select the gene as DE?

3- Now, assume that you have two conditions: Treatment A and Treatment B. You find the DE genes in Treatment A and the DE genes in Treatment B. You want to find out which genes are exclusively DE in each treatment and which DE genes are common to both. If you use any statistical method, there is always a chance of false discovery. How do you compare the DE genes? What criteria would you define? Would you, for example, say that genes with FDR < 0.05 are considered DE, and then continue with your analysis?

4- Is it necessary to filter genes by their log2FCs? Or is the p-value enough when you are going to analyze your data as described in question 3?

FC log2FoldChange RNA-seq • 4.6k views

updated 6.9 years ago by Kristoffer Vitting-Seerup ★ 4.0k • written 6.9 years ago by statfa ▴ 760

score 1 · Answer 1 · 2017-06-16

That was a lot of questions. Let me give it a try:

1: This is due to two things. The minor one is that the tools may use different pseudo counts. The major one is that DESeq2 also uses Bayesian information sharing across genes (shrinkage) when estimating the log2FC, to get more robust results. You can find more information in the respective papers.
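To make the pseudo count point concrete, here is a minimal Python sketch. It is not what edgeR or DESeq2 actually compute internally; the function name, the mean counts, and the pseudo count values are all made up for illustration. It only shows that the same pair of normalized means gives a different log2FC depending on the pseudo count added.

```python
import math

def log2fc(mean_treated, mean_control, pseudo_count):
    """log2 fold change of two normalized mean counts after adding a pseudo count."""
    return math.log2((mean_treated + pseudo_count) / (mean_control + pseudo_count))

# Hypothetical normalized mean counts for one lowly expressed gene.
treated, control = 8.0, 1.0

# Illustrative pseudo counts only; not the defaults of any particular tool.
for pc in (0.125, 0.5, 1.0):
    print(f"pseudo count {pc}: log2FC = {log2fc(treated, control, pc):.2f}")
```

The larger the pseudo count, the more the estimate is pulled toward zero, and the effect is strongest for genes with low counts.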

2: There is no single answer to that question. It all depends on what you want. As an alternative to first testing whether the log2FC is different from 0 and afterwards applying a log2FC cutoff of 1, DESeq2 lets you test directly against abs(log2FC) > 1 (the `lfcThreshold` argument of `results()`). See the DESeq2 vignette for more information.
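If you do go the post-hoc filtering route with several contrasts, the "any" versus "all" choice could be sketched like this in Python. The gene names, numbers, and the `passes` helper are hypothetical; this is just one way to express the two criteria, not a recommendation of either.

```python
# Hypothetical per-gene results for three contrasts against a common reference.
# Each entry is (log2FC, adjusted p-value).
results = {
    "geneA": [(2.3, 0.001), (0.2, 0.60), (-0.1, 0.90)],
    "geneB": [(1.5, 0.010), (1.8, 0.002), (2.1, 0.0004)],
    "geneC": [(0.4, 0.030), (0.6, 0.020), (0.3, 0.040)],
}

def passes(lfc, padj, lfc_cutoff=1.0, alpha=0.05):
    """True if one contrast is both significant and above the effect-size cutoff."""
    return padj < alpha and abs(lfc) > lfc_cutoff

# DE in at least one contrast versus DE in every contrast.
de_in_any = sorted(g for g, rs in results.items() if any(passes(l, p) for l, p in rs))
de_in_all = sorted(g for g, rs in results.items() if all(passes(l, p) for l, p in rs))

print(de_in_any)  # ['geneA', 'geneB']
print(de_in_all)  # ['geneB']
```

Which of the two lists is the right one depends on the biological question, as said above.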

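3: I would use FDR < 0.05. For the intersection of two such lists, the chance that a gene is a false discovery in both is only about 0.05^2 = 0.0025, assuming the two analyses are independent (a rough estimate, since FDR is the expected proportion of false discoveries in a list rather than a per-gene probability).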
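Once you have the two lists, the common and exclusive genes are plain set operations. A small Python sketch, with made-up gene names and adjusted p-values:

```python
# Hypothetical adjusted p-values (FDR) from two separate analyses.
padj_a = {"gene1": 0.001, "gene2": 0.03, "gene3": 0.20, "gene4": 0.04}
padj_b = {"gene1": 0.010, "gene2": 0.50, "gene3": 0.02, "gene4": 0.049}

ALPHA = 0.05  # FDR cutoff used for both lists

de_a = {g for g, p in padj_a.items() if p < ALPHA}
de_b = {g for g, p in padj_b.items() if p < ALPHA}

common = de_a & de_b   # DE in both treatments
only_a = de_a - de_b   # DE only in Treatment A
only_b = de_b - de_a   # DE only in Treatment B

print(sorted(common))  # ['gene1', 'gene4']
print(sorted(only_a))  # ['gene2']
print(sorted(only_b))  # ['gene3']
```

Keep in mind that the "exclusive" sets are sensitive to the cutoff: a gene that lands just above 0.05 in one analysis will show up as exclusive to the other.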

4: That is completely up to you. A cutoff on log2FC is usually applied to ensure a minimum effect size, which is especially important if you have large sample sizes (many biological replicates), but it is not required in any strict sense.
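To see why sample size matters here, consider a toy two-group z-test in Python. This is a deliberately simplified stand-in, not the negative binomial model that edgeR or DESeq2 fit; the effect size, standard deviation, and group sizes are assumptions chosen for illustration.

```python
import math

def two_sided_p(log2fc, sd, n_per_group):
    """Two-sided p-value of a z-test for a difference in means between two groups."""
    se = sd * math.sqrt(2.0 / n_per_group)    # standard error of the difference
    z = log2fc / se
    return math.erfc(abs(z) / math.sqrt(2.0)) # normal two-sided tail probability

# The same small effect (log2FC = 0.1) with few versus many replicates.
p_small_n = two_sided_p(log2fc=0.1, sd=0.5, n_per_group=3)
p_large_n = two_sided_p(log2fc=0.1, sd=0.5, n_per_group=500)

print(f"n = 3 per group:   p = {p_small_n:.3f}")
print(f"n = 500 per group: p = {p_large_n:.4f}")
```

With enough replicates even a log2FC of 0.1 becomes statistically significant, and a log2FC cutoff is one way to drop such tiny effects.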