Question: Different logFC (log2 fold change) values for genes from limma-voom and other tools (edgeR and DESeq2)

I used different tools to run a differential expression analysis. Comparing the results, I see a general difference between the logFC values from limma-voom and those from edgeR and DESeq2. Other people have reported logFC differences between edgeR and DESeq2 and received good answers, but in my case edgeR and DESeq2 give very similar results, while limma-voom's results are quite different.

I read somewhere (I am not sure where) that:

limma's logFC = mean(log2(Group1)) - mean(log2(Group2))

actual logFC = log2(mean(Group1) / mean(Group2))
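To see how far the two formulas quoted above can diverge, here is a minimal Python sketch that applies both to a single gene with made-up normalized counts. The function names and counts are hypothetical, and neither function is the actual limma, edgeR or DESeq2 code. Because the mean of log values is never larger than the log of the mean, the two logFC formulas drift apart when one group is more variable than the other.

```python
import math
from statistics import mean


def limma_style_logfc(group1, group2):
    """Difference of mean log2 values (the first formula quoted above)."""
    return mean(math.log2(x) for x in group1) - mean(math.log2(x) for x in group2)


def ratio_of_means_logfc(group1, group2):
    """log2 of the ratio of group means (the second formula quoted above)."""
    return math.log2(mean(group1) / mean(group2))


# Hypothetical normalized counts for one gene, three replicates per group.
# One replicate in group 1 is much higher than the other two.
group1 = [10, 10, 80]
group2 = [10, 10, 10]

print(round(limma_style_logfc(group1, group2), 3))     # 1.0
print(round(ratio_of_means_logfc(group1, group2), 3))  # 1.737
```

With identical replicates the two functions agree; the single outlying replicate is enough to produce a difference of about 0.7 on the log2 scale.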

Can this small difference in the formulas cause such a generally large difference between limma-voom and the other tools (edgeR and DESeq2)?

Which logFC is better to trust?

Below is a figure of the comparison of different logFCs

[Figure: comparison of the logFC values from limma-voom, edgeR and DESeq2]

Asked 2.4 years ago by SMILE (last updated 2.4 years ago by Friederike)

Maybe I should post this on the Bioconductor support site?

Comment by SMILE, 2.4 years ago

limma, edgeR and DESeq2 use different approaches to estimate the values they use for the DE analysis, as well as for the log2FC.

The basic steps of the DE tools are:

  1. normalizing for differences in sequencing depth
  2. turning the very few read counts per gene per condition (usually only 2 or 3 replicates!) into values that will work with downstream statistical tests
  3. applying the statistical tests
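As an illustration of step 1, here is a minimal Python sketch of median-of-ratios size factors, the normalization approach DESeq2 uses by default (edgeR uses TMM by default, which is not shown). It is a simplified sketch of the idea, not DESeq2's code; the function name and the toy counts are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Simplified median-of-ratios size factors.

    counts: one list of raw counts per sample, all over the same genes.
    Genes with a zero count in any sample are skipped, because their
    geometric mean would be zero.
    """
    n_genes = len(counts[0])
    # Log geometric mean of each gene across samples: a pseudo-reference sample.
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if min(values) > 0:
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = []
    for sample in counts:
        # Ratio of each gene to the pseudo-reference, on the log scale.
        log_ratios = [math.log(sample[g]) - log_geo_means[g]
                      for g in range(n_genes) if log_geo_means[g] is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical data: sample 2 was sequenced exactly twice as deeply as sample 1.
counts = [[10, 20, 30], [20, 40, 60]]
factors = median_of_ratios_size_factors(counts)
normalized = [[c / f for c in sample] for sample, f in zip(counts, factors)]
print(factors)
print(normalized)
```

After dividing by the size factors, both toy samples have the same normalized counts, which is the point of step 1. Taking the median of the ratios makes the factor robust to a minority of genes that really are differentially expressed.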

The tools have different solutions for all three steps. edgeR and DESeq2 are (nowadays) fairly similar: both fit negative binomial models directly to the counts. limma, in contrast, uses the voom transformation for step 2. voom converts the counts to log2 counts per million and estimates the mean-variance relationship to give each observation a precision weight, which is meant to deal with many of the awkward properties of read count data (the voom paper is very informative). It is therefore not surprising that the logFC values differ, since the values each tool ends up using for the statistical test are not the raw read counts.
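To make the transformation concrete, here is a minimal Python sketch of the log-CPM values voom works with, log2((count + 0.5) / (library size + 1) * 1e6), followed by an unweighted difference of group means on that scale. This is a simplification under stated assumptions: real voom also computes precision weights and limma fits a weighted linear model, both of which this sketch ignores, and the function names and counts are hypothetical.

```python
import math
from statistics import mean


def voom_log_cpm(count, lib_size):
    """log2 counts per million with the 0.5 and 1 offsets used by voom."""
    return math.log2((count + 0.5) / (lib_size + 1) * 1e6)


def logfc_from_log_cpm(counts1, libs1, counts2, libs2):
    """Unweighted difference of group means on the log-CPM scale.

    Real voom/limma would use precision weights here; this sketch does not.
    """
    g1 = [voom_log_cpm(c, n) for c, n in zip(counts1, libs1)]
    g2 = [voom_log_cpm(c, n) for c, n in zip(counts2, libs2)]
    return mean(g1) - mean(g2)


# Hypothetical low-count gene: a few reads in group 1 and none in group 2.
libs = [1_000_000, 1_000_000, 1_000_000]
lfc = logfc_from_log_cpm([4, 6, 5], libs, [0, 0, 0], libs)
print(round(lfc, 2))  # 3.44
```

On the raw counts, the ratio of group means for this gene is undefined (division by zero), while the offsets give a finite logFC of about 3.44. This is one way in which the transformed values, rather than the raw counts, shape the reported logFC, especially for lowly expressed genes.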

As for which one you can trust more, there is no good answer. All three values are estimates shaped by the specific assumptions each tool makes. Personally, I would not be concerned as long as the trends agree, and from a quick glance at your figure, the direction of change is generally the same. Just make sure to (i) state which logFC values you are using and (ii) use the same type of value for all your samples and comparisons.
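One simple way to check whether the trends agree is to compare two tools' logFC values gene by gene. The Python sketch below computes the fraction of shared genes whose logFC has the same sign in both tools, plus the Pearson correlation of the values. The gene names and numbers are hypothetical, and in practice you would load each tool's result table instead.

```python
import math


def sign_agreement(lfc_a, lfc_b):
    """Fraction of shared genes whose logFC points the same way in both tools.

    A logFC of exactly 0 is counted as 'not positive'.
    """
    shared = lfc_a.keys() & lfc_b.keys()
    same = sum(1 for g in shared if (lfc_a[g] > 0) == (lfc_b[g] > 0))
    return same / len(shared)


def pearson(lfc_a, lfc_b):
    """Pearson correlation of the logFC values over the shared genes."""
    shared = sorted(lfc_a.keys() & lfc_b.keys())
    xs = [lfc_a[g] for g in shared]
    ys = [lfc_b[g] for g in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical logFC values for four genes from two tools.
limma = {"geneA": 1.2, "geneB": -0.8, "geneC": 2.5, "geneD": 0.3}
edger = {"geneA": 1.6, "geneB": -1.1, "geneC": 3.1, "geneD": -0.2}

print(sign_agreement(limma, edger))  # 0.75
print(round(pearson(limma, edger), 2))
```

A high correlation can coexist with a systematic offset in magnitude, which is the pattern described in the question; the sign agreement speaks more directly to whether the direction of change is the same.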

Answered 2.4 years ago by Friederike

By the way, everything said in the post you linked above of course still holds true.

Following the idea Aaron put forward, you could additionally filter genes based on the discrepancies you see in the logFC values. For example, you could remove all genes for which the three tools cannot even agree on the direction of the change. This may help narrow the list of DE genes down to those that change most robustly, regardless of the assumptions each piece of software makes.
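The direction filter suggested above could look like the following minimal Python sketch: keep only the genes whose logFC is strictly positive in every tool or strictly negative in every tool. The dictionaries stand in for each tool's list of DE genes with their logFC values; the gene names and numbers are hypothetical.

```python
def consistent_direction(*tool_lfcs):
    """Genes whose logFC has the same non-zero sign in every tool.

    Each argument maps gene -> logFC for one tool. Genes missing from any
    tool, or with a logFC of exactly 0 in any tool, are dropped.
    """
    shared = set.intersection(*(set(d) for d in tool_lfcs))
    keep = []
    for gene in sorted(shared):
        values = [d[gene] for d in tool_lfcs]
        if all(v > 0 for v in values) or all(v < 0 for v in values):
            keep.append(gene)
    return keep


# Hypothetical logFC values for the same genes from the three tools.
limma = {"geneA": 1.2, "geneB": -0.8, "geneC": 2.5, "geneD": 0.3}
edger = {"geneA": 1.6, "geneB": -1.1, "geneC": 3.1, "geneD": -0.2}
deseq2 = {"geneA": 1.5, "geneB": -1.0, "geneC": 3.0, "geneD": -0.1}

print(consistent_direction(limma, edger, deseq2))  # ['geneA', 'geneB', 'geneC']
```

Here geneD is removed because limma-voom reports an increase while edgeR and DESeq2 report a decrease.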

Reply posted 2.4 years ago

Thank you for your advice, Friederike! Yes, Aaron offers a more detailed explanation.

Reply posted 2.4 years ago