best value of lfc threshold
6.3 years ago
rthapa ▴ 90

What is the best value to assign for the lfc threshold when using the DESeq2 package? With 1 as the lfc threshold, I got more than 3000 upregulated genes. Any suggestions, please? Thanks.

RNA-Seq • 4.1k views
6.3 years ago

In DESeq2, the 'lfc' values are on the log [base 2] scale (log2fc).

This is an open-ended question. Ask 100 people and you'll get very different answers.

  • Log2fc of 1 is equivalent to linear fold change of 2
  • Log2fc of 2 is equivalent to linear fold change of 4
  • Log2fc of 3 is equivalent to linear fold change of 8
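The conversion in the list above is simply 2 raised to the power of the log2fc. A minimal sketch in Python (the function names are made up for illustration and are not part of DESeq2):

```python
import math

def log2fc_to_fold(log2fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc

def fold_to_log2fc(fold):
    """Convert a linear fold change back to the log2 scale."""
    return math.log2(fold)

for lfc in (1, 2, 3, -1):
    print(f"log2fc {lfc:+d} -> linear fold change {log2fc_to_fold(lfc)}")
```

Note that a negative log2fc gives a linear value below 1 (a log2fc of -1 is a halving), which is why cut-offs are usually applied to the absolute log2fc.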

Each person appears to choose a cut-off value that relates to whatever the first trusted person in their career told them. The mistake that these people then make is in rigidly adhering to this cut-off and in thinking that it's the only answer. In some cases, people do not even use any cut-off for fold-change and just use adjusted P-values (Q values) and then rank the statistically significant genes based on fold-change. As I recall, the first trusted voice in my own career told me: 'FDR Q<0.05 and absolute log2fc>2', but that was during a time when RNA-seq was not even available.
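To make the two approaches concrete, here is a rough sketch in Python of filtering on adjusted P-value alone and ranking by absolute log2fc, compared with also applying a fold-change cut-off. The gene names and numbers are invented for illustration, and `select_genes` is a hypothetical helper, not a DESeq2 function. This is post-hoc filtering of a results table, which is not the same thing as testing against a threshold inside DESeq2.

```python
# Hypothetical results table: (gene, log2fc, adjusted P-value).
# All values are made up for illustration only.
results = [
    ("geneA", 2.5, 0.001),
    ("geneB", -3.1, 0.020),
    ("geneC", 0.4, 0.030),
    ("geneD", 1.2, 0.200),
    ("geneE", -1.6, 0.049),
]

def select_genes(rows, padj_cutoff=0.05, lfc_cutoff=0.0):
    """Keep rows passing both cut-offs, ranked by absolute log2fc (largest first)."""
    kept = [r for r in rows if r[2] < padj_cutoff and abs(r[1]) > lfc_cutoff]
    return sorted(kept, key=lambda r: abs(r[1]), reverse=True)

# Adjusted P-value only, then rank by fold-change.
ranked_all = select_genes(results)
# The stricter 'Q<0.05 and absolute log2fc>2' rule.
strict = select_genes(results, lfc_cutoff=2.0)

print([g for g, _, _ in ranked_all])
print([g for g, _, _ in strict])
```

Changing `lfc_cutoff` only changes how far down the ranked list you go, which is why the choice is largely a matter of how many genes you want to carry forward.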

There really is no answer, though, and it depends on many factors, including:

  • The normalisation type (with the normalisation method(s) that produce FPKM/RPKM expression levels, unrealistically large log2fc values will be observed; with quantile normalisation, or the geometric (median-of-ratios) normalisation used in DESeq2, log2fc values will be lower than with FPKM expression levels, and will be balanced between negative and positive fold-changes)
  • how many genes you want to include for downstream analysis
  • what previous literature suggests about how many transcripts to expect in the kind of comparison that you're conducting
  • the adjusted P value that you are using for cut-off. For example, even at FDR Q<0.05 and log2fc=2, many of the transcripts will not be that much different when you visualise the normalised 'counts' between your comparisons (this comment only has validity in certain experimental setups though)
  • the variance of your data (high variance = unreliable log2fc values in any setting)
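For the normalisation point in the list above, the geometric normalisation can be sketched as a median-of-ratios calculation: each sample's size factor is the median, across genes, of the ratio between its count and that gene's geometric mean over all samples. The Python below is a simplified illustration under that assumption, not DESeq2's actual code; `size_factors` and the toy counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Only genes with no zero counts are usable, otherwise the
    # geometric mean is zero and the ratio is undefined.
    usable = [row for row in counts.values() if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy data: sample 2 is sequenced exactly twice as deeply as sample 1.
raw = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 7]}
factors = size_factors(raw)
normalised = {g: [c / f for c, f in zip(row, factors)] for g, row in raw.items()}
print(factors)
print(normalised["geneA"])
```

In this toy case the second size factor is twice the first, so after dividing by the size factors the depth difference disappears for geneA, geneB and geneC.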

So, the message? There is absolutely no standard cut-off. Use what is most appropriate for your data and what works best.

Kevin


Sorry, but why does the correlation between two samples become two times higher when I perform geometric normalisation on my raw counts? Is there any explanation, please? I calculated the Pearson correlation for two samples before and after normalisation, and the correlation was two times higher in the normalised samples.


The correlation value may have changed, but does the statistical significance of the correlation change? Use cor.test to check.

A short answer, too: there are different normalisation methods out there and they will produce data on different distributions. It is logical that statistical inferences from different normalisations will also be different. What you must ensure is that you choose the normalisation strategy that is most suitable for your data.


You are right. I am dealing with a dataset that has too many zeros and many genes with low read counts; on top of that, the dataset is a heterogeneous mix of two datasets with different distributions.


In that case, you may consider (prior to normalisation) removing transcripts that have a high rate of zeros across your sample cohort.
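As a rough illustration of that kind of filter, here is a short Python sketch that drops transcripts whose fraction of zero counts across samples exceeds a chosen limit. The function name, the 0.5 default and the toy counts are all assumptions for the example; the right limit depends on your data.

```python
def filter_high_zero(counts, max_zero_fraction=0.5):
    """Keep transcripts whose fraction of zero counts is at most max_zero_fraction.

    counts: dict mapping transcript -> list of raw counts, one per sample.
    """
    kept = {}
    for transcript, row in counts.items():
        zero_fraction = sum(1 for c in row if c == 0) / len(row)
        if zero_fraction <= max_zero_fraction:
            kept[transcript] = row
    return kept

# Toy raw counts for three transcripts across four samples.
raw_counts = {
    "tx1": [0, 0, 0, 5],    # 75% zeros, removed
    "tx2": [10, 0, 12, 8],  # 25% zeros, kept
    "tx3": [0, 0, 3, 4],    # 50% zeros, kept at the default limit
}
filtered = filter_high_zero(raw_counts)
print(sorted(filtered))
```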
