edgeR Pvalue/FDR results
6.4 years ago
Sharon ▴ 600

I noticed some pvalues/FDR like the following:

gene   logFC             logCPM            Pvalue                 FDR
gene1  8.78610478309506  5.55934275062716  6.98629850992379e-110  1.33333507061895e-105
gene2  8.34642639375796  7.09746407505299  4.20221256332387e-89   4.0099613385518e-85

Are these normal? The counts for those genes seem right, but what about the p-values?

RNA-Seq • 3.4k views

Salmon does not give such values, as far as I know. You must be plugging the Salmon output into some DE tool and performing differential expression analysis, which gives you the logFC for the condition you are testing along with the other metrics.


Sorry, edited. I meant edgeR after Salmon.


I'm going to take a guess that your sample numbers are low, or the groups that you're comparing are unbalanced, e.g., comparing 50 samples versus 3. You will obtain unreliable P values in both of these situations.


An unbalanced design is still OK, but 50 vs 3 makes me sad. The mean-variance fit doesn't really work with such an unbalanced design. To be honest, biologists need to understand this as well: such designs are only good for exploratory analysis, not confirmatory analysis. However, when you say p-values, are they FDR-corrected or the initial p-values?


I am comparing 10 control samples vs 10 tumor samples. What do you mean by unbalanced? For example, gene1 above, with an FDR of about 1e-105, has counts below 20 in each control sample and counts above 2000 in each tumor sample. What do you think?
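
For scale, the logFC column is on a log2 scale, so it can be converted back to a plain ratio. Below is a minimal Python sketch of that arithmetic. The logFC values are the ones posted in the question; the per-group mean counts are hypothetical round numbers picked only to fit the description (below 20 in controls, above 2000 in tumors), not the real data. edgeR's logFC also reflects normalisation, so a naive ratio of raw counts will only roughly agree with it.

```python
import math

def fold_change(log2_fc):
    """Convert a log2 fold change back to a linear ratio."""
    return 2.0 ** log2_fc

# logFC values exactly as posted in the question.
posted = {"gene1": 8.78610478309506, "gene2": 8.34642639375796}
ratios = {gene: fold_change(lfc) for gene, lfc in posted.items()}

# Hypothetical group means (assumption, not the real data).
control_mean, tumor_mean = 5.0, 2200.0
naive_log2_fc = math.log2(tumor_mean / control_mean)

for gene, ratio in ratios.items():
    print(f"{gene}: roughly {ratio:.0f}-fold higher in tumor")
print(f"naive log2 ratio of the hypothetical means: {naive_log2_fc:.2f}")
```

A logFC near 8.8 therefore corresponds to a difference of several hundred fold, which is in line with counts of a handful of reads in controls against a couple of thousand in tumors.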


A study size of 20 is very low and, from my perspective, helps to explain the very low P values (though it doesn't confirm that this is the sole issue). To give you an idea of why this happens:

Having just 20 samples will not give a global, 'holistic' representation of the disease or condition that you are studying. With such low numbers, there is a high probability that you will observe many transcripts that are lowly expressed throughout one group and highly expressed throughout the other. These will be assigned very low P values, and rightly so. However, if you had 20,000 samples, you would capture a much broader range of expression profiles and your P values would be more 'normal', both in the everyday sense and in the statistical sense, i.e., in a well-powered study, the P values from differential expression would line up nicely on a quantile-quantile plot.

Ideally there should have been some power analysis done prior to your study in order to determine ideal sample numbers (vchris alludes to study design in his/her comment above).

The only situation in which I would expect extremely low P values like these in a well-powered study is a gene knockout. Even then, because of the way expression data is normalised, a knockout's statistical significance may not be what was expected.

Just to be sure, could you also plot a histogram of your normalised and then logged counts?
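
If it helps, here is a rough Python sketch of what "normalised and then logged counts" could look like before plotting. It is a simplified log2 counts-per-million calculation with a small offset to avoid log(0), similar in spirit to `cpm(y, log=TRUE)` in edgeR but not identical to it. The counts, the library size, and the function names `log_cpm` and `bin_values` are all made up for illustration.

```python
import math

def log_cpm(counts, lib_size, prior=0.5):
    """Simplified log2 counts-per-million with a small prior to avoid log(0)."""
    return [math.log2((c + prior) / (lib_size + 1.0) * 1e6) for c in counts]

def bin_values(values, n_bins=6):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against all-identical values
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    return lo, hi, counts

# Hypothetical raw counts for one sample (assumption, not real data).
sample_counts = [0, 3, 12, 18, 40, 95, 150, 420, 980, 2200, 5100, 12000]
sample_lib_size = 20_000_000  # hypothetical total mapped reads

values = log_cpm(sample_counts, sample_lib_size)
lo, hi, hist = bin_values(values)

# Print a crude text histogram, one row per bin.
width = (hi - lo) / len(hist)
for i, n in enumerate(hist):
    print(f"{lo + i * width:7.2f} to {lo + (i + 1) * width:7.2f} | {'#' * n}")
```

With real data you would run this per sample (or pool all samples) and look for a roughly bell-shaped distribution with a spike of lowly expressed genes on the left.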


Ok, I will double check this and get back, thanks Kevin.


Initial p-values, but the FDR is also very low.
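
On how the two columns relate: assuming the FDR column was produced by the Benjamini-Hochberg procedure (an assumption here, since the thread does not say which adjustment was used), each adjusted value is the raw p-value times the number of tests divided by its rank, followed by a monotonicity step. The Python sketch below implements that and then back-calculates the implied number of tests from the two posted rows, assuming they are the top two genes by p-value.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def implied_tests(pvalue, fdr, rank):
    """Number of tests implied by fdr = pvalue * n / rank."""
    return fdr * rank / pvalue

# Values as posted in the question; the ranks 1 and 2 are an assumption.
n_gene1 = implied_tests(6.98629850992379e-110, 1.33333507061895e-105, rank=1)
n_gene2 = implied_tests(4.20221256332387e-89, 4.0099613385518e-85, rank=2)
print(round(n_gene1), round(n_gene2))

# Small toy example of the adjustment itself.
toy = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(toy)
```

Both rows point to the same number of tested genes, so the FDR values are consistent with an ordinary multiple-testing correction of the raw p-values; they are tiny only because the raw p-values are.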


Can you post a histogram of your P-values?
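
In case it is useful, a p-value histogram just means binning the raw p-values on the interval from 0 to 1. Here is a minimal Python sketch using only the standard library. The p-values are simulated (a mix of uniform "null" values and values pushed towards zero), purely as a stand-in for the PValue column of a results table; the mixture proportions are arbitrary assumptions.

```python
import random

def pvalue_histogram(pvalues, n_bins=20):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    return counts

random.seed(1)
# Simulated stand-in data (assumption): 2000 null genes, 500 changed genes.
null_p = [random.random() for _ in range(2000)]
signal_p = [random.random() ** 6 for _ in range(500)]
counts = pvalue_histogram(null_p + signal_p)

# Crude text plot: one row per bin, one '#' per 10 genes.
for i, n in enumerate(counts):
    print(f"{i / 20:4.2f}-{(i + 1) / 20:4.2f} | {'#' * (n // 10)}")
```

The usual expectation is a roughly flat histogram with a peak near zero for the genes that really differ; strong departures from that shape are worth investigating.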


Not sure if this looks okay

https://ibb.co/npMqL6


It looks unusual - not normal. I have responded further above.
