Question

Are P-Values Obtained From Two Separate Analyses On The Same Population Comparable?

1


12.0 years ago

Damian Kao 16k

Are p-values obtained from two separate enrichment analyses on the same population of genes comparable?

For example, let's say I have two differentially expressed gene lists from the same population of genes. ListA is enriched for cell cycle with a p-value of 0.01, listB is enriched for cell cycle with a p-value of 0.001.

Would it be correct to say cell cycle is more significantly enriched in listB than listA? Are the p-values comparable?

enrichment gene-ontology • 3.4k views

ADD COMMENT • link updated 12.0 years ago by Bill Pearson ★ 1.0k • written 12.0 years ago by Damian Kao 16k

0


I would say it is correct if you generate listA and listB following independent hypotheses and use the same statistic to evaluate enrichment.

ADD REPLY • link 12.0 years ago by ff.cc.cc ★ 1.3k

score 2 · Answer 1 · 2012-04-24

2


12.0 years ago

tiagoantao ▴ 690

I would apply some sort of multiple-testing correction before comparing multiple lists (especially if there are more than a handful). The usual caveats for gene enrichment apply: Bonferroni is too conservative, and most FDR procedures probably are too. Check the DAVID EASE score for an alternative (though it is not really a multiple-testing correction).

See, for example, Huang, DW; Sherman, BT; Lempicki, RA (2009). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37(1):1-13
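To make the correction step concrete, here is a minimal sketch of Bonferroni and Benjamini-Hochberg (FDR) adjustment in plain Python. The p-values in `raw` are invented for illustration, and the function names are mine; this is not the DAVID EASE score, just the two standard corrections mentioned above.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for five terms.
raw = [0.001, 0.01, 0.02, 0.04, 0.5]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni scales every p-value by the same factor, whereas Benjamini-Hochberg scales by the number of tests divided by the rank, which is why it is usually less conservative for the larger p-values.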

ADD COMMENT • link 12.0 years ago by tiagoantao ▴ 690

0


Thanks for the useful reference. I agree with you that multi-test corrections tend to be too conservative. Even if that does not seem to be the case with DK's lists, I suggest data-splitting techniques and a critical approach (e.g. 'Improving Validation Practices in “Omics” Research', http://www.sciencemag.org/content/334/6060/1230.full).

ADD REPLY • link 12.0 years ago by ff.cc.cc ★ 1.3k

score 2 · Answer 2 · 2012-04-24

One important thing to keep in mind is that the p-value is not a quality measure. It is simply a measure of how likely it is to observe the result by chance, given a particular data set. The properties of the underlying data therefore determine the p-value (in your case, the number of GO terms that could be used factors in as well); it is not a characteristic of the final observation alone.
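A small sketch of that dependence, assuming enrichment is scored with a one-sided hypergeometric test (the question does not say which statistic was used). All counts below are hypothetical: both lists have 10% of their genes in the category against a 5% background, and only the list size differs.

```python
from math import comb


def enrichment_p(hits, list_size, category_size, universe_size):
    """Upper-tail hypergeometric p-value: P(X >= hits)."""
    max_hits = min(list_size, category_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(universe_size - category_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical universe: 10,000 genes, 500 of them annotated to the term.
p_small = enrichment_p(hits=10, list_size=100, category_size=500, universe_size=10_000)
p_large = enrichment_p(hits=40, list_size=400, category_size=500, universe_size=10_000)
print(p_small, p_large)
```

The fold enrichment is the same in both cases, but the longer list yields a much smaller p-value, so the p-value alone says little about which list is "more" enriched.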

IMO the purpose of the p-value is to accept the selection or reject it. In general I don't think it should be used to rank anything (though in reality just about everyone does it all the time). We (me included) tend to rank by p-value when we run out of options.

I would try to find a different measure/attribute to rank my genes and avoid comparing the p-values.

score 1 · Answer 3 · 2012-04-24

I think the answer is yes, as long as these two lists represent an analysis of the same experiment. When you say "the same population of genes", this seems to imply a single data set (from a single experiment) representing some "universe" of genes, e.g. all the mouse genes represented on an array, some of which can be classified as cell cycle genes. Given that a p-value represents a fractional area under a curve, and listB's p-value corresponds to a tenth of the area of listA's, I would call listB more significant, even though the curves (or the analysis process that generated them, which you haven't explicitly stated) may be different shapes.

score 1 · Answer 4 · 2012-04-24

1


12.0 years ago

Bill Pearson ★ 1.0k

I agree with Istvan. You can say that one p-value is more significant than the other, but you CANNOT say that they are significantly different. That requires a different test on the hypothesis that the fold-change for the two genes is different.
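One possible way to set up such a direct test, sketched here as an assumption rather than as the specific test meant above: compare the proportion of cell cycle genes in the two lists with a two-sided Fisher exact test on a 2x2 table. The counts are hypothetical, and the sketch assumes the two lists do not share genes.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of a table with x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: (cell cycle genes, other genes) in each list.
list_a = (10, 90)    # 10 of 100 genes
list_b = (40, 360)   # 40 of 400 genes
p_diff = fisher_exact_two_sided(*list_a, *list_b)
print(p_diff)
```

With these made-up counts the two lists contain the same proportion of cell cycle genes, so the direct comparison finds no difference between them, even though their individual enrichment p-values against the background could differ by orders of magnitude.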

ADD COMMENT • link 12.0 years ago by Bill Pearson ★ 1.0k