Question

Fisher's exact test gives p-value 0

0

7.9 years ago

Adrian Pelin ★ 2.6k

Hello,

I have a situation similar to the one described in the post "Hypergeometric Test On Gene Set".

I have 2 microarrays from 2 different conditions, which give me 2 different sets of differentially expressed transcripts.

Diff in Condition 1: 738

Diff in Condition 2: 1090

Overlap Condition 1 & 2: 453

Total Genes in array: 30941

I want to test the significance of the overlap between the 2 conditions. I use:

phyper(452, 738, 30203, 1090, lower.tail=FALSE)

[1] 0

Any idea why the p-value is 0? I based the call on this post: "http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c"

phyper(overlap-1, list1, PopSize-list1, list2, lower.tail = FALSE)

Thanks

enrichment R fisher's exact test • 6.4k views

ADD COMMENT • link 7.9 years ago by Adrian Pelin ★ 2.6k

0

You should try using log.p=TRUE

ADD REPLY • link 7.9 years ago by russhh 5.7k

0

I get:

phyper(452, 738, 30203, 1090, lower.tail=FALSE, log.p = TRUE)

[1] -1140.21

Any idea what that means? p.value = 1E-1140 ?

ADD REPLY • link 7.9 years ago by Adrian Pelin ★ 2.6k

0

It means e^-1140.21, since the log here is the natural log. That is about 10^-495, not 1E-1140.
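To get a number you can actually read, one option is to convert the natural-log value to base 10 and split it into a mantissa and an exponent. A minimal sketch in R; the variable names are only illustrative:

```r
# Natural log of the upper-tail probability P(X > 452), same call as above
log_p <- phyper(452, 738, 30203, 1090, lower.tail = FALSE, log.p = TRUE)

# Convert from natural log to log base 10
log10_p <- log_p / log(10)

# Split into exponent and mantissa so the value can be shown without underflow
p_exponent <- floor(log10_p)
p_mantissa <- 10^(log10_p - p_exponent)

cat(sprintf("p is roughly %.1fe%d\n", p_mantissa, as.integer(p_exponent)))
```

The smallest positive normalized double is around 2e-308, so a value near 1e-495 will print as 0 unless you stay on the log scale.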

ADD REPLY • link 7.9 years ago by Devon Ryan 104k

0

That number is still 0 when using any calculator. My question is, why is the p-value so low? The overlap is not that great, it is ~50-70% of genes. Is the 2x2 table constructed correctly?
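One way to check the 2x2 table is to build it explicitly from the four counts in the question and compare a one-sided Fisher's exact test with the phyper call. A sketch, assuming both gene lists come from the same 30941 genes on the array; the object names are made up for illustration:

```r
n_total   <- 30941  # genes on the array
n_cond1   <- 738    # differential in condition 1
n_cond2   <- 1090   # differential in condition 2
n_overlap <- 453    # differential in both

# Rows: condition 1 (diff / not diff); columns: condition 2 (diff / not diff)
tab <- matrix(
  c(n_overlap,           n_cond1 - n_overlap,
    n_cond2 - n_overlap, n_total - n_cond1 - n_cond2 + n_overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(cond1 = c("diff", "not_diff"),
                  cond2 = c("diff", "not_diff"))
)
print(tab)

# One-sided test for enrichment (more overlap than expected by chance)
ft <- fisher.test(tab, alternative = "greater")

# Same tail probability from the hypergeometric distribution: P(X >= 453)
p_hyper <- phyper(n_overlap - 1, n_cond1, n_total - n_cond1, n_cond2,
                  lower.tail = FALSE)

print(ft$p.value)
print(p_hyper)
print(ft$estimate)  # conditional odds ratio, an effect size
```

Both p-values should underflow to 0 here, which suggests the table and the phyper call describe the same test. The odds ratio in `ft$estimate` is the more informative number when the p-value is this extreme.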

ADD REPLY • link 7.9 years ago by apelin20 ▴ 480

5

With lower.tail=FALSE, you're calculating the probability of the following scenario:

You have a jar of 30203 black balls and 738 white balls
You draw 1090 of them randomly without replacement
You count the number of white balls you have drawn, and it is greater than 452 (that is, at least the 453 you observed)
The probability of drawing more than 452 white balls given your conditions is virtually zero
Conversely, the probability of drawing 452 or fewer white balls given your conditions is virtually one

In a jar where only ~2% of the balls are white, a random draw of 1090 balls should contain about 26 white ones. Drawing 453 by chance alone would be extraordinarily rare, which is why your p-value is so low.
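The jar picture can be turned into numbers by computing how many white balls a random draw would be expected to contain. A rough sketch using the standard hypergeometric mean and variance formulas; the variable names are illustrative only:

```r
white    <- 738    # genes differential in condition 1
black    <- 30203  # all other genes on the array
drawn    <- 1090   # genes differential in condition 2
observed <- 453    # overlap actually seen

total <- white + black

# Mean and standard deviation of the number of white balls in the draw
expected <- drawn * white / total
variance <- drawn * (white / total) * (black / total) *
  (total - drawn) / (total - 1)
std_dev  <- sqrt(variance)

# How far the observed overlap is from what chance alone would give
fold_enrichment <- observed / expected
sd_above_mean   <- (observed - expected) / std_dev

cat(sprintf("expected overlap: %.1f (sd %.2f)\n", expected, std_dev))
cat(sprintf("observed overlap: %d (%.1f-fold, %.0f sd above the mean)\n",
            as.integer(observed), fold_enrichment, sd_above_mean))
```

With these counts the expected overlap is about 26 genes with a standard deviation of about 5, so the observed 453 is roughly 17 times the expectation.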

ADD REPLY • link 7.9 years ago by Steven Lakin ★ 1.8k

1

The overlap is not that great, it is ~50-70% of genes

That's why I think p-values in genomics are often meaningless. You get very small p-values even if the effect size is small, and this is a consequence of the large size of the data sets available (thousands of genes, millions of SNPs, etc.). By the way, I wouldn't say ~50-70% is a small overlap...
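The point about data-set size can be illustrated with made-up numbers: keep the fold enrichment fixed at 1.5 and multiply every count by 10. All counts below are hypothetical and chosen only for illustration, not taken from the question, and the helper name is an assumption as well:

```r
# Upper-tail hypergeometric p-value P(X >= overlap); illustrative helper
overlap_p <- function(overlap, list1, list2, pop_size) {
  phyper(overlap - 1, list1, pop_size - list1, list2, lower.tail = FALSE)
}

# Hypothetical small data set: expected overlap 10, observed 15 (1.5-fold)
p_small <- overlap_p(15, 100, 100, 1000)

# Same proportions, every count times 10: expected 100, observed 150 (1.5-fold)
p_large <- overlap_p(150, 1000, 1000, 10000)

cat("small data set p:", p_small, "\n")
cat("large data set p:", p_large, "\n")
```

The effect size is identical in both cases, but the p-value for the larger data set comes out far smaller, which is why it helps to report fold enrichment or an odds ratio next to the p-value.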

ADD REPLY • link 7.9 years ago by dariober 14k