Possible methodology for combining different statistics and metrics when ranking functional enrichment analysis results
6.2 years ago
svlachavas ▴ 790

Dear Biostars,

I would like to ask a more general, exploratory question concerning functional enrichment analysis results, regardless of technology (RNA-seq, microarrays, etc.) or methodology (GSEA, etc.). For instance, assume that I have some results from an over-representation analysis: a set of terms, along with Fisher's exact test p-values, as well as the corresponding Jaccard coefficient for each result. I wonder (and please excuse any "naive" question from a molecular biologist): is there a mathematical or statistical way of combining these two measures (the p-values and the Jaccard scores) into an aggregated score, so as to take both of them into account?

Thank you for your consideration of this matter!

fisher's exact test • functional enrichment analysis • 1.6k views

I am not aware of methods that would allow combining p-values with a similarity measure (which is what the Jaccard coefficient is). I suspect another case of the XY problem. What are you trying to achieve?


Dear Jean-Karim, thank you for your comment! Even though my question seems indeed naive and possibly a case of the XY problem you mentioned, I actually considered this possibility based on a methodology implemented in the Enrichr tool (amp.pharm.mssm.edu/Enrichr) and the associated paper (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-128):

"We first compute enrichment using the Fisher exact test for many random input gene lists in order to compute a mean rank and standard deviation from the expected rank for each term in each gene-set library. Then, using a lookup table of expected ranks with their variances, we compute a z-score for deviation from this expected rank, this can be a new corrected score for ranking terms. Alternatively, we combined the p-value computed using the Fisher exact test with the z-score of the deviation from the expected rank by multiplying these two numbers as follows: c=log(p)⋅z

Where c is the combined score, p is the p-value computed using the Fisher exact test, and z is the z-score computed by assessing the deviation from the expected rank..."

That's why I asked my initial question, with the idea of somehow taking both different methods/metrics into account for the final ranking of the results!
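For concreteness, the combined score in the passage quoted above is straightforward to compute. This is a minimal sketch; the function name and example values are my own, not from the Enrichr paper:

```python
import math

def enrichr_combined_score(p_value, z_score):
    """Combined score c = log(p) * z, as described in the Enrichr paper.

    p_value: Fisher's exact test p-value for the term.
    z_score: z-score for the deviation of the term's rank from its
             expected rank (computed from random input gene lists).
    """
    return math.log(p_value) * z_score

# Since log(p) is negative for p < 1, a term that ranks better than
# expected (negative z) gets a positive combined score.
c = enrichr_combined_score(0.01, -2.0)
```

Note that multiplying by log(p) rather than p itself keeps very small p-values from dominating the score on a linear scale.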


You can always take two numbers and multiply them, but that doesn't mean it is a principled approach. I just don't know of a principled way of doing this (note that this doesn't mean there is none). My concern is about interpretability and "correctness", i.e. is the combined score interpretable the way you would want in all cases? Put another way, what is this new score measuring? For example, if your new score is the product of a p-value and a similarity measure, then in the general case it is no longer a similarity measure and is not a probability either.


Dear Jean-Karim,

thank you for your opinion on this matter and your crucial comments. Please excuse me if I rephrase my initial question so that it perhaps better suits my purpose regarding the possible aggregation of scores. In detail:

As an extra option beyond my initial description, here is a methodology we have developed in my lab: the user's input is a DEG list, separated into up- and down-regulated genes. Then, for each "experiment" in a drug repository database, two separate Fisher's exact tests are performed: one for the input up genes versus the up genes from the drug database, and one for the input down genes versus the corresponding down genes from the database.

Overall, for each experiment/gene-set tested, I have 2 Fisher's exact test p-values (computed with fisher.test(..., alternative = "greater")).

Thus, in your opinion:

Is it feasible in this scenario to combine these 2 p-values into a final aggregated one? And if so, which methodology would be most appropriate? For instance, Fisher's method, their product, etc.?

I also have to mention that in the drug repository, each experiment has 2 definitely separate gene-sets, up & down, which result from the same samples; only the gene list from the user changes.

Best,

Efstathios-Iason


If by "Fisher's approach" you mean Fisher's method, then yes, you can use it to combine p-values. If I understand correctly, you want to measure how much the up- and down-regulated genes given as two lists by a user overlap with the up- and down-regulated genes derived from various drug treatments. In my opinion, the separate results are potentially of interest, as is comparing the user's up genes with a drug's down genes and vice versa. I can imagine someone coming up with an antagonist of a given drug and being interested in whether the gene signature is also the opposite, i.e. does the antagonist up-regulate the genes that the drug down-regulates? Unless there's something I am missing, I think there's not much to be gained here by combining the results.
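For reference, Fisher's method for exactly two p-values can be written without any statistics library, because the chi-square survival function with 4 degrees of freedom has a closed form. A sketch with illustrative naming:

```python
import math

def fisher_combine_two(p_up, p_down):
    """Combine two independent p-values with Fisher's method.

    The statistic X = -2 * (ln p1 + ln p2) follows a chi-square
    distribution with 4 degrees of freedom under the null; for even
    degrees of freedom the survival function has a closed form.
    """
    x = -2.0 * (math.log(p_up) + math.log(p_down))
    # chi-square survival function with 4 df: exp(-x/2) * (1 + x/2)
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)
```

With k = p1 * p2, this simplifies algebraically to k * (1 - ln k), which is also the probability that the product of two independent uniform p-values falls below k.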


Dear Jean-Karim,

thank you for your quick and detailed answer. Yes, your understanding is correct; this is indeed our approach and goal. I mentioned the possibility of combining the two resulting p-values (with Fisher's method or another methodology for combining p-values) in order to have, in the end, a unified p-value representing the whole experiment! That's why I updated my question on this matter! You are correct that presenting both results separately is very important, but I would like a unified measure for each experiment, in order to be able to rank the top experiments by this combined p-value.


I also realized that p-values may not be the best measures as they don't take the effect size into account (i.e. very different situations can give the same p-value). So you may want to also report the Jaccard coefficients (or their average if you want only one value). The problem for ranking is to decide what should be ranked first: should a small absolute overlap in only up-regulated genes with a small p-value be ranked higher than a large overlap in both categories with a slightly higher p-value? This is where I now see why you wanted to combine p-values and Jaccard coefficients. I guess that for ranking purposes you could use something like -log(p) + Jaccard (or the product), but you'd have to experiment to find a scoring system that consistently returns sensible rankings (because the Jaccard coefficient is bounded whereas the log is not). Another approach is to rank each experiment separately and combine the ranks in a way that works for you.
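The last suggestion, ranking under each metric separately and then combining the ranks, could be sketched as follows (the function name and example values are illustrative; larger values are treated as better under both metrics):

```python
def combined_ranks(neg_log_p, jaccard):
    """Rank experiments by -log(p) and by Jaccard separately,
    then average the two ranks (rank 1 = best, i.e. largest value)."""
    def ranks(values):
        # Indices sorted by descending value; position in that order is the rank.
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    r1, r2 = ranks(neg_log_p), ranks(jaccard)
    return [(a + b) / 2.0 for a, b in zip(r1, r2)]

# Three hypothetical experiments: average ranks under the two metrics.
avg = combined_ranks([5.0, 2.0, 3.0], [0.4, 0.1, 0.2])
```

Rank-based aggregation sidesteps the scale mismatch noted above, since the bounded Jaccard coefficient and the unbounded -log(p) are first mapped onto the same ordinal scale.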


Dear Jean-Karim,

exactly, that is my initial exploratory question about combining different metrics. Of course, as you have already mentioned, p-values are sensitive measures, especially for over-representation testing. And there are perhaps many arbitrary options for the ranking.

Another idea, based on your proposal, as an "arbitrary" combined score (perhaps it is the same one you mentioned above): for an experiment, if you have the Fisher's "up" p-value, the Fisher's "down" p-value, and the average Jaccard coefficient:

combo.pval = up.pval * down.pval - (up.pval * down.pval) * log(up.pval * down.pval)

combo.score = -log(combo.pval) + average.Jaccard.coef

But again, there is the issue of the Jaccard coefficient being bounded in [0, 1]...
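A minimal sketch of this combined score (function and variable names are illustrative). The first line is the standard formula P(P1·P2 ≤ k) = k - k·ln(k) for the product of two independent uniform p-values, so combo.pval is itself a valid combined p-value:

```python
import math

def combo_score(up_pval, down_pval, avg_jaccard):
    """Combined score sketched above: combine the two Fisher p-values
    via the product rule k - k*log(k), then add the average Jaccard
    coefficient to -log of the combined p-value."""
    k = up_pval * down_pval
    combo_pval = k - k * math.log(k)  # P(product of two uniform p-values <= k)
    return -math.log(combo_pval) + avg_jaccard
```

As discussed above, the -log term is unbounded while the Jaccard term lies in [0, 1], so for small p-values the ranking is dominated almost entirely by the p-value component.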


You'll have to experiment to see what gives you the best/most sensible results. Coming back to my initial comment: I don't know of a principled way of doing this, so the choice of scoring system is going to be arbitrary.
