I have done an RNA-seq analysis using Galaxy, and my next step is GSEA. Because the data are RNA-seq, I need to pre-rank the gene list. I know of (at least?) two valid ranking methods:
1) the signed p-value
2) the lower bound of the 90% confidence interval of the fold change
The first combines the sign of the fold change (FC), fcSign (-1, 0 or +1), with logP = -log10(PValue), giving the new metric = logP/fcSign.
I have a small issue with this. Because Galaxy rounds the numbers, I get hundreds of genes with the same p-value (for example p_value=0.00005), and therefore the same metric. That is not very useful for a ranking. How can I fix it?
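To make the problem concrete, here is a minimal Python sketch of the first metric. The gene names, fold changes and p-values are made up for illustration, and I assume the fold change is on a log2 scale so that its sign gives the direction. The sketch multiplies by the sign instead of dividing: the result is the same for -1 and +1, and it avoids a division by zero when the sign is 0.

```python
import math

def signed_logp(log_fc, p_value):
    """Signed p-value metric: sign(FC) * -log10(p)."""
    sign = (log_fc > 0) - (log_fc < 0)  # -1, 0 or +1
    if sign == 0:
        return 0.0  # no direction, so the gene sits in the middle of the list
    return sign * -math.log10(p_value)

# Hypothetical genes: (name, log2 fold change, p-value as rounded by the tool)
genes = [
    ("geneA", 2.1, 0.00005),
    ("geneB", 0.8, 0.00005),
    ("geneC", -1.5, 0.00005),
    ("geneD", -0.3, 0.2),
]

# Rank from most up-regulated to most down-regulated
ranked = sorted(
    ((name, signed_logp(lfc, p)) for name, lfc, p in genes),
    key=lambda item: item[1],
    reverse=True,
)

for name, metric in ranked:
    print(f"{name}\t{metric:.4f}")
```

geneA and geneB get exactly the same metric even though their fold changes differ, which is the tie I am describing.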
The second method is not clear to me, and I could not find more information about it. Could anyone explain it?
In the meantime, I was thinking of building my own ranked list. Because it's RNA-seq data, I must use "classic" for the enrichment score (that is, not weighting each gene's contribution to the enrichment score by the value of its ranking metric). So only the rank order matters.
So I was thinking of putting the significantly differentially expressed genes at the top of the list, ranked by FC, with FC as the metric. The non-significant genes have q-values between 0.05 and 1, and for them I was thinking of using metric = minimumFC * ( qLOG / (-log10(0.05)) ), where qLOG = -log10(q-value).
This way, the closer the q-value is to 0.05, the closer ( qLOG / (-log10(0.05)) ) is to 1, and the closer the non-significant gene ranks to the significant ones. When the q-value is 1, ( qLOG / (-log10(0.05)) ) is 0 and the metric is 0. Of course the same applies, mirrored, to genes with negative FC, using -minimumFC in the formula. I thought it could work because the ranking is unweighted.
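Here is a rough Python sketch of this idea, just to show what I mean. It rests on several assumptions that are mine: FC is a signed log2 fold change, qLOG is -log10(q-value), minimumFC is the smallest absolute FC among the significant genes, and "significant" means q-value < 0.05. The gene names and values are invented.

```python
import math

Q_CUTOFF = 0.05  # assumed significance threshold on the q-value

def hybrid_metric(log_fc, q_value, minimum_fc, q_cutoff=Q_CUTOFF):
    """FC for significant genes; scaled +/-minimumFC for the rest."""
    if q_value < q_cutoff:
        return log_fc
    q_log = -math.log10(q_value)             # 0 when q-value is 1
    scale = q_log / -math.log10(q_cutoff)    # 1 when q-value equals the cutoff
    # copysign gives +minimumFC for positive FC and -minimumFC for negative FC
    return math.copysign(minimum_fc, log_fc) * scale

# Hypothetical genes: (name, log2 fold change, q-value)
genes = [
    ("geneA", 3.0, 0.001),
    ("geneB", 1.2, 0.01),
    ("geneC", 2.5, 0.06),
    ("geneD", 0.4, 1.0),
    ("geneE", -0.9, 0.5),
    ("geneF", -2.0, 0.002),
]

# Assumed definition: smallest |FC| among the significant genes
minimum_fc = min(abs(lfc) for _, lfc, q in genes if q < Q_CUTOFF)

ranked = sorted(
    ((name, hybrid_metric(lfc, q, minimum_fc)) for name, lfc, q in genes),
    key=lambda item: item[1],
    reverse=True,
)

for name, metric in ranked:
    print(f"{name}\t{metric:.4f}")
```

With these numbers the non-significant geneC (q-value 0.06) lands just below the weakest significant up-regulated gene, and geneD (q-value 1) gets a metric of 0, which is the behaviour I was aiming for.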
What do you think? Can I use it? Thanks