Input GSEA Pre-ranked list
8.8 years ago
stevenlang123 ▴ 210

Hey y'all

I'm currently trying to run GSEA using a pre-ranked gene list but I'm not sure if my input file is correctly formatted, because my results seem to be mostly insignificant.

My input looks something like this (roughly 16,000 genes), where the ranking statistic is the negative log of the p-value obtained through an association test:

GENE      neg_log_Chi_permutation
ARHGAP4   0.928986
C16orf3   1.496821
HOPX      0.975562
FAM3D     1.132781
HTR2C     1.276158
UGCG      0.064802
VPS13D    0.123508
VWF       0

My results show over 300 enriched gene sets, but many of them have an FDR q-value close to 1 despite a high NES. What could I be doing wrong?

Best,
Steven

gene next-gen RNA-Seq genome software • 29k views

The original GSEA paper actually uses an FDR cutoff of less than 0.25. Also, pay attention to p = 0 when you take -log(p), since the result is infinite.
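
To illustrate the p = 0 issue, here is a minimal sketch of one possible workaround: replace zero p-values with a small floor before taking the log. The function name `neg_log10_p`, the use of base 10, and the choice of half the smallest nonzero p-value as the floor are all assumptions for illustration, not something GSEA prescribes.

```python
import math

def neg_log10_p(p_values, floor=None):
    """Return -log10(p) for each p-value, replacing p = 0 with a small floor.

    Assumption: when no floor is given, half the smallest nonzero p-value
    is used as a stand-in for p = 0. Requires at least one nonzero p-value.
    """
    if floor is None:
        positive = [p for p in p_values if p > 0]
        floor = min(positive) / 2
    # max() clamps p = 0 (or anything below the floor) up to the floor,
    # so math.log10 never receives 0.
    return [-math.log10(max(p, floor)) for p in p_values]

# Hypothetical p-values, including an exact zero and an exact one.
scores = neg_log10_p([0.0, 0.001, 0.5, 1.0])
```

With a permutation test, a floor based on the number of permutations (for example 1 / (n_permutations + 1)) may be a more natural choice than the one assumed here.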


I do have several instances of p=1, and consequently instances where genes are ranked the same. How should that be reconciled?


I just noticed an error in your method. You should combine the sign of the logFC with the -log of the p-value (as it stands, you have ranked both up- and down-regulated DE genes at the top of the .rnk file). Alternatively, rank by another metric, such as the logFC or the t statistic. Also, GSEA uses all the genes, not just the DE genes.
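
A minimal sketch of the signed ranking described above, assuming each gene has a direction of effect (here called `log_fc`) and a p-value greater than zero. The function names, the example genes, and the use of log10 are hypothetical. The output follows the two-column, tab-separated layout of a .rnk file.

```python
import math

def signed_rank_metric(log_fc, p_value):
    """Return sign(logFC) * -log10(p). Assumes p_value > 0."""
    sign = (log_fc > 0) - (log_fc < 0)  # +1, 0, or -1
    return sign * -math.log10(p_value)

def build_rnk(rows):
    """rows: iterable of (gene, log_fc, p_value).

    Returns a list of (gene, score) sorted from most up-regulated
    to most down-regulated. All genes are kept, not just significant ones.
    """
    ranked = [(gene, signed_rank_metric(fc, p)) for gene, fc, p in rows]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Hypothetical input: gene name, logFC, p-value.
example = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.8, 0.0001),
    ("GENE_C", 0.1, 0.9),
]
ranked = build_rnk(example)

# Tab-separated gene/score lines, ready to be written to a .rnk file.
rnk_text = "\n".join(f"{gene}\t{score:.6f}" for gene, score in ranked)
```

With this metric, strongly up-regulated genes land at the top of the list and strongly down-regulated genes at the bottom, with non-significant genes in the middle, rather than both directions being mixed at the top.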


Hi Zhilong,

Regarding "pay attention to p=0 when you -log(p)", should those genes with p=1 be filtered out prior to the GSEA analysis?


I suggest including all the genes/probes in the GSEA input.

8.8 years ago

You should use all the genes from your dataset and rank them. Here is a nice post on ranking the DE genes for GSEA analysis.

http://genomespot.blogspot.com.es/2014/09/data-analysis-step-8-pathway-analysis.html

8.8 years ago
andrew ▴ 560

I'm not an expert on GSEA, but there are at least two possibilities that I can see worth considering.

  1. Your data is not significant. I know it's hard to believe, but it happens.
  2. It would appear that if your input list contains roughly 16k genes, you are covering almost all of the protein coding genes, and thus, there is very little room for "enrichment". However, the "rank" of your gene list is supposed to affect this. But this is one of the classic limitations with any enrichment analysis.

Others may have alternative points to consider.


Thanks for your feedback. In the case of the latter, can the pre-ranked gene list be truncated? Or does posterior modification bias the experiment?
