GSEA, input metrics
5.7 years ago
chipolino

Hi everyone,

I want to run GSEA on my RNA-seq data using the fgsea Bioconductor package (http://www.bioconductor.org/packages/release/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html). Which metric should I use to rank the genes? Should it be the absolute log fold change or the signed one? The fgsea example uses signed (not absolute) log fold changes, but I am not sure that makes sense, since gene sets (pathways and GO terms) usually don't have a direction.

Thank you

Copy-pasted:

The GSEA team has yet to determine whether any of these ranking statistics, originally selected for their effectiveness when used with expression data derived from DNA Microarray experiments, are appropriate for use with expression data derived from RNA-seq experiments.

5.7 years ago
shawn.w.foley

For GSEA we typically run a pre-ranked GSEA on the signed log2 fold change, NOT the absolute value of the fold change. Pre-ranked GSEA reports gene sets enriched in the positive direction (upregulated genes) and gene sets enriched in the negative direction (downregulated genes). If you rank by the absolute fold change, you lose that directional information and will get misleading results.

For example, if half of the genes in a gene set of interest are upregulated and half are downregulated, there should be no consistent enrichment of that set in either direction. However, if you rank by the absolute fold change, all of those genes move to the top of the list and a strong enrichment will be reported, which is easily misread as upregulation.
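The effect described above can be checked with a small Python sketch of the GSEA running-sum enrichment score (ES). This is a simplified, hypothetical implementation, not fgsea's or the Broad tool's code: hits are weighted by |score|^p with p = 1 (analogous to fgsea's default `gseaParam = 1`), the data are made up, and no permutation p-values are computed. The function names are illustrative only.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score (ES) for one gene set.

    ranked_genes and scores must already be sorted, highest rank first.
    Each hit adds |score|**p, normalised over all hits so hits sum to 1;
    each miss subtracts 1 / (number of misses). Returns the signed value
    of the running sum at its largest deviation from zero.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_norm = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    running, peak = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** p / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(peak):
            peak = running
    return peak


def rank(log2fc, metric):
    """Sort genes by metric(log2FC), highest first; return genes and scores."""
    ordered = sorted(log2fc, key=lambda g: metric(log2fc[g]), reverse=True)
    return ordered, [metric(log2fc[g]) for g in ordered]


# Hypothetical toy data: 90 background genes with small changes
# (+0.5 down to -0.5) and a 10-gene set, 5 strongly up and 5 strongly down.
log2fc = {f"bg{i}": 0.5 - i / 89 for i in range(90)}
log2fc.update({f"up{i}": 3.0 for i in range(5)})
log2fc.update({f"down{i}": -3.0 for i in range(5)})
mixed_set = {f"up{i}" for i in range(5)} | {f"down{i}" for i in range(5)}

es_signed = enrichment_score(*rank(log2fc, lambda x: x), mixed_set)
es_abs = enrichment_score(*rank(log2fc, abs), mixed_set)
print(f"signed log2FC ranking:   ES = {es_signed:+.2f}")
print(f"absolute log2FC ranking: ES = {es_abs:+.2f}")
```

With signed ranking the set's genes are split between the two ends of the list, so the running sum never gets past what one half contributes and the ES peaks at a magnitude of about 0.5. With absolute ranking all ten genes sit at the top and the ES is about 1.0, the same value a set of ten strongly upregulated genes would get under signed ranking.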

Copy-pasted from the GSEA FAQ:

Should I use natural or log scale data for GSEA?

We recommend using natural scale data. We used it when we calibrated the GSEA method and it seems to work well in general cases.

and

For example, one might filter expression data to remove genes that have low variance across the dataset and/or log transform the data to make the distribution more symmetric. The GSEA algorithm does not benefit from such preprocessing of the data.
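The scale matters because in standard (non-preranked) GSEA the tool computes the ranking metric from the expression values itself, by default the signal-to-noise ratio (μA − μB) / (σA + σB), and that metric is not invariant to a log transform. The Python sketch below uses made-up values for two hypothetical genes and a simplified metric: it assumes sample standard deviations and leaves out the adjustment GSEA applies to very small standard deviations.

```python
import math
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """(mean_A - mean_B) / (sd_A + sd_B) with sample standard deviations.

    Simplified: GSEA's own implementation also puts a floor on small
    standard deviations, which is omitted here.
    """
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def log2_all(values):
    """Log2-transform a list of positive expression values."""
    return [math.log2(v) for v in values]


# Hypothetical expression values, 3 samples per group, for two genes.
genes = {
    # Large fold change, but the noise grows with expression level.
    "geneX": ([100, 400, 1600], [1, 4, 16]),
    # Small fold change with tight, roughly additive noise.
    "geneY": ([50, 55, 60], [40, 45, 50]),
}

for name, (a, b) in genes.items():
    natural = signal_to_noise(a, b)
    logged = signal_to_noise(log2_all(a), log2_all(b))
    print(f"{name}: natural scale {natural:.2f}, log2 scale {logged:.2f}")
```

Here geneY ranks above geneX on the natural scale, but the order flips after the log2 transform, so the choice of scale changes which genes end up near the top of the list.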

Personally, I've pre-ranked using FC * -log10(p), but there's no established method, and it is unclear how reliable GSEA is for RNA-seq. This thread has some more suggestions: GSEA preranking metric for RNA Seq. The "similar posts" sidebar for this question might also be worth checking out.
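A minimal Python sketch of the FC * -log10(p) metric, assuming FC means the log2 fold change reported by a differential expression tool. The helper name, the clipping threshold `min_p` and the example values are hypothetical. Because the sign comes from the fold change, up- and downregulated genes end up at opposite ends of the ranked list.

```python
import math


def rank_metric(log2fc, pvalue, min_p=1e-300):
    """Hypothetical helper: log2FC * -log10(p).

    p-values are clipped at min_p so that a reported p of 0 does not
    produce an infinite score.
    """
    return log2fc * -math.log10(max(pvalue, min_p))


# Hypothetical DE results: gene -> (log2 fold change, p-value)
de_results = {
    "GENE_A": (2.0, 1e-10),
    "GENE_B": (-1.5, 1e-6),
    "GENE_C": (0.3, 0.5),
    "GENE_D": (-0.1, 0.9),
}

scores = {g: rank_metric(fc, p) for g, (fc, p) in de_results.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The `scores` mapping plays the role of the named numeric vector that fgsea takes as its `stats` argument. Replacing the fold change with its sign, `math.copysign(1.0, log2fc) * -math.log10(p)`, gives another commonly used variant.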

5.7 years ago

GSEA uses a ranked list of genes (ranked by any suitable metric, such as fold change). For RNA-seq data, you need to run GSEAPreranked instead of standard GSEA; this is explained very well in the FAQ:

https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Can_I_use_GSEA_to_analyze_SNP.2C_SAGE.2C_ChIP-Seq_or_RNA-Seq_data.3F
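For GSEAPreranked, the ranked list is supplied as a .rnk file: a tab-separated text file with a gene identifier in the first column and the ranking score in the second. A minimal Python sketch for writing one, assuming a {gene: score} mapping like the one you would pass to fgsea; the file name, gene IDs and scores are placeholders, and the GSEA documentation on the RNK format is the authority on details such as comment lines.

```python
import csv


def write_rnk(scores, path):
    """Write {gene: score} as a two-column, tab-separated .rnk file.

    Genes are written highest score first, with no header line.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for gene, score in ordered:
            writer.writerow([gene, f"{score:.6g}"])


# Hypothetical scores, e.g. signed log2FC values from a DE analysis.
write_rnk({"GENE_A": -1.2, "GENE_B": 2.5, "GENE_C": 0.05}, "my_ranks.rnk")
```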
