GSEA, input metrics
15 months ago
chipolino • 40

Hi everyone,

I want to run GSEA on my RNA-seq data using the fgsea Bioconductor package (http://www.bioconductor.org/packages/release/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html). What metric should I use to rank the genes? Should it be the absolute log fold change or the signed one? The example provided by fgsea uses signed (non-absolute) log fold changes, but I am not sure that makes sense, since gene sets (pathways and GO terms) usually don't have a direction.

Thank you

GSEA RNA-Seq • 818 views

Copy-pasted:

The GSEA team has yet to determine whether any of these ranking statistics, originally selected for their effectiveness when used with expression data derived from DNA Microarray experiments, are appropriate for use with expression data derived from RNA-seq experiments.

18 months ago
shawn.w.foley • 670
USA

To perform GSEA we typically use the signed log2 fold change, not the absolute fold change, with pre-ranked GSEA. Pre-ranked GSEA reports gene sets enriched in the positive direction (upregulated) and gene sets enriched in the negative direction (downregulated). By using the absolute fold change you lose that directional information and will get misleading results.

For example, if half of the genes in a gene set of interest are upregulated and half are downregulated, there should be no enrichment for that gene set. However, if you rank by absolute fold change, all of those genes land together at the top of the list, and the set will be reported as strongly enriched, as if it were upregulated.
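To make the scenario above concrete, here is a minimal Python sketch. It is not fgsea or the Broad GSEA code: `enrichment_score` is a simplified, weighted running-sum statistic written for this illustration, and the gene names and fold changes are made up. It computes only the raw score, with no permutation test or normalization, so it shows the ranking effect rather than significance.

```python
def enrichment_score(ranked, gene_set):
    """Simplified GSEA-style running-sum score (illustrative only).

    ranked: list of (gene, score) tuples, already sorted from the top of
    the list to the bottom. Hits are weighted by |score|; each miss
    subtracts an equal step. Returns the running-sum value with the
    largest absolute deviation from zero.
    """
    total_hit = sum(abs(s) for g, s in ranked if g in gene_set)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) / total_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical data: a set with three strongly up and three strongly
# down genes, plus 14 background genes with small changes.
gene_set = {"up1", "up2", "up3", "dn1", "dn2", "dn3"}
log2fc = {"up1": 3.0, "up2": 2.8, "up3": 2.6,
          "dn1": -3.0, "dn2": -2.8, "dn3": -2.6}
log2fc.update({f"bg{i}": round(-0.65 + 0.1 * i, 2) for i in range(14)})

# Signed ranking: the set is split between the two ends of the list.
signed = sorted(log2fc.items(), key=lambda kv: kv[1], reverse=True)
# Absolute ranking: all six set genes pile up at the top.
absolute = sorted(((g, abs(v)) for g, v in log2fc.items()),
                  key=lambda kv: kv[1], reverse=True)

es_signed = enrichment_score(signed, gene_set)
es_abs = enrichment_score(absolute, gene_set)
print("signed ranking score:  ", es_signed)
print("absolute ranking score:", es_abs)
```

In this toy case the absolute ranking reaches the maximum possible score, because every gene in the set sits at the top of the list. With the signed ranking the contributions from the two ends cancel by the end of the list, and the peak deviation is much smaller.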


Copy-pasted from the GSEA FAQ:

Should I use natural or log scale data for GSEA?

We recommend using natural scale data. We used it when we calibrated the GSEA method and it seems to work well in general cases.

and

For example, one might filter expression data to remove genes that have low variance across the dataset and/or log transform the data to make the distribution more symmetric. The GSEA algorithm does not benefit from such preprocessing of the data.


Personally I've pre-ranked using FC*-log10(p), but there's no real established method, and it is unclear how reliable GSEA is for RNA-seq. This thread has some more suggestions: GSEA preranking metric for RNA Seq. The "similar posts" sidebar for this question might also be worth checking out.
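A minimal Python sketch of that kind of metric follows. It assumes "FC" means the log2 fold change, so that the sign carries the direction; that reading, the `rank_score` helper, the `min_p` floor, and the example genes are all assumptions made for illustration, not something stated in the post.

```python
import math


def rank_score(log2_fc, p_value, min_p=1e-300):
    """Signed ranking score: log2 fold change times -log10(p).

    min_p is a hypothetical floor that avoids log10(0) when a tool
    reports a p-value of exactly zero.
    """
    p = max(p_value, min_p)
    return log2_fc * -math.log10(p)


# Hypothetical differential-expression results: gene -> (log2FC, p-value)
results = {
    "GENE_A": (2.0, 1e-6),
    "GENE_B": (-1.5, 1e-4),
    "GENE_C": (0.2, 0.5),
    "GENE_D": (3.0, 0.05),
}

# Sort from most strongly up to most strongly down.
ranked = sorted(
    ((gene, rank_score(fc, p)) for gene, (fc, p) in results.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

The product rewards genes that are both strongly changed and well supported statistically, and it keeps up- and downregulated genes at opposite ends of the list.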

14 months ago

GSEA uses a ranked list of genes (ranked by any suitable metric, such as fold change). For RNA-seq data, you need to run GSEAPreranked instead of standard GSEA; this is explained very well in the FAQ:

https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Can_I_use_GSEA_to_analyze_SNP.2C_SAGE.2C_ChIP-Seq_or_RNA-Seq_data.3F
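As a rough sketch of preparing input for a pre-ranked run: to my knowledge GSEAPreranked reads a tab-delimited ranked list (.rnk) with one gene identifier and one score per line, but check the GSEA documentation for the exact format requirements. The `to_rnk` helper, the gene names, and the scores below are hypothetical.

```python
def to_rnk(scores):
    """Format a {gene: score} dict as tab-delimited ranked-list text,
    sorted from the highest score to the lowest."""
    rows = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return "".join(f"{gene}\t{score:.6g}\n" for gene, score in rows)


# Hypothetical signed ranking scores, e.g. log2 fold changes.
scores = {"GENE_A": 2.31, "GENE_B": -1.87, "GENE_C": 0.05}

rnk_text = to_rnk(scores)
print(rnk_text, end="")
```

The resulting text can be saved under a name such as ranked_genes.rnk. Because the input is a dict, each gene identifier appears only once in the output.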
