Question

Estimating gene set enrichment using Fisher's exact test


5.2 years ago

Biologist ▴ 290

Hi,

I'm working with lung cancer data and I'm interested in lncRNAs. I would like to identify lncRNAs that target key pathways.

Recently, I read a paper that discusses this type of analysis: Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context

In this paper, Figure 4A shows lncRNAs that are predicted to target most pathways in MSigDB's Hallmark gene sets, including proliferation, immune response, signaling, and DNA damage pathways, in multiple tumor types (pan-cancer).

In the Methods section, under "Gene set enrichment", they write:

When identifying lncRNAs whose targets are enriched in hallmark gene sets, we estimated gene set enrichment using Fisher’s Exact test between predicted lncRNA targets of each lncRNA and expressed gene set members in each of 14 tumor types using adjusted pFET < 0.01; each test was adjusted for the total number of lncRNAs, lncRNA targets, and gene set tested.

My question:

Usually, co-expression network analysis gives us the lncRNAs that fall in a module of protein-coding genes, and with that we can do pathway analysis. This way we can find which lncRNAs regulate which pathways.

But given the information in the paper's Methods section, how do I estimate gene set enrichment using Fisher's exact test between protein-coding genes and lncRNAs?

Can anyone clear up my confusion about this? Could you also please explain how this can be done?

Thank you.

RNA-Seq lncrna gsea fisherstest • 4.5k views


I haven't read the paper, but based on your quote it seems clear (to me) that they take as the universe (background set) all genes expressed in a given tumor, test whether the genes identified as targets of a given lncRNA are enriched for members of a given pathway, and consider a pathway significant only if its adjusted p-value is < 0.01. How the p-value is adjusted should be described in the paper.
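To make the roles of the three gene lists concrete, here is a minimal R sketch of how the 2x2 table could be built from them. The gene identifiers and list sizes are made up for illustration, and the one-sided alternative is my assumption (it tests for over-representation only); the paper may have done this differently.

```r
# Toy data; every identifier below is made up for illustration.
expressed <- paste0("gene", 1:2000)                         # background: genes expressed in the tumor
targets   <- paste0("gene", 1:100)                          # predicted targets of one lncRNA
pathway   <- paste0("gene", c(91:100, 1500:1519, 2001:2005)) # gene set; 5 members are not expressed

# Keep only genes that belong to the background.
targets <- intersect(targets, expressed)
pathway <- intersect(pathway, expressed)

n.both    <- length(intersect(targets, pathway))  # targets that are pathway members
n.target  <- length(setdiff(targets, pathway))    # targets outside the pathway
n.pathway <- length(setdiff(pathway, targets))    # pathway members that are not targets
n.neither <- length(expressed) - n.both - n.target - n.pathway

# matrix() fills column by column: first column = in pathway, second = not in pathway.
tab <- matrix(c(n.both, n.pathway, n.target, n.neither), nrow = 2,
              dimnames = list(c("target", "other"),
                              c("in.pathway", "not.in.pathway")))

# Assumption: one-sided test for over-representation of targets in the pathway.
res <- fisher.test(tab, alternative = "greater")
res$p.value
```

Note that the pathway is restricted to its expressed members before counting, which is how I read "expressed gene set members" in the quoted Methods text.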

ADD REPLY • link 5.2 years ago by Jean-Karim Heriche 27k


Here in the paper they say "each test was adjusted for the total number of lncRNAs, lncRNA targets, and gene set tested".

I'm a bit confused about how the contingency table should look for Fisher's test in this type of analysis.

If you don't mind, could you please show a small example with numbers? Thank you.

ADD REPLY • link 5.2 years ago by Biologist ▴ 290


Assuming that in tumor A, 1000 genes are expressed and we're interested in lncRNA X, for which we found 80 target genes, and in pathway P, which comprises 13 genes (10 of them targets of X), the contingency table would look like this:

            | In P | Not in P |
------------------------------|
Targets of X|  10  |    70    |
------------------------------|
Other genes |   3  |   917    |

In R, this would be tested like this:

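# matrix() fills column by column: first column = in pathway P (10, 3), second = not in P (70, 917)
contingency.table <- matrix(c(10, 3, 70, 917), nrow = 2)
fisher.test(contingency.table)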
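In the paper's setting this test is repeated for every lncRNA and gene set combination, and the resulting p-values are then adjusted together. Below is a rough sketch of that loop on toy data. The gene lists, the helper `enrichment.p`, the one-sided alternative, and the choice of Benjamini-Hochberg in `p.adjust` are all my assumptions; the quoted Methods text does not say which correction was used.

```r
# Toy inputs (all hypothetical): background, target lists, and gene sets.
expressed <- paste0("g", 1:1000)
targets   <- list(lncX = expressed[1:80],
                  lncY = expressed[501:560])
gene.sets <- list(P = expressed[c(71:80, 998:1000)],
                  Q = expressed[c(1:5, 900:949)])

# Hypothetical helper: one-sided Fisher's exact test for one lncRNA / gene set pair.
enrichment.p <- function(tg, gs, bg) {
  tg <- intersect(tg, bg)
  gs <- intersect(gs, bg)
  a  <- length(intersect(tg, gs))                 # targets in the gene set
  tab <- matrix(c(a, length(gs) - a,
                  length(tg) - a,
                  length(bg) - length(tg) - length(gs) + a), nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}

# One row per lncRNA x gene set combination.
tests <- expand.grid(lncRNA = names(targets), gene.set = names(gene.sets),
                     stringsAsFactors = FALSE)
tests$p <- mapply(function(l, s) enrichment.p(targets[[l]], gene.sets[[s]], expressed),
                  tests$lncRNA, tests$gene.set, USE.NAMES = FALSE)

# Assumption: Benjamini-Hochberg correction over all tests performed.
tests$p.adj <- p.adjust(tests$p, method = "BH")

# Pairs passing the paper's threshold of adjusted p < 0.01.
significant <- tests[tests$p.adj < 0.01, ]
significant
```

The point is only that the adjustment is applied across all tests performed in a tumor type, so the number of lncRNAs and gene sets tested directly affects which pairs survive.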

ADD REPLY • link 5.2 years ago by Jean-Karim Heriche 27k


Thanks a ton @Jean-Karim Heriche

Here, does "1000 genes expressed" mean differentially expressed in the tumor or just expressed? And does "80 target genes of lncRNA X" mean co-expressed genes or neighbouring genes?

If it means co-expressed genes, what should the cutoff be for selecting target genes?

ADD REPLY • link 5.2 years ago by Biologist ▴ 290


This would be genes expressed in the tumor, since only these have a chance of being detected. As for the target genes, you'll have to read the paper if you want to know how they defined lncRNA target genes.