Hi,
I'm working with lung cancer data and I'm interested in lncRNAs. I would like to identify lncRNAs that target key pathways.
Recently I read a paper that discusses this type of analysis: "Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context".
Figure 4A of this paper shows lncRNAs that are predicted to target most pathways in MSigDB's Hallmark gene sets, including proliferation, immune response, signaling, and DNA damage pathways, across multiple tumor types (pan-cancer).
In the Methods section ("Gene set enrichment"), they write:
"When identifying lncRNAs whose targets are enriched in hallmark gene sets, we estimated gene set enrichment using Fisher’s Exact test between predicted lncRNA targets of each lncRNA and expressed gene set members in each of 14 tumor types using adjusted pFET < 0.01; each test was adjusted for the total number of lncRNAs, lncRNA targets, and gene set tested."
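To make the quoted test concrete, here is a minimal Python sketch of a one-sided Fisher's exact test (the hypergeometric upper tail) on a 2x2 table, using only the standard library. The quote does not say whether the test was one- or two-sided; the one-sided "enrichment" version is an assumption, and the function name `fisher_exact_greater` is hypothetical. In practice you would more likely call R's `fisher.test(..., alternative = "greater")` or SciPy's `scipy.stats.fisher_exact(..., alternative="greater")`.

```python
from math import comb

def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: lncRNA target / not target; columns: in gene set / not in gene set.
    Returns P(X >= a), where X is the overlap under the hypergeometric
    null with all table margins fixed.
    """
    n_targets = a + b           # row 1 margin (targets of the lncRNA)
    n_set = a + c               # column 1 margin (gene set members)
    n_universe = a + b + c + d  # all genes in the background
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_set - k)
        for k in range(a, min(n_targets, n_set) + 1)
    )
    return tail / comb(n_universe, n_set)

# Classic "lady tasting tea" table: exactly 17/70, about 0.243
print(fisher_exact_greater(3, 1, 1, 3))
```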
My question:
Usually, co-expression network analysis places lncRNAs in modules together with protein-coding genes, and we can then run pathway analysis on each module. This way we can infer which lncRNAs regulate which pathways.
But based on what the paper's Methods section describes, how would one estimate gene set enrichment with Fisher's exact test between protein-coding genes and lncRNAs?
Can anyone clear up my confusion? Could you also explain how this can be done in practice?
Thanks.
I haven't read the paper, but based on your quote it seems clear (to me) that they take all genes expressed in a given tumor as the universe (background set), test whether the genes identified as targets of a given lncRNA are enriched in the genes of a given pathway, and consider significant only pathways with an adjusted p-value < 0.01. How the p-value is adjusted should be described in the paper.
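A minimal sketch of how the 2x2 table for one lncRNA and one pathway could be built from gene sets, with the universe restricted to genes expressed in the tumor as described above. The function name and gene names are hypothetical placeholders; rows are target / not target and columns are in pathway / not in pathway.

```python
def contingency_table(targets, gene_set, universe):
    """2x2 table for one lncRNA and one gene set, restricted to the universe.

    Rows: target / not target of the lncRNA.
    Columns: member / non-member of the gene set.
    Genes outside the universe (e.g. not expressed in this tumor) are ignored.
    """
    universe = set(universe)
    t = set(targets) & universe
    s = set(gene_set) & universe
    a = len(t & s)                 # targets in the gene set
    b = len(t - s)                 # targets not in the gene set
    c = len(s - t)                 # gene set members that are not targets
    d = len(universe) - a - b - c  # expressed genes in neither group
    return [[a, b], [c, d]]

# Hypothetical toy example: 10 expressed genes; GENE_X is a target but is not expressed.
universe = {f"GENE_{i}" for i in range(1, 11)}
targets = {"GENE_1", "GENE_2", "GENE_3", "GENE_X"}
pathway = {"GENE_2", "GENE_3", "GENE_4"}
print(contingency_table(targets, pathway, universe))  # [[2, 1], [1, 6]]
```

Restricting both sets to the universe first matters: a target that is not expressed in the tumor could never have been detected, so counting it would distort the table.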
In the paper they say that "each test was adjusted for the total number of lncRNAs, lncRNA targets, and gene set tested".
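The quoted sentence does not name the correction method. One plausible reading is that, within a tumor type, the p-values of all lncRNA and gene set combinations are pooled and corrected together. The sketch below assumes Benjamini-Hochberg (the method behind R's `p.adjust(p, method = "BH")`) and then applies the 0.01 cutoff. The choice of BH, the pooling scheme, the lncRNA names, and the p-values are all hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw Fisher p-values for every (lncRNA, gene set) pair in one tumor type.
raw = {
    ("LNC_A", "HALLMARK_E2F_TARGETS"): 1e-6,
    ("LNC_A", "HALLMARK_G2M_CHECKPOINT"): 0.004,
    ("LNC_B", "HALLMARK_E2F_TARGETS"): 0.03,
    ("LNC_B", "HALLMARK_G2M_CHECKPOINT"): 0.5,
}
keys = list(raw)
adj = benjamini_hochberg([raw[k] for k in keys])
significant = [k for k, q in zip(keys, adj) if q < 0.01]
print(significant)
```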
I'm a bit confused about how the contingency table for Fisher's exact test should look in this type of analysis.
If you don't mind, could you explain with a small example using actual numbers? Thanks.
Assume that in tumor A, 1000 genes are expressed, and that we're interested in lncRNA X, for which we found 80 target genes, and in pathway P, which comprises 13 genes. The contingency table would then look like this:
In R, this would be tested like this:
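A stdlib-only Python sketch of the same table and a one-sided test for these numbers. The overlap between the 80 targets and the 13 pathway genes is not given in the example, so `overlap = 5` is a hypothetical value chosen for illustration; the other three cells follow from the margins.

```python
from math import comb

def fisher_exact_greater(table):
    """One-sided Fisher's exact test: P(overlap >= observed) under the hypergeometric null."""
    (a, b), (c, d) = table
    n_targets, n_pathway, n_total = a + b, a + c, a + b + c + d
    hits = range(a, min(n_targets, n_pathway) + 1)
    tail = sum(comb(n_targets, k) * comb(n_total - n_targets, n_pathway - k) for k in hits)
    return tail / comb(n_total, n_pathway)

n_expressed = 1000  # genes expressed in tumor A (the universe)
n_targets = 80      # predicted targets of lncRNA X
n_pathway = 13      # genes in pathway P
overlap = 5         # HYPOTHETICAL: targets of X that are also in P

table = [
    # columns: in P, not in P
    [overlap, n_targets - overlap],  # target of X
    [n_pathway - overlap, n_expressed - n_targets - n_pathway + overlap],  # not a target
]
p_value = fisher_exact_greater(table)
print(table)  # [[5, 75], [8, 912]]
print(f"one-sided p = {p_value:.3g}")
```

By chance alone, about 80 × 13 / 1000 = 1.04 pathway genes would be expected among the targets, which is why an overlap of 5 gives a small one-sided p-value; whether it counts as significant still depends on the multiple-testing adjustment.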
Thanks a ton @Jean-Karim Heriche
Here, do the 1000 expressed genes mean genes differentially expressed in the tumor, or just expressed? And do the 80 target genes of lncRNA X mean co-expressed genes or neighbouring genes?
If they are co-expressed genes, what cutoff should be used to select target genes?
These would be genes expressed in the tumor, since only these have a chance of being detected. As for the target genes, you'll have to read the paper to find out how they defined lncRNA target genes.
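If you follow the co-expression route from the question instead, here is a sketch of the two steps discussed in this thread: define the universe as genes expressed in the tumor, then call targets with a correlation cutoff. This is not the paper's target definition. Both thresholds (expression of at least 1 in at least half the samples, |Pearson r| of at least 0.5) are hypothetical placeholders to tune for your data, and the gene names and values are toy data.

```python
from math import sqrt

def expressed_genes(expr, min_value=1.0, min_fraction=0.5):
    """Genes with expression >= min_value in at least min_fraction of samples.
    Both thresholds are hypothetical choices, not taken from the paper."""
    return {
        gene for gene, values in expr.items()
        if sum(v >= min_value for v in values) >= min_fraction * len(values)
    }

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)

def coexpressed_targets(lnc, expr, universe, min_abs_r=0.5):
    """Hypothetical target definition: expressed genes with |r| >= min_abs_r to the lncRNA."""
    return {g for g in universe if g != lnc and abs(pearson(expr[lnc], expr[g])) >= min_abs_r}

# Toy expression matrix: gene -> values across 4 samples.
expr = {
    "LNC1":   [1, 2, 3, 4],
    "GENE_A": [2, 4, 6, 8],    # perfectly correlated with LNC1
    "GENE_B": [4, 3, 2, 1],    # perfectly anti-correlated
    "GENE_C": [1, 3, 1, 3],    # weakly correlated (r ~ 0.45)
    "GENE_D": [0, 0, 0.5, 0],  # not expressed
}
universe = expressed_genes(expr)
targets = coexpressed_targets("LNC1", expr, universe)
print(sorted(universe), sorted(targets))
```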
Sure, thanks. I will have a look.