gene set overlaps with ranks or weights
7.0 years ago
igor 13k

It's fairly trivial to check the significance of overlap between 2 gene lists with a Fisher's exact test in R (fisher.test()), which is widely accepted. Is there a good alternative that would also be able to incorporate ranks or weights for each gene?
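
For reference, the baseline test described here can be sketched without any packages. This is a minimal Python sketch (standard library only), assuming both lists are drawn from a known universe of `universe_size` genes; `overlap_pvalue` is an illustrative name, not a library function. It computes the one-sided (enrichment) p-value as a hypergeometric upper tail, which should correspond to `fisher.test(..., alternative = "greater")` on the 2x2 table in R.

```python
from math import comb


def overlap_pvalue(list_a, list_b, universe_size):
    """One-sided p-value for the overlap of two gene lists.

    Assumes every gene in both lists belongs to a universe of
    `universe_size` genes (an assumption of this sketch).
    """
    a, b = set(list_a), set(list_b)
    n_a, n_b = len(a), len(b)
    if max(n_a, n_b) > universe_size:
        raise ValueError("a list cannot be larger than the universe")
    k = len(a & b)  # observed overlap

    # P(X >= k) when n_b genes are drawn at random from the universe
    # and n_a of the universe genes are "marked" as members of list A.
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


# Toy example: 3 shared genes between two lists of 4, universe of 10 genes.
k, p = overlap_pvalue(["g1", "g2", "g3", "g4"], ["g1", "g2", "g3", "g9"], 10)
print(k, p)
```

Every gene counts equally in this test, which is the limitation the question is about.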

stats R • 1.8k views

Do you want a number that measures the overlap between two lists while accepting a weight for each gene, or for each gene list? Why do you want it? Maybe there is an underlying problem that you are trying to solve the hard way...

To my knowledge you cannot, unless you are doing something along the lines of over-representation analysis of a gene set, which ideally needs a weight; that is what is used in gene set enrichment or GO analysis. Simply comparing two sets of genes from a universe does not involve any other attributes associated with the genes, so it is a simple hypergeometric test. However, if you are adding attributes such as gene length or some other bias, you will also have to perform a different test of your null hypothesis that takes them into account. So, to my knowledge, this is not ideal unless you are doing enrichment-based gene set analysis or over-representation analysis.

Sure. Maybe what I am asking for is over-representation. Is there a simple way to do that?

If your gene lists depend on an arbitrary cutoff, then weights would be a good compromise. For example, you can take the top 100 genes or the top 500 genes. If you take the top 500, the top 100 still have more confidence, so they should count for more.

For example, GSEA can analyze a full ranked gene list, but that's a full standalone package. I just want a simpler function that I could integrate into my workflow.
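
One way the "top genes should count for more" idea could be made concrete is a weighted overlap score with a permutation null. The following is a minimal Python sketch (standard library only), not an established method: the linear rank weights, the function names `rank_weights` and `weighted_overlap_test`, and the choice of null (random gene sets of the same size drawn from the universe) are all assumptions made for illustration.

```python
import random


def rank_weights(ranked_genes):
    """Assumed weighting: the top gene gets 1.0, decreasing linearly with rank."""
    n = len(ranked_genes)
    return {gene: (n - i) / n for i, gene in enumerate(ranked_genes)}


def weighted_overlap_test(weights, gene_set, universe, n_perm=2000, seed=0):
    """Permutation p-value for the summed weight of genes shared with gene_set.

    Null model (an assumption of this sketch): gene sets of the same size
    drawn uniformly at random from the universe.
    """
    rng = random.Random(seed)
    pool = sorted(set(universe))  # sorted so results are reproducible
    gene_set = set(gene_set)

    def score(genes):
        # Genes without a weight (not in the ranked list) contribute 0.
        return sum(weights.get(g, 0.0) for g in genes)

    observed = score(gene_set)
    hits = sum(
        score(rng.sample(pool, len(gene_set))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimated p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: 200-gene universe, a ranked list of 50 genes, and a
# second gene set that happens to contain the 10 top-ranked genes.
universe = [f"g{i}" for i in range(200)]
weights = rank_weights(universe[:50])
observed, p = weighted_overlap_test(weights, universe[:10], universe)
print(observed, p)
```

With all weights set to 1, the score reduces to the plain overlap count, so the permutation p-value then approximates the hypergeometric one.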

GSEA can be performed with a single function: see the fgsea function of the fgsea package in Bioconductor. To take the values of the list into account, you can have a look at the roast function of the limma package. See this recent post about different kinds of GSEA.

"If you take top 500, the top 100 still have more confidence"

If you select fewer genes but the same pathway is enriched, it will have a lower p-value, but the confidence is the same. In an over-representation test in particular, one important aspect is which background population you are considering.
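
The point about the background population can be illustrated with a small calculation. This is a Python sketch (standard library only) with made-up counts; `hypergeom_tail` is a hypothetical helper name, and the two background sizes are arbitrary examples rather than recommendations.

```python
from math import comb


def hypergeom_tail(k, n_a, n_b, background_size):
    """P(overlap >= k) for two lists of sizes n_a and n_b drawn from
    a background of background_size genes (one-sided test)."""
    total = comb(background_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(background_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


# Same observed overlap (10 genes shared by two lists of 100),
# evaluated against two different assumed backgrounds.
p_large = hypergeom_tail(10, 100, 100, 20000)  # e.g. all annotated genes
p_small = hypergeom_tail(10, 100, 100, 1000)   # e.g. only genes measured
print(p_large, p_small)
```

Against the large background the overlap looks extremely unlikely by chance, while against the small one it is about what random sampling would give, so the choice of background can decide the conclusion.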

7.0 years ago
Benn 8.3k

Yes, you can use the weight algorithm from the topGO package (or the elim strategy). If you have RNA-seq data, the length bias (or any other bias) of the genes (transcripts) can be accounted for with the goseq package.

The weight algorithm in topGO doesn't allow you to set a weight for each gene before calculating the overlap. Also, it is only for gene ontologies, so it may not do what Igor is looking for if he/she wants to test a different list of genes.

Sorry, you're right. The term 'gene set' is often used synonymously with GO term, hence the confusion on my side.
