Hello,
I'm trying to perform an analysis similar to those in the ENCODE and FANTOM publications, where the enrichment of GWAS SNPs in regulatory regions (e.g., DHSs, CAGE-defined enhancers) is calculated (1,2).
So, I would like to test whether a set of GWAS SNPs associated with a disease of interest is enriched in my set of regulatory regions compared to a background distribution of SNPs (i.e., the 1000 Genomes data).
How am I supposed to set up my contingency table for Fisher's exact test?
My guess would be something like:
And then simply use the R fisher.test() function on the matrix.
There's also the fact that the GWAS SNPs are a subset of the 1000 Genomes SNPs: should I subtract them from the superset before performing the test?
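To make the question concrete, here is a minimal sketch of the two layouts I'm considering. All counts are made-up placeholders, not real data, and the variable names are only for illustration. Option A keeps the GWAS SNPs in the background row; option B subtracts them so that the two rows are disjoint.

```r
# Placeholder counts for illustration only (not real data)
gwas_in  <- 30     # GWAS SNPs overlapping my regulatory regions
gwas_out <- 70     # GWAS SNPs outside the regulatory regions
bg_in    <- 1000   # all 1000 Genomes SNPs overlapping the regions
bg_out   <- 99000  # all 1000 Genomes SNPs outside the regions

labels <- list(c("GWAS", "background"), c("in_region", "outside"))

# Option A: background = all 1000 Genomes SNPs
# (GWAS SNPs are counted in both rows)
tab_a <- matrix(c(gwas_in, gwas_out,
                  bg_in,   bg_out),
                nrow = 2, byrow = TRUE, dimnames = labels)

# Option B: background = 1000 Genomes SNPs minus the GWAS SNPs
# (rows are disjoint, every SNP is counted exactly once)
tab_b <- matrix(c(gwas_in,         gwas_out,
                  bg_in - gwas_in, bg_out - gwas_out),
                nrow = 2, byrow = TRUE, dimnames = labels)

# One-sided test: are GWAS SNPs over-represented in the regions?
res_a <- fisher.test(tab_a, alternative = "greater")
res_b <- fisher.test(tab_b, alternative = "greater")

print(tab_b)
print(c(A = res_a$p.value, B = res_b$p.value))
```

As far as I understand, Fisher's exact test assumes each observation falls into exactly one cell, which is what option B gives, but I'm not sure whether the difference matters in practice when the GWAS set is tiny relative to the background.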
Thanks!
(1) Maurano et al., 2012: https://www.ncbi.nlm.nih.gov/pubmed/22955828
(2) Andersson et al., 2014: https://www.ncbi.nlm.nih.gov/pubmed/24670763