There is a well-known paper in the autoimmune disease literature that describes an algorithm called Probabilistic Identification of Causal SNPs, or PICS.
The paper identified ~9,000 PICS based on a posterior probability that each SNP is a true causal variant rather than a neutrally associated one. I trimmed this list to 1,865 LD-independent PICS.
I wanted to see if the PICS were enriched for having p-values below a certain threshold in our lab's GWAS data. I could have set this up as an exact binomial test using the threshold as the probability of success, but this is problematic for a lot of reasons. For instance, although the genomic inflation in our study is well-controlled (lambda GC = 1.03), that still indicates a slight enrichment of lower p-values.
For this reason and many others, I think such an exact binomial test is a weak comparison and is likely to have a very high type I error rate. So, I searched for a more rigorous comparison.
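For reference, the naive binomial framing I decided against would look something like this. The counts below are made up purely for illustration, and the upper-tail probability is computed directly from the pmf recurrence rather than via a library call:

```python
def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), using the pmf recurrence
    to avoid computing huge binomial coefficients."""
    pmf = (1.0 - p) ** n              # P(X = 0)
    tail = pmf if k == 0 else 0.0
    for i in range(n):                # step the pmf from i to i + 1
        pmf *= (n - i) / (i + 1) * p / (1.0 - p)
        if i + 1 >= k:
            tail += pmf
    return tail

# Hypothetical counts, for illustration only: 120 of the 1,865 PICS
# below a threshold of 0.05, with the threshold itself serving as the
# per-SNP success probability under the naive null.
print(binom_sf(120, 1865, 0.05))
```

The problem, as noted above, is that the null "each SNP falls below the threshold with probability equal to the threshold" is already violated by even mild genomic inflation, so the test rejects too often.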
Without going into substantial detail, after a lot of work I think a more appropriate test is to compare the association p-values of the PICS (from our study) to p-values of "SNPs like them" (e.g., SNPs that have similar numbers of nearby genes, or similar numbers of exonic variants, etc. etc.).
So, using SNPsnap I generated a list of 87,000 SNPs having characteristics matching those of the PICS. I call them "matched SNPs."
I then drew 100,000 random samples of 1,865 SNPs from the ~87,000 in the matched-SNP list and, for each sample, recorded whether the matched SNPs had more variants with an association p-value below the threshold than the PICS did.
So, if 4,650 of the 100,000 samples from the matched set had more SNPs below the p-value threshold than the PICS set, then I report p = 0.04650 and call that an "empirical p-value."
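The resampling loop described above can be sketched like this. The variable names and the 5e-8 default threshold are my placeholders, not values from the actual analysis:

```python
import random

def empirical_p(pics_hits, matched_pvals, n_draws=100_000,
                sample_size=1865, threshold=5e-8, seed=0):
    """Fraction of random matched-SNP samples that contain strictly more
    sub-threshold SNPs than the PICS set did.  `matched_pvals` holds the
    association p-values of the ~87,000 matched SNPs; `pics_hits` is the
    observed count of PICS below the threshold."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # one sample of 1,865 SNPs, drawn without replacement
        sample = rng.sample(matched_pvals, sample_size)
        hits = sum(1 for p in sample if p < threshold)
        if hits > pics_hits:          # matched sample beat the PICS set
            exceed += 1
    return exceed / n_draws
```

With 4,650 exceedances out of 100,000 draws, this returns 0.04650, matching the number quoted above.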
I do not call this a permutation-based test anywhere, because I have not scrambled any case-control statuses or anything like that. However, I am thinking of calling it a "simulation test." I would also like to ask whether it is valid to call the result an "empirical p-value," and if not, I am unsure what to call it.
My question is: does either of these terms carry a denotation that would make it inaccurate in the above context?
Devon - thank you so much for taking the time to write a response to my post. I will read the linked paper and I appreciate that.
As for the comment "The appropriateness of the result is dependent on how 'similar' the simulated data was, so I hope you put a good deal of thought into that": yes, I agree completely, and the reasons for that are articulated in the manuscript. Ultimately this whole analysis was an attempt to be more rigorous rather than less, so I hope I have succeeded in that to some degree.
This is perhaps a separate issue, but I am also wary of using the term Monte Carlo. Monte Carlo samples must be mutually independent, I think, but in my case I have 100,000 samples of 1,865 variants drawn from a list of 87,000. As such, many of the samples contain overlapping variants and are therefore not strictly independent of one another. Practically, I am not very worried about this, as I expect any two samples to have relatively little dependence on each other, but I am just trying to be cautious.
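To put a rough number on that overlap concern: two independent draws of 1,865 from 87,000 are expected to share only about k²/N ≈ 40 SNPs, so consecutive samples are nearly disjoint. A quick simulation sketch, with arbitrary integer IDs standing in for SNPs:

```python
import random

N, k = 87_000, 1_865                  # matched-SNP pool size and sample size
rng = random.Random(1)

# Mean pairwise overlap between independent samples of size k;
# the expected value is k * k / N, roughly 40 SNPs here.
overlaps = []
for _ in range(20):
    a = set(rng.sample(range(N), k))
    b = set(rng.sample(range(N), k))
    overlaps.append(len(a & b))

print(sum(overlaps) / len(overlaps))
```

An average shared fraction of ~40/1,865 ≈ 2% per pair of samples supports the intuition that the dependence between draws is mild.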
Thanks again.
@Devon Ryan - this paper is very helpful as well and also addresses sampling with and without replacement. It is also written specifically for genomicists/bioinformaticians. http://www.statsci.org/webguide/smyth/pubs/permp.pdf