QC on genotype data with no controls (in particular Hardy-Weinberg Equilibrium)
4.5 years ago

Hi All,

I have ~550k SNPs for 570 individuals, all of whom are patients with a particular disease. I also have a score indicating the severity of a key disease phenotype for each individual. I therefore have no controls; instead, we are interested in modelling the severity of the disease based on the genetic profile of the patients.

I am performing QC on this data and have a few questions. In particular, how should I choose an appropriate HWE threshold for this data? Some sources (e.g. Wang et al.) recommend against removing SNPs where the cases' genotypes deviate from HWE. Given that I have no controls, does this mean that I should not remove any SNPs from my data based on HWE?
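
To make concrete what I mean by an HWE check, here is a minimal sketch of a per-SNP test computed from genotype counts. It uses the 1-degree-of-freedom chi-square approximation rather than an exact test, so it is only illustrative. The function name `hwe_chi2` and the example counts are my own placeholders, not taken from any particular tool or from my data.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at one biallelic SNP.

    Hypothetical helper for illustration: takes the observed counts of the
    three genotypes (AA, AB, BB) and returns (chi2, p_value) using the
    1-df chi-square approximation.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")

    # Frequency of allele A estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # A monomorphic SNP cannot deviate from HWE; report no evidence.
    if p == 0.0 or p == 1.0:
        return 0.0, 1.0

    # Genotype counts expected under HWE: p^2, 2pq, q^2 times n.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts for 100 individuals, purely illustrative.
    print(hwe_chi2(25, 50, 25))  # counts match HWE expectations exactly
    print(hwe_chi2(50, 0, 50))   # no heterozygotes at all
```

My question is essentially what p-value cut-off, if any, to apply to the output of a test like this when every sample is a case.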

On a more general note, as this study differs from a case-control GWAS, is there anything I should be doing differently for the quality control phase? Should I follow GWAS protocols as closely as possible?

Many thanks in advance for any help or advice.


