PLINK Principal Components not adequately controlling for population stratification in linear regression?
8.1 years ago
dam4l

I'm doing a GWAS using ~15 million variants and ~800 people. Since I am unfamiliar with Linux, I have relied on PLINK's MDS and PCA functions to obtain principal components to use as covariates in the association analysis, to control for population stratification. When I plotted the p-values from the association analysis as a QQ plot, the distribution was quite messy, suggesting that I had not adequately controlled for population stratification. I took the following steps:

  1. Pruned variants based on LD using PLINK --indep
  2. Created a genome file:

    ./plink --bfile file --genome --extract plink.prune.in

  3. Used --pca to generate an eigenvec file containing PCs

    ./plink --bfile gendep_merged --cluster --pca header --extract plink.prune.in --read-genome plink.genome

  4. Performed the association analysis using 10 PCs from the eigenvec file as covariates:

    ./plink --bfile file --pheno phenotype.txt --allow-no-sex --covar plink.eigenvec --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 --out association --linear --adjust
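For readers wondering what step 3 actually computes: PLINK documents --pca as extracting the top principal components of the variance-standardized relationship matrix. Below is a minimal, pure-Python sketch of that idea, not PLINK's implementation. It assumes a small in-memory matrix of 0/1/2 allele counts for LD-pruned variants, the function names are hypothetical, only PC1 is computed (by power iteration), and the O(N²M) loops are only suitable for toy inputs.

```python
import math
import random


def standardize(genotypes):
    """Center and variance-standardize each variant of an N x M matrix of 0/1/2 allele counts.

    Monomorphic variants carry no information about structure and are dropped.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        freq = sum(col) / (2 * n)  # frequency of the counted allele
        sd = math.sqrt(2 * freq * (1 - freq))  # expected SD under Hardy-Weinberg
        if sd == 0:
            continue
        columns.append([(g - 2 * freq) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic variants")
    return [list(row) for row in zip(*columns)]  # transpose back to N x M'


def top_principal_component(genotypes, iterations=200, seed=0):
    """Leading eigenvector and eigenvalue of the standardized relationship matrix."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    # N x N relationship matrix: average product of standardized genotypes
    grm = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)] for i in range(n)]

    def multiply(vec):
        return [sum(grm[i][k] * vec[k] for k in range(n)) for i in range(n)]

    # Power iteration: repeated multiplication converges to the dominant eigenvector
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):
        w = multiply(v)
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    eigenvalue = sum(a * b for a, b in zip(v, multiply(v)))  # Rayleigh quotient; v has unit norm
    return v, eigenvalue
```

On two groups of four individuals whose allele counts are flipped at ten variants, the PC1 scores of the two groups come out with opposite signs. Further PCs would need deflation or a full eigendecomposition.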

Am I missing a step, or should any of the flags I used be modified in order to produce PCs that will adequately control for population stratification in this sample?

Any input would be greatly appreciated.
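A messy QQ plot can be summarized with the genomic inflation factor, λ_GC: the median observed 1-df chi-square statistic divided by its median under the null (about 0.455). A value near 1 is generally expected; values well above 1 indicate inflation, although genuine polygenic signal in large samples can also push it up. The --adjust flag used in step 4 should also report a lambda estimate in the PLINK log. The stdlib-only Python sketch below (function name hypothetical) recomputes it from a list of p-values. If the p-values come from association.assoc.linear, keep only the ADD rows (or rerun with --hide-covar), because PLINK also reports a row for each covariate.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor lambda_GC from two-sided 1-df association p-values.

    Each p-value is turned back into a 1-df chi-square statistic; lambda is the
    median observed statistic divided by the median expected under the null.
    """
    std_normal = NormalDist()
    null_median = std_normal.inv_cdf(0.75) ** 2  # median of chi-square with 1 df
    chi_squares = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # Lower-tail quantile avoids 1 - p/2 rounding to 1.0 for tiny p; its square is the statistic
        z = std_normal.inv_cdf(p / 2.0)
        chi_squares.append(z * z)
    return median(chi_squares) / null_median
```

With evenly spaced p-values whose median is 0.5 the estimate comes out as 1; dividing every p-value by ten pushes it above 1.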

8.1 years ago

How exactly does using the first ten principal components control for "population stratification"? If I understand correctly, you're performing an association test and telling the model fit to smooth out the ten biggest drivers of variance in your dataset. When you checked the principal components, did they indicate that the first ten explained the population differences? Could you be smoothing out the effect you're testing for instead?
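To make the question concrete: in linear regression, adding a PC as a covariate removes whatever part of the SNP-phenotype association runs along that PC. The pure-Python sketch below (hypothetical names, a single covariate only; PLINK's --linear fits the SNP and all covariates jointly) uses the Frisch-Waugh-Lovell result to get the adjusted SNP coefficient. In the hypothetical toy data, the SNP is unrelated to the phenotype within each of two populations, but both allele frequency and mean phenotype differ between the populations.

```python
def center(values):
    """Subtract the mean (equivalent to regressing on an intercept only)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def residualize(values, covariate):
    """Residuals of values after least-squares regression on an intercept plus one covariate."""
    rv, rc = center(values), center(covariate)
    slope = sum(a * b for a, b in zip(rc, rv)) / sum(c * c for c in rc)
    return [v - slope * c for v, c in zip(rv, rc)]


def snp_effect(phenotype, genotype, covariate=None):
    """Least-squares SNP effect, unadjusted or adjusted for one covariate such as PC1.

    By the Frisch-Waugh-Lovell theorem, regressing the residualized phenotype on the
    residualized genotype gives the same SNP coefficient as fitting
    phenotype ~ genotype + covariate jointly.
    """
    if covariate is None:
        ry, rg = center(phenotype), center(genotype)
    else:
        ry, rg = residualize(phenotype, covariate), residualize(genotype, covariate)
    return sum(a * b for a, b in zip(ry, rg)) / sum(g * g for g in rg)


# Hypothetical toy data: PC1 separates two populations of four people each
pc1 = [-1.0] * 4 + [1.0] * 4
genotype = [0, 1, 0, 1, 1, 2, 1, 2]
phenotype = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]
```

Here the unadjusted effect is 2.0 and the PC1-adjusted effect is 0.0. That is the intended result if the between-population difference is confounding, but the same arithmetic would also remove a genuine effect that happens to line up with ancestry, which is the risk raised above.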

Using 10 does indeed seem a bit excessive. You should only use the PCs that actually stratify your population. If that's none of them, then do not include any.
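If ancestry labels are available (for example self-reported ancestry or recruitment site, used here as a hypothetical input), one simple way to see which PCs stratify the sample is the fraction of each PC's variance explained by the labels, i.e. eta squared from a one-way ANOVA. The pure-Python sketch below is an illustration only: the function names and the 0.1 cutoff are hypothetical choices, not a PLINK feature or an established threshold.

```python
from collections import defaultdict


def variance_explained_by_groups(scores, labels):
    """Eta squared: share of one PC's variance explained by group membership."""
    n = len(scores)
    grand_mean = sum(scores) / n
    total = sum((s - grand_mean) ** 2 for s in scores)
    if total == 0:
        return 0.0
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)
    # Between-group sum of squares: group size times squared deviation of each group mean
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    return between / total


def stratifying_pcs(pc_scores, labels, threshold=0.1):
    """1-based indices of PCs whose eta squared with the labels exceeds threshold.

    threshold=0.1 is an arbitrary illustrative cutoff, not an established rule.
    """
    return [
        index
        for index, scores in enumerate(pc_scores, start=1)
        if variance_explained_by_groups(scores, labels) > threshold
    ]
```

Each inner list of pc_scores is one PC's scores across individuals, e.g. one column of plink.eigenvec after dropping the FID and IID columns.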
