PCA: summarize results from principal components analysis and look for population heterogeneity
5.8 years ago
F_cm_C ▴ 30

Hi! I am conducting a GWAS. I used plink 1.9 (--pca) to obtain the first 10 population principal components, which I entered in the regression model as covariates to correct for population stratification. I would like to use the .eigenvec file, which is the output from plink --pca containing the population principal components, to summarize the population structure and look for potential heterogeneity. I would be super grateful if anyone could help or refer me to a practical tutorial. Thanks a lot!

PCA principal components analysis plink • 4.8k views

hi there, shouldn't the command be "apply(eigen, 2, sd) ...." for the columns of the data table?


Oh, pray, tell... Should it be?

5.8 years ago

Do you know for sure that you need to include all 10 principal components as covariates? One should only include the principal components that actually separate your groups of interest and are therefore likely to affect the statistical inferences that you make from your data. This is 'adjusting' for population stratification.
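One way to judge how many components are worth keeping is to look at how much variance each one accounts for. Here is a minimal sketch, assuming the eigenvalues come from the `plink.eigenval` file that `--pca` writes alongside `plink.eigenvec`. The function name `variance_explained` and the example values are made up for illustration. With only the top 10 eigenvalues available, the percentages are relative to those 10 components, not to the total variance in the data.

```r
# Percentage of variance per component, relative to the eigenvalues supplied
variance_explained <- function(eigenvalues) {
  stopifnot(is.numeric(eigenvalues), all(eigenvalues >= 0), sum(eigenvalues) > 0)
  100 * eigenvalues / sum(eigenvalues)
}

# Illustrative eigenvalues (not real data); with PLINK output one would use
# eigenvalues <- scan("plink.eigenval")
eigenvalues <- c(8, 4, 2, 1, 1)
pve <- variance_explained(eigenvalues)

# Scree plot: a sharp drop followed by a flat tail suggests where to stop
plot(seq_along(pve), pve, type = "b",
     xlab = "Principal component", ylab = "% variance (of PCs shown)")
```

Components on the flat tail of the scree plot are less likely to capture real structure, though it is still worth checking the pairwise plots before dropping them.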

With a .eigenvec file, one can easily generate a principal components bi-plot for any pairwise combination of PCs:

R

setwd("/YourDir/")
options(scipen=100, digits=3)

#Read in the eigenvectors
eigen <- data.frame(read.table("plink.eigenvec", header=FALSE, skip=0, sep=" "))
rownames(eigen) <- eigen[,2]
eigen <- eigen[,3:ncol(eigen)]

summary(eigen)

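#Determine the proportion of variance of each component
#(MARGIN=2 applies sd() to each column, i.e. each PC; MARGIN=1 would work per sample)
#Note: if the eigenvectors are unit-scaled, these values come out nearly equal; the eigenvalues in plink.eigenval are then the better basis
proportionvariances <- ((apply(eigen, 2, sd)^2) / (sum(apply(eigen, 2, sd)^2)))*100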

plot(eigen[,1], eigen[,2])

legend("topleft", bty="n",
  cex=1.5, title="",
  c("Population 1","Population 2","Population 3","Population 4","Population 5"),
  fill = c("yellow","forestgreen","grey","royalblue","black"))

Sample image (biplot-new); you will need to manage colours and layout yourself.
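Managing the colours mostly comes down to mapping each sample's population label to a colour and passing that vector to `plot()`. Here is a rough sketch, assuming a per-sample vector of population labels is available from somewhere else (the `.eigenvec` file only carries sample IDs and components). The simulated `eigen` and `population` objects below are placeholders, not real data.

```r
set.seed(1)

# Placeholder data standing in for the eigenvectors and the sample labels
population <- rep(c("Population 1", "Population 2", "Population 3"), each = 20)
eigen <- data.frame(
  PC1 = rnorm(60, mean = rep(c(-0.1, 0, 0.1), each = 20), sd = 0.02),
  PC2 = rnorm(60, mean = rep(c(0.05, -0.1, 0.05), each = 20), sd = 0.02)
)

# One colour per population, looked up by name for each sample
palette_by_pop <- c("Population 1" = "yellow",
                    "Population 2" = "forestgreen",
                    "Population 3" = "royalblue")
sample_cols <- palette_by_pop[population]

plot(eigen$PC1, eigen$PC2, col = sample_cols, pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topleft", bty = "n", legend = names(palette_by_pop),
       fill = palette_by_pop)
```

Because the legend is built from the same named vector as the point colours, the two cannot drift out of sync when populations are added or removed.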

[from: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)]
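To look for heterogeneity beyond eyeballing the plots, one common heuristic is to flag samples that sit many standard deviations from the mean on any component. Here is a minimal sketch. The function name `flag_pc_outliers` and the default threshold of 6 SD are assumptions to tune, not a fixed rule, and the simulated matrix stands in for the real eigenvectors.

```r
# Flag samples that are extreme on at least one principal component
flag_pc_outliers <- function(pcs, n_sd = 6) {
  z <- scale(as.matrix(pcs))      # centre and scale each PC (column)
  apply(abs(z) > n_sd, 1, any)    # TRUE if a sample exceeds n_sd on any PC
}

set.seed(42)
# Simulated components (not real data), one row per sample
pcs <- matrix(rnorm(200 * 3, sd = 0.01), ncol = 3,
              dimnames = list(paste0("sample", 1:200), paste0("PC", 1:3)))
pcs["sample7", "PC2"] <- 0.5      # plant one obvious outlier

outliers <- flag_pc_outliers(pcs)
names(which(outliers))            # IDs of the flagged samples
```

Flagged samples are candidates for a closer look (ancestry, batch, sample quality) rather than automatic exclusions.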

Kevin


Really insightful, Thx Kevin.
