Are you sure that you need to include all 10 principal components as covariates? One should include only the principal components that actually segregate your groups of interest and that are therefore likely to affect the statistical inferences that you make from your data. This is 'adjusting' for population stratification.
With a .eigenvec file, one can easily generate a principal components bi-plot for any pairwise combination of PCs:
R
setwd("/YourDir/")
options(scipen=100, digits=3)
#Read in the eigenvectors
eigen <- data.frame(read.table("plink.eigenvec", header=FALSE, skip=0, sep=" "))
rownames(eigen) <- eigen[,2]
eigen <- eigen[,3:ncol(eigen)]
summary(eigen)
#Determine the proportion of variance of each component
proportionvariances <- ((apply(eigen, 1, sd)^2) / (sum(apply(eigen, 1, sd)^2)))*100
plot(eigen[,1], eigen[,2])
legend("topleft", bty="n",
cex=1.5, title="",
c("Population 1","Population 2","Population 3","Population 4","Population 5"),
fill = c("yellow","forestgreen","grey","royalblue","black"))
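The plot() call above draws every sample in the same colour, so the legend colours do not yet correspond to anything on the plot. A minimal sketch of one way to link them, assuming you have a population label for each sample in the same row order as the eigenvector table. The `pop` vector and the toy `eigen` values here are hypothetical stand-ins and are not part of the PLINK output:

```r
# Toy stand-in for the 'eigen' data frame built above (hypothetical values)
set.seed(42)
eigen <- data.frame(PC1 = rnorm(10), PC2 = rnorm(10))

# Hypothetical population label per sample, in the same row order as 'eigen'
pop <- factor(rep(c("Population 1", "Population 2", "Population 3",
                    "Population 4", "Population 5"), each = 2))

# One colour per population level, reused for both the points and the legend
pal <- c("yellow", "forestgreen", "grey", "royalblue", "black")
names(pal) <- levels(pop)
point_col <- pal[as.character(pop)]

plot(eigen[, 1], eigen[, 2], col = point_col, pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topleft", bty = "n", legend = levels(pop), fill = pal)
```

Swapping the column indices in plot() gives the bi-plot for any other pair of PCs. In real data the labels would come from your own sample sheet, matched to the sample IDs.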
Sample image
[need to manage colours and layout yourself]
[from: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)]
Kevin
Hi there, shouldn't the command be "apply(eigen, 2, sd) ..." for the columns of the data table?
Oh, pray, tell... Should it be?
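For anyone following the question, the second argument of apply() is the margin: 1 applies the function to each row and 2 to each column. In the table built above, rows are samples and columns are PCs. A small sketch on a toy matrix (the values and the names `m`, `per_sample`, `per_component` and `prop` are made up for illustration) shows the difference:

```r
# Toy matrix standing in for the eigenvector table:
# 4 samples (rows) x 2 PCs (columns); matrix() fills column by column
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            nrow = 4, ncol = 2,
            dimnames = list(paste0("S", 1:4), c("PC1", "PC2")))

per_sample    <- apply(m, 1, sd)  # MARGIN = 1: one value per row (sample)
per_component <- apply(m, 2, sd)  # MARGIN = 2: one value per column (PC)

length(per_sample)     # 4
length(per_component)  # 2

# Each column's share of the summed column variances, in percent
prop <- per_component^2 / sum(per_component^2) * 100
```

So a per-component quantity needs margin 2. Note also that PLINK writes a .eigenval file alongside .eigenvec, and the eigenvalues in it are the more usual basis for a proportion of variance explained. Treat the column-variance calculation here only as an illustration of how apply() behaves.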