Question

principal component plot interpretation

7.1 years ago

ED ▴ 10

Hello

I'm reading a paper, but I'm having difficulty interpreting a principal component plot. I'm not familiar with principal component analysis, so I looked up some information to better understand what it does. In the paper, they say: "We performed a WGS-based genome-wide association study (GWAS) using a logistic model with principal component correction to account for any remaining population stratification after restriction to individuals with > 95% European ancestry, though inspection of the principal component plots demonstrates the cohorts are well balanced".

So the two colors represent the two cohorts being compared. I read in another paper that the principal component 1 axis reflects variation between two populations from different geographical locations. But which variation does the principal component 2 axis reflect?

And do they conclude that the cohorts are balanced because the red dots and blue dots are equally spread? If the red dots were on one side of the principal component 1 axis and the blue dots on the other side, then could the differences in allele frequencies be due to the difference in geographical location of these two cohorts? Am I interpreting this right or not?

[Image: principal component plot, with the two cohorts shown as red and blue dots]

GWAS principal component analysis • 2.0k views

updated 7.1 years ago by WouterDeCoster 47k • written 7.1 years ago by ED ▴ 10

score 0 · Answer 1 · 2017-02-27

The mathematics behind PCA are quite complex, but I find this an excellent explanation.

My rough interpretation is that these two dimensions capture the largest share of the variability in the dataset (PC1 the most, PC2 the second most), so for genotypes both are mostly explained by geographical/ethnic differences. Such structure can also be present in PC3, PC4, etc., but only the first two components are visualised. And I think your conclusion is correct: the two cohorts are equally spread and mixed, so there is no reason to assume population stratification. If the genotypes of the individuals were very different between the blue and red cohorts, you would expect PCA to separate the two cohorts, and then you couldn't claim that the samples are from the same population.
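To make that reasoning concrete, here is a minimal sketch with simulated genotypes. It is not the paper's data or method: the sample sizes, allele frequencies, the size of the frequency shift and all function names are assumptions chosen for illustration, and it skips steps a real analysis would usually include (e.g. LD pruning or per-SNP scaling). It compares two cohorts drawn from the same population with two cohorts drawn from populations with slightly different allele frequencies, and measures how far apart the cohorts sit on PC1.

```python
import numpy as np


def simulate_genotypes(n_samples, allele_freqs, rng):
    """Draw genotypes (0, 1 or 2 copies of the alternate allele) per SNP."""
    return rng.binomial(2, allele_freqs, size=(n_samples, allele_freqs.size))


def top_pcs(genotypes, n_components=2):
    """Project samples onto the first principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def pc1_separation(pcs, labels):
    """Absolute difference in mean PC1 between cohorts, in pooled SD units."""
    a, b = pcs[labels == 0, 0], pcs[labels == 1, 0]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


rng = np.random.default_rng(0)
n_per_cohort, n_snps = 100, 1000            # assumed sizes, illustration only
labels = np.repeat([0, 1], n_per_cohort)    # 0 = "red" cohort, 1 = "blue" cohort

# Allele frequencies of a hypothetical population A, and of a hypothetical
# population B whose frequencies have drifted slightly away from A.
freqs_a = rng.uniform(0.1, 0.9, n_snps)
freqs_b = np.clip(freqs_a + rng.normal(0.0, 0.1, n_snps), 0.05, 0.95)

# Balanced design: both cohorts sampled from population A.
balanced = np.vstack([
    simulate_genotypes(n_per_cohort, freqs_a, rng),
    simulate_genotypes(n_per_cohort, freqs_a, rng),
])
# Stratified design: one cohort from A, the other from B.
stratified = np.vstack([
    simulate_genotypes(n_per_cohort, freqs_a, rng),
    simulate_genotypes(n_per_cohort, freqs_b, rng),
])

sep_balanced = pc1_separation(top_pcs(balanced), labels)
sep_stratified = pc1_separation(top_pcs(stratified), labels)

print(f"PC1 separation, same population:       {sep_balanced:.2f}")
print(f"PC1 separation, different populations: {sep_stratified:.2f}")
```

In the same-population case the cohorts should overlap on PC1 (a separation close to zero), while in the stratified case PC1 should pull the two cohorts apart. Scatter-plotting the two columns returned by `top_pcs`, coloured by `labels`, gives the kind of plot discussed in the question.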