Eigenvalues from PLINK1.90 pca don't sum to number of variables?
4.0 years ago
c.horscroft ▴ 10

Hi everyone!

As far as I understand it, when you do PCA on standardized variables (i.e. on the correlation matrix), the resulting eigenvalues should sum to the number of variables in the original dataset. I have a genetic dataset containing 3,700 individuals and 111,867 variables.
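The identity behind this expectation, as a sketch: the eigenvalues of a symmetric matrix sum to its trace, and when PCA is run on standardized variables the matrix being decomposed is the p × p correlation matrix R, whose diagonal entries are all 1. With n = 3,700 individuals at most n − 1 of the eigenvalues are non-zero, but their total is still p = 111,867:

```latex
\sum_{k=1}^{p} \lambda_k \;=\; \operatorname{tr}(\mathbf{R}) \;=\; \sum_{j=1}^{p} R_{jj} \;=\; p
```

Without standardization (PCA on the covariance matrix), the same identity gives the sum of the per-variable variances instead of p.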

When I run the PCA in R, the eigenvalues sum to 111,867, no problem.

However, when I run the PCA in PLINK, the eigenvalues in the file plink.eigenval don't sum to anywhere near that number :/

I'm using PLINK/1.90beta with the --pca flag. I set it to return the maximum number of PCs (3,700), so I definitely have all the information.

Have I misunderstood how PCA is supposed to work? Or am I misunderstanding the output of PLINK?

Thanks in advance!!

3.9 years ago
c.horscroft ▴ 10

Answering my own question here, as I've figured it out and someone else might have the same question.

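The sum of the eigenvalues is the same as the sum of the diagonal (the trace) of the relationship (covariance) matrix that PLINK creates when you use the --make-rel flag; --pca computes its eigenvalues from this same matrix. This sum of the diagonal is the overall variability. Because the matrix is individuals × individuals and scaled by the number of variants, its trace is on the order of the number of individuals rather than the number of variants, which is why the eigenvalues come nowhere near 111,867.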
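A sketch of why the PLINK and R totals differ, assuming (as the PLINK 1.9 documentation describes for --pca) that the eigenvalues come from the variance-standardized relationship matrix A built by --make-rel, and ignoring missing genotypes. Let Z be the n × M matrix of standardized genotypes (n individuals, M variants). A is the n × n matrix ZZᵀ divided by M, while a variant-space PCA in R decomposes the M × M matrix ZᵀZ/(n − 1). ZZᵀ and ZᵀZ share the same non-zero eigenvalues, so the two sets of eigenvalues differ only by a scale factor (approximately, because PLINK standardizes with allele-frequency-based variances rather than sample variances):

```latex
\mathbf{A} = \frac{1}{M}\,\mathbf{Z}\mathbf{Z}^{\top},
\qquad
\sum_{k=1}^{n} \lambda_k^{\mathrm{PLINK}}
  = \operatorname{tr}(\mathbf{A})
  = \frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{M} z_{ij}^{2}
  \approx n,
\qquad
\lambda_k^{\mathrm{PLINK}} \approx \frac{n-1}{M}\,\lambda_k^{\mathrm{R}}
```

The scale factor cancels in a ratio, so the proportion of variance explained by each PC should agree between the two tools up to that approximation.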

To get the proportion of variance explained by each PC, divide its eigenvalue by this sum of the diagonal. Alternatively, divide by the sum of the eigenvalues, BUT ONLY if you are sure you have returned all of the principal components (by default PLINK only reports the top 20).
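The two denominators can be compared side by side. The helper below is a minimal sketch, not part of PLINK: the function name plink_pve is hypothetical, and it assumes a relationship matrix from --make-rel (triangular or square layout) and an eigenvalue file from --pca with one value per line. When fewer than all PCs were requested, its third column overstates each PC's share.

```shell
# Hypothetical helper (not a PLINK command): % variance explained per PC.
#   $1 = relationship matrix from --make-rel; in the triangular and square
#        layouts alike, the diagonal entry of row i is field i
#   $2 = eigenvalue file from --pca, one eigenvalue per line
# Output columns: PC label, % using the trace, % using the sum of the listed
# eigenvalues (the last column is only meaningful if ALL PCs were requested).
plink_pve() {
  awk '
    NR == FNR { trace += $FNR; next }   # first file: sum the diagonal
    { ev[++k] = $1; evsum += $1 }       # second file: collect eigenvalues
    END {
      for (i = 1; i <= k; i++)
        printf "PC%d\t%.2f\t%.2f\n", i, 100 * ev[i] / trace, 100 * ev[i] / evsum
    }
  ' "$1" "$2"
}
```

With PLINK's default output prefix, `plink_pve plink.rel plink.eigenval > plink.pve` writes one line per PC.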

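A quick way to get just the diagonal entries from the plink.rel file is the awk one-liner below. On row i it prints field i, which is the diagonal element in both the default lower-triangular layout and the square layouts: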

awk '{print $NR}' plink.rel > plink.rel.diag
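To see why $NR picks out the diagonal, here is a toy run on a made-up 3 × 3 matrix in the lower-triangular layout (the file name toy.rel and the values are illustrative only): row i has i fields, and field i is the diagonal entry.

```shell
# Made-up lower-triangular relationship matrix: row i holds i tab-separated fields
printf '1.02\n0.01\t0.98\n-0.03\t0.05\t1.01\n' > toy.rel

diag=$(awk '{print $NR}' toy.rel)                 # field NR of row NR = diagonal
total=$(awk '{s += $NR} END {print s}' toy.rel)   # trace = total variance

printf '%s\n' "$diag"
echo "total: $total"
rm -f toy.rel
```

Here the diagonal values are 1.02, 0.98 and 1.01, so the trace is 3.01.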

There is more information here: Similar question on stackexchange


Thank you, your answer helped me a lot. This is what I did:

Following your answer, I ran PLINK with:

plink --bfile ${set} --pca var-wts --make-rel --out ${set}.pca

Calculate the total variance as the sum of the diagonal of the relationship matrix:

sum1=$(awk '{sum+=$NR;}END{print sum}' ${set}.pca.rel)

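Calculate the percentage of variance explained (PVE) by each PC and write it to a file:

# bash $(( )) arithmetic is integer-only and rejects decimal eigenvalues, so divide in awk
awk -v total="$sum1" '{print $1 / total * 100}' "${set}.pca.eigenval" > "${set}.pca.pve"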

Although I'm still unsure about the exact formula, I seem to get sensible values for the percentage of variance explained by each PC.
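Written out, the quantity computed above for the k-th PC is the following, with λ_k the k-th line of ${set}.pca.eigenval and A_ii the diagonal entries of ${set}.pca.rel (n individuals):

```latex
\mathrm{PVE}_k
  \;=\; 100 \times \frac{\lambda_k}{\operatorname{tr}(\mathbf{A})}
  \;=\; 100 \times \frac{\lambda_k}{\sum_{i=1}^{n} A_{ii}}
  \;=\; 100 \times \frac{\lambda_k}{\sum_{l=1}^{n} \lambda_l}
```

The last equality holds only when the sum runs over all n eigenvalues.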
