Cool - thanks for coming back to post this.

Entering edit mode

I am running GWAS analyses with gemma that seem to work but when I open the centered relatedness matrix (.cXX), I do get a symmetric matrix but the main diagonal is not made of 1s ... why?

The manual p.11 part 3.3.1 shows an example of matrix with a main diagonal which is not made of 1s. I also tried to dig into the gemma code on Github but I did not find the piece of code used to calculate the relatedness matrix.

Plink and gemma commands to generate the matrix:

```
# Generate plink file from my VCF file (67 samples)
vcftools --gzvcf $vcf_file --plink --out $prefix_vcf
# Make binary plink files
plink --file $prefix_vcf --make-bed --out $prefix_vcf
# Generate centered relatedness matrix with gemma
gemma -bfile $prefix_vcf -gk 1 -o $prefix_vcf
```

Here the output log of gemma:

```
##
## GEMMA Version = 0.94
##
## Command Line Input = -bfile subset_67_accessions_wo_singletons_only_alt_allele -gk 1 -o subset_67_accessions_wo_singletons_only_alt_allele
##
## Summary Statistics:
## number of total individuals = 67
## number of analyzed individuals = 67
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 927314
## number of analyzed SNPs = 39678
##
## Computation Time:
## total computation time = 0.123833 min
## computation time break down:
## time on calculating relatedness matrix = 0.00733333 min
##
```

I get as expected a square matrix of order 67 (as my number of samples in the VCF file). Here are the first 5 rows and columns of the matrix:

```
0.5265889259 0.0918289748 0.01315201643 -0.02417258371 -0.00300349980
0.0918289748 0.4426259465 0.01751500729 -0.0196104412 -0.04093555612
0.01315201643 0.01751500729 0.494663962 0.01773204554 0.03579284737
-0.02417258371 -0.0196104412 0.01773204554 0.5622735808 0.05660374681
-0.003003499807 -0.04093555612 0.03579284737 0.05660374681 0.6064086438
```

There is symmetry but not a main diagonal of 1s while same individuals should be identical to themselves, shouldn't they?

*NB*: I posted a similar post on the gemma-discussion google group but still did not get an answer after a month.

Entering edit mode

I've got the answer from the creator of GEMMA Xiang Zhou:

The centered relatedness matrix does not scale each column of the genotype matrix, so the main diagonal of the resulting relatedness matrix will not be close to 1s (usually around 0.3 depending on maf etc.). If you use the scaled relatedness matrix (.sXX), then the main diagonal should be made of values that are very close to 1s.

Entering edit mode

The diagonals here remind me more of a kinship matrix (self-kinship = 0.5) than a relatedness matrix (self-relatedness = 1). I haven't used Gemma before, but other relatedness programmes (like the R-programme KGD) based on high-throughput sequencing data have given me self-relatedness values that are greater or less than 1, for the reasons that Kevin and Johan point out. You could always scale your entire matrix so that self-relatedness is 1 (which will scale all your other relatedness values).

Entering edit mode

0

I thought the terms kinship matrix and relatedness matrix could be used interchangeably. Gemma documentation should be updated to use the accurate term. I understood that a value >0.5 (self-kinship) or >1 (self-relatedness) could be explained by consanguinity but I still don't understand why I have values <0.5 in the main diagonal of the matrix.

Loading Similar Posts

I have not used GEMMA; however, I don't recall other relatedness metrics (that I have used) equaling 1 for when the same person is being compared to themselves. I believe it's technically a limitless scale that can be both positive and negative, and which may be based on positions that are both genotyped in one individual in your dataset and those that are only genotyped in others. Probably better to hear from the developers of the program, as they may be the only ones who know how it was coded.

Thanks for your comment. Looking around, I could find that usually the main diagonal is made of 1 or values above 1 if consanguinity. The figure 1 of the Bae et al 2014 is illustrating well what I expect from a kinship matrix. Since individuals compared to themselves are based on the same SNP call datasets, it seems unlikely to me that the variations from 1 are due to missing data.