7.2 years ago
GabrielMontenegro
I'm reading Design, Analysis and Interpretation of GWAS by Daniel O. Stram.
In chapter 2 I found:
If we have a sample of N unrelated individuals from a population, the total count of A alleles across those individuals follows a binomial distribution with number of trials = 2N and frequency of the A allele = p
p can be estimated as:
p = ( 1/(2N) ) * SUM (niA)
Where niA = number of A alleles in individual i
And the variance of that estimate is:
Var(p) = ( 1/(2N)^2 ) * SUM Var (niA)
But I do not understand why we have the 2N squared in the second equation.
Thank you.
OK, it's a property, but why that particular number?
The estimate of the population allele frequency is p^{hat} = sum_i{niA} / (2N), where niA is the number of copies of the A allele in individual i and N is the number of individuals. You divide by 2N because the organism is assumed to be diploid, so N individuals carry 2N alleles in total.
So Var(p^{hat}) = Var( sum_i{niA} / (2N) ) = (1/(2N))^2 Var(sum_i{niA}) [because Var(aX) = a^2 Var(X) for any constant a, here a = 1/(2N)] = (1/(2N))^2 sum_i Var(niA) [because the individuals are unrelated, so the niA are independent and the variance of their sum is the sum of their variances]
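If it helps to see the 1/(2N)^2 factor at work, here is a minimal simulation sketch of the derivation above. It is not from the book. It assumes Hardy-Weinberg equilibrium, so each niA is Binomial(2, p) with Var(niA) = 2p(1-p), which makes the formula collapse to p(1-p)/(2N). The function names `simulate_p_hat` and `theoretical_variance`, and the values N = 50 and p = 0.3, are made up for illustration.

```python
import random
from statistics import pvariance


def simulate_p_hat(n_individuals, p, rng):
    """Draw one sample of diploid individuals and return the estimate p_hat."""
    # Assumption: each individual carries 2 alleles, each being A with
    # probability p independently (Hardy-Weinberg equilibrium).
    total_a = sum(
        (rng.random() < p) + (rng.random() < p) for _ in range(n_individuals)
    )
    return total_a / (2 * n_individuals)


def theoretical_variance(n_individuals, p):
    """Var(p_hat) = (1/(2N))^2 * sum_i Var(niA), with Var(niA) = 2p(1-p)."""
    per_individual_var = 2 * p * (1 - p)
    return (1 / (2 * n_individuals)) ** 2 * n_individuals * per_individual_var


rng = random.Random(0)       # fixed seed so the run is reproducible
N, p, reps = 50, 0.3, 20000  # illustrative values only

estimates = [simulate_p_hat(N, p, rng) for _ in range(reps)]
empirical = pvariance(estimates)
expected = theoretical_variance(N, p)

print("empirical Var(p_hat):  ", empirical)
print("theoretical Var(p_hat):", expected)
```

The two printed numbers should agree closely. Dropping the square, i.e. using 1/(2N) instead of 1/(2N)^2, would give a variance 2N times too large, which the simulation would not match.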