7.7 years ago
William
Say I have a genotype matrix that uses the following encoding:
0 = HOM_REF
1 = HET
2 = HOM_ALT
NA = Missing genotype
For instance, take this dummy genotype matrix with 3 variants (rows) and 3 samples (columns):
Variant_1 0 1 1
Variant_2 1 1 NA
Variant_3 NA 0 1
etc
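To make the setup concrete, here is a minimal sketch of the same dummy matrix in Python with NumPy, assuming missing genotypes are stored as `np.nan` rather than as a numeric placeholder. The variable names (`genotypes`, `missing_rate`) are just illustrative.

```python
import numpy as np

# Rows are variants, columns are samples.
# 0 = HOM_REF, 1 = HET, 2 = HOM_ALT, np.nan = missing genotype.
genotypes = np.array([
    [0.0, 1.0, 1.0],        # Variant_1
    [1.0, 1.0, np.nan],     # Variant_2
    [np.nan, 0.0, 1.0],     # Variant_3
])

# Fraction of samples with a missing genotype, per variant.
missing_rate = np.isnan(genotypes).mean(axis=1)
print(missing_rate)
```

Keeping the missing calls as NaN makes it possible to compute a per-variant missing rate before deciding how to handle them.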
Do you first need to impute the genotype matrix so that it has no missing genotypes (NA values)?
Or do you set the NA values to something like -9 or -999? Wouldn't that heavily influence the output of the linear / logistic regression for variants with a lot of missing genotypes?
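To illustrate the concern, here is a toy sketch for a single variant and a quantitative phenotype. The data are made up (the phenotype is exactly `2 * genotype + 1`, with no noise), and two HOM_ALT calls are treated as missing. It compares a fit that drops the samples with missing genotypes against a fit that treats -9 as if it were a real dosage. `np.polyfit` is used only as a simple stand-in for a linear regression.

```python
import numpy as np

# Made-up true genotypes for 12 samples and a noise-free phenotype.
true_genotype = np.array([0, 1, 2] * 4, dtype=float)
phenotype = 2.0 * true_genotype + 1.0

# Pretend two HOM_ALT calls failed and are missing.
observed = true_genotype.copy()
observed[[2, 5]] = np.nan

# Option A: drop samples with a missing genotype for this variant.
keep = ~np.isnan(observed)
slope_dropped, _ = np.polyfit(observed[keep], phenotype[keep], 1)

# Option B: recode missing as -9 and regress on it as a dosage.
sentinel = np.where(np.isnan(observed), -9.0, observed)
slope_sentinel, _ = np.polyfit(sentinel, phenotype, 1)

print(f"slope, missing dropped:  {slope_dropped:.3f}")
print(f"slope, missing set to -9: {slope_sentinel:.3f}")
```

In this toy case, dropping the missing samples recovers the slope of 2, while the -9 coding pulls the estimate far away from it, because -9 is treated as an extreme genotype value rather than as "unknown".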