Hi, I am fairly new to the field of bioinformatics and genetics! I've been analysing some ddRAD-Seq data using Stacks and want to compute the linkage disequilibrium (LD) between my SNPs. I've used VCFtools to do this. However, I've noticed that different loci have different numbers of individuals included in the analysis.
This is because I asked Stacks to output only loci that are present in at least 80% of individuals. That means up to 20% of individuals may be missing at any given locus. So, for example, individual A might have sequenced reads for locus (i) but not locus (ii), and so is not included in the linkage disequilibrium calculation for that pair of loci.
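To make the situation concrete, here is a minimal Python sketch of what I understand is happening. It is only an illustration under my own assumptions, not the actual VCFtools code: genotypes are coded as 0/1/2 copies of the alternate allele, `None` marks a missing genotype, LD is measured as the squared correlation between genotype counts, and the function name `pairwise_r2` and the example data are made up.

```python
def pairwise_r2(geno_a, geno_b):
    """Squared Pearson correlation between genotype counts at two loci.

    Only individuals genotyped at BOTH loci are used (None = missing),
    so the sample size can differ from one pair of loci to the next.
    Returns (r2, n_used); r2 is None if it cannot be computed.
    """
    # Keep only individuals with a genotype call at both loci.
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None, n

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)

    # A locus that is monomorphic among the shared individuals has no variance.
    if var_a == 0 or var_b == 0:
        return None, n
    return cov * cov / (var_a * var_b), n


# Hypothetical genotypes for 8 individuals at two loci.
locus_i = [0, 1, 2, 1, 0, None, 2, 1]
locus_ii = [0, 1, 2, None, 0, 1, 2, 2]

r2, n_used = pairwise_r2(locus_i, locus_ii)
print(f"r2 = {r2:.3f} from {n_used} of {len(locus_i)} individuals")
```

In this made-up example two of the eight individuals drop out of the calculation because each is missing at one of the two loci, which is the effect I am asking about.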
What I can't work out is whether this matters.
If it does, do I need to select only loci represented in 100% of individuals, or is there a program that can accommodate different numbers of individuals at each locus?
Any help would be really appreciated! Thanks! Jenni