A colleague recently asked me how to prioritize candidate causal variants and isolate independent association signals in a GWAS context. I mentioned conditioning on SNPs, how fine-mapping algorithms like PICS and PAINTOR work, and a few other ideas.
However, I realized fairly quickly that these suggestions might not help very much.
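To make "conditioning on SNPs" concrete, here is a minimal sketch in Python (NumPy only) on simulated data. It is an illustration under stated assumptions, not the analysis used on the real data: the helper name `snp_t_stat` is hypothetical, the genotypes and phenotype are simulated, and the model is ordinary least squares on unrelated samples, so it does not account for relatedness between individuals.

```python
import numpy as np

def snp_t_stat(y, snp, covariates=()):
    """t-statistic for `snp` in an OLS fit of y on an intercept, the SNP,
    and any covariates (e.g. genotypes of SNPs being conditioned on).
    Hypothetical helper; assumes independent samples."""
    n = len(y)
    X = np.column_stack([np.ones(n), snp, *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof             # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the estimates
    return beta[1] / np.sqrt(cov[1, 1])      # column 1 is the tested SNP

# Simulated example: SNP A is causal, SNP B is only in LD with A.
rng = np.random.default_rng(0)
n = 1000
g_a = rng.binomial(2, 0.3, n)
swap = rng.random(n) < 0.1                   # break the A/B link in ~10% of samples
g_b = np.where(swap, rng.binomial(2, 0.3, n), g_a)
y = 0.5 * g_a + rng.normal(size=n)

t_marginal = snp_t_stat(y, g_b)              # B tested on its own
t_conditional = snp_t_stat(y, g_b, [g_a])    # B tested conditional on A

print(f"marginal t for B:    {t_marginal:.2f}")
print(f"conditional t for B: {t_conditional:.2f}")
```

In this toy setup, B looks strongly associated on its own, and its signal largely disappears once A is included as a covariate. When two SNPs are in near-perfect LD, their genotype columns are almost collinear and the conditional test has very little power to separate them.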
He had data from catfish, most of which were siblings. There were only 20 families with ~1000 individuals in total. He had used QFAM and linear mixed modelling to generate association p-values for each SNP, and QFAM appeared to be much better powered. The really striking thing was the extensive LD that resulted from having so many fish from just a few parents.
In fact, looking at the Manhattan plot, linkage extending over many megabases was quite common. As a result, his association signals were even more difficult to resolve than those in the human HLA region.
However, in several of the few associated loci, he had identified genes with similar gene IDs that seem very plausible for his problem of interest.
So, complaints about the study design aside, I became very interested in how one could actually help in this case. Compounding the problem of extremely large LD blocks is the issue of strong dependence between samples. Does anyone have suggestions or required reading to help solve a problem like this one?