Hardy-Weinberg assumptions in variant calling
8.7 years ago
nchuang ▴ 260

Quick question. I have a hard time grasping how Hardy-Weinberg equilibrium (HWE) applies to variant calling.

My understanding is simply that if a site violates HWE, some sort of evolution is occurring (sorry if this is completely wrong). However, it seems that HWE is typically used to filter out inaccurate calls. But what if the population you are examining (e.g., Ashkenazi Jews) is known to have low genetic diversity due to inbreeding within the group (and perhaps other factors)? Doesn't that count as a violation of HWE, in which case you would not want to filter your calls by HWE?

Or say you were examining Sherpas in the Himalayas and hypothesized that they are better adapted to extreme elevation. Would that count as evolutionary change due to selection, such that if you sequenced their genomes you would not want to filter by HWE?

I guess I am asking: if you filtered by HWE in these cases, might you be throwing out the rare alleles that are of interest?

Thanks

8.7 years ago
lh3 33k

For SNP calling, we are not filtering out all HWE outliers. We filter only the HWE outliers with a negative inbreeding coefficient (F < 0). In plain terms, we filter out sites with excess heterozygotes, which are typically caused by CNVs or a bad reference. We do not filter sites with excess homozygotes, which can be caused by population structure. In addition, rare SNPs usually don't produce extreme HWE violations, so the HWE filter has little power for rare events anyway.
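A minimal Python sketch of the one-sided filter described above, assuming biallelic sites and hard-called genotype counts (hom-ref, het, hom-alt). The function names and the F < -0.3 cutoff are hypothetical choices for illustration, not taken from any particular caller.

```python
def inbreeding_coefficient(n_ref: int, n_het: int, n_alt: int) -> float:
    """F = 1 - H_obs / H_exp for one biallelic site.

    H_exp = 2pq uses the allele frequency estimated from the same genotypes.
    """
    n = n_ref + n_het + n_alt
    if n == 0:
        raise ValueError("no called genotypes")
    q = (n_het + 2 * n_alt) / (2 * n)  # alternate allele frequency
    h_exp = 2 * q * (1 - q)
    if h_exp == 0:
        return 0.0  # monomorphic site: F is undefined, treat as no deviation
    h_obs = n_het / n
    return 1 - h_obs / h_exp


def fails_excess_het_filter(n_ref: int, n_het: int, n_alt: int,
                            f_cutoff: float = -0.3) -> bool:
    """One-sided filter: flag only sites with too many heterozygotes.

    Sites with F > 0 (excess homozygotes, e.g. from population structure)
    are deliberately never flagged. The -0.3 cutoff is a hypothetical value.
    """
    return inbreeding_coefficient(n_ref, n_het, n_alt) < f_cutoff


if __name__ == "__main__":
    # Excess heterozygotes (the CNV / bad-reference pattern): F is about -0.67.
    print(fails_excess_het_filter(20, 80, 0))   # True
    # Excess homozygotes (e.g. population structure): F = 0.6, kept.
    print(fails_excess_het_filter(40, 20, 40))  # False
    # A rare SNP, one het among 1000 samples: F is essentially 0.
    print(round(inbreeding_coefficient(999, 1, 0), 4))  # -0.0005
```

The last line illustrates the point about rare SNPs: a single heterozygote among 1000 samples barely moves F, so no reasonable cutoff can flag it.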

Also bear in mind that the calling models of quite a few SNP callers, including GATK and samtools, assume HWE. This is a model assumption you can't lift. One paper argues that this assumption hurts call quality at SNPs that deviate strongly from HWE. However, I tend to believe this is only a mild concern in practice; IMHO that paper does not make a fair comparison.
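To see why this assumption matters, here is a toy single-sample sketch of how an HWE genotype prior can enter a Bayesian genotype call: the posterior is the read-based likelihood times the HWE prior (p², 2pq, q²). This illustrates the general idea only, not GATK's or samtools' actual model; the function names, likelihood values and allele frequencies are made up. With the same reads, a het call that is favored when the allele frequency is 0.3 is pulled towards hom-ref when the frequency is 0.01.

```python
def hwe_genotype_priors(alt_freq: float) -> list[float]:
    """Prior probabilities of (hom-ref, het, hom-alt) under HWE: p^2, 2pq, q^2."""
    p, q = 1.0 - alt_freq, alt_freq
    return [p * p, 2 * p * q, q * q]


def genotype_posteriors(likelihoods: list[float], alt_freq: float) -> list[float]:
    """Posterior over genotypes: P(reads | G) * P(G | HWE), normalized to sum to 1."""
    priors = hwe_genotype_priors(alt_freq)
    unnormalized = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical read likelihoods that favor a het call about 3:1 over hom-ref.
    reads = [0.2, 0.6, 0.001]
    # Common variant (alt_freq 0.3): het stays the most probable genotype.
    print([round(x, 2) for x in genotype_posteriors(reads, 0.3)])   # [0.28, 0.72, 0.0]
    # Rare variant (alt_freq 0.01): the same reads now favor hom-ref.
    print([round(x, 2) for x in genotype_posteriors(reads, 0.01)])  # [0.94, 0.06, 0.0]
```

At a site that truly carries more heterozygotes than HWE predicts, a prior like this works against the correct call, which is the kind of effect the paper mentioned above is concerned with.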


Isn't the inbreeding coefficient a separate metric? How is it related to HWE? Sorry, I need to read up on that as well.


Read the Wikipedia article.
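For reference, the standard textbook relationship for a biallelic locus with allele frequencies p and q = 1 - p and inbreeding coefficient F is below. Setting F = 0 gives back the HWE proportions, and F can be estimated by comparing observed with expected heterozygosity.

```latex
\begin{aligned}
P(AA) &= p^2 + Fpq \\
P(Aa) &= 2pq\,(1 - F) \\
P(aa) &= q^2 + Fpq \\[4pt]
F &= 1 - \frac{H_{\text{obs}}}{H_{\text{exp}}} = 1 - \frac{H_{\text{obs}}}{2pq}
\end{aligned}
```

Positive F means excess homozygotes (e.g., population structure or inbreeding); negative F means excess heterozygotes, the direction flagged by the filter in lh3's answer.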


This is a very cogent comment that addresses many issues not found in my post. Thanks very much for adding it.

8.7 years ago
LauferVA 4.2k

Your instincts are correct, but there is a little bit more to the story.

Variant calls, or SNPs "called" on genotyping chips, are checked for HWE because variants found WAY outside of HWE are commonly the result of technical artifacts.

Consider a SNP that is very close to another SNP, say 4 bp away. Imagine that your SNP is found way out of HWE, with an excess of heterozygous calls. Imagine that you also examined the minor allele frequency (MAF) and discovered that the minor allele, which normally has a frequency of 0.10, has a MAF of 0.13 in your samples of the same ethnicity, and nearly all of the difference is in heterozygous calls.
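To put numbers on this scenario, here is a sketch of the classic Pearson chi-square test for HWE, applied to hypothetical counts for 1000 samples in which the minor allele reaches 0.13 entirely through extra heterozygotes (740 hom-ref, 260 het, 0 hom-alt). The function name and the counts are illustrative assumptions.

```python
import math


def hwe_chi_square(n_ref: int, n_het: int, n_alt: int) -> tuple[float, float]:
    """Pearson chi-square test of HWE for one biallelic site.

    Returns (statistic, p_value). The allele frequency is estimated from the
    same genotypes, leaving 1 degree of freedom.
    """
    n = n_ref + n_het + n_alt
    if n == 0:
        raise ValueError("no called genotypes")
    q = (n_het + 2 * n_alt) / (2 * n)  # alternate allele frequency
    p = 1.0 - q
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    observed = [n_ref, n_het, n_alt]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical: 1000 samples, MAF 0.13, every extra minor allele sits in a het.
    # Expected under HWE at q = 0.13: about 756.9 / 226.2 / 16.9.
    stat, p_value = hwe_chi_square(740, 260, 0)
    print(f"chi2 = {stat:.1f}, p = {p_value:.1e}")  # chi2 = 22.3, p = 2.3e-06
```

The chi-square approximation becomes unreliable when expected genotype counts are small, as they are for rare variants, which is why exact tests are generally preferred for HWE filtering.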

In this case, it is possible that the chip is actually picking up signal from the SNP 4 bp away and calling it as the minor allele of the SNP you are interested in.

Now, above I said your instincts are correct. Disease-relevant SNPs are indeed out of HWE, due to selection or other factors, at greater than chance rates. The problem is that SNPs are out of HWE for technical reasons even more commonly!

As a result, the conservative approach for SNPs out of HWE, especially SNPs WAY out of HWE, is to throw them out. If the SNP does represent a true finding, then other SNPs in the area should be associated with the condition as well.

Disclaimer: what I described above is one very specific example that happened to me. There are MANY other examples that could be mentioned. The general take-home is that SNPs out of HWE are suspect as technical artifacts, but by filtering them we accept that we may also be throwing out exactly the type of SNP we want to find.

If you do in fact have a SNP that is of intense interest to you, it is probably wise to validate it in other ways, first bioinformatically and then with a new assay, before spending money on functional follow-up. For instance, as a quick check: does the MAF seem roughly in line with what is expected for that ethnicity? Does the locus have support from SNPs in LD with it?
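Both quick checks can be sketched in a few lines of Python. The MAF check uses a normal approximation to the binomial, which assumes independent alleles and so is only a rough guide at a site already out of HWE. The LD check uses the squared correlation of 0/1/2 genotype dosages as a proxy for r² when phase is unknown. The function names, the reference MAF of 0.10, the counts, the neighbor names, and the r² and p-value cutoffs are all hypothetical.

```python
import math


def maf_z_score(minor_count: int, n_alleles: int, ref_maf: float) -> float:
    """Rough z-score of the observed MAF against a reference-population MAF."""
    observed = minor_count / n_alleles
    se = math.sqrt(ref_maf * (1 - ref_maf) / n_alleles)
    return (observed - ref_maf) / se


def dosage_r2(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation of 0/1/2 dosages, a common unphased proxy for LD r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_support(candidate: list[int],
               neighbors: list[tuple[str, list[int], float]],
               r2_min: float = 0.8, p_max: float = 1e-5) -> list[str]:
    """Names of nearby SNPs that are in LD with the candidate AND associated themselves."""
    return [name for name, dosages, p in neighbors
            if dosage_r2(candidate, dosages) >= r2_min and p <= p_max]


if __name__ == "__main__":
    # Observed MAF 0.13 (260 of 2000 alleles) vs an expected 0.10 for the population.
    print(round(maf_z_score(260, 2000, 0.10), 2))  # 4.47
    candidate = [0, 1, 2, 1, 0, 2]
    neighbors = [
        ("snp_a", [0, 1, 2, 1, 0, 2], 1e-8),  # perfect LD, associated
        ("snp_b", [2, 1, 0, 1, 2, 0], 0.2),   # perfect LD, not associated
        ("snp_c", [0, 0, 1, 0, 0, 1], 1e-9),  # r^2 = 0.75, below the cutoff
    ]
    print(ld_support(candidate, neighbors))  # ['snp_a']
```

A large z-score, as here, says the MAF is out of line with the population, which should raise suspicion of an artifact. An empty `ld_support` result means no neighbor backs up the signal.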


Ah, thanks! I am using WGS data, so there is no HWE pre-check, but I think I understand what you mean. Is there literature that tries to quantify the false-positive rate associated with HWE deviation? I know some people use p < 0.05 to reject the null while others use 0.001. I suppose I would want 0.001 as my threshold, assuming my rare allele's deviation from HWE would give a p-value above that level.


VCFtools will calculate HWE p-values for you (http://vcftools.sourceforge.net/man_latest.html). People use different thresholds.

I've seen 10^-5 (https://www.well.ox.ac.uk/dtc/GWAS_1.docx) and 1x10^-7.

0.05 would be extremely conservative: you would expect to throw out about 1 in 20 sites that are truly in HWE purely by chance.
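VCFtools' `--hardy` option reports, and `--hwe` filters on, p-values from the HWE exact test of Wigginton, Cutler and Abecasis (2005). Below is a minimal Python transcription of that published algorithm, assuming biallelic, fully called genotypes; it is not VCFtools' own code. It then applies the thresholds mentioned in this thread to a few hypothetical sites.

```python
def hwe_exact_p(n_ref: int, n_het: int, n_alt: int) -> float:
    """Two-sided HWE exact test p-value (Wigginton et al. 2005 algorithm)."""
    n_homr, n_homc = min(n_ref, n_alt), max(n_ref, n_alt)
    rare = 2 * n_homr + n_het          # copies of the rarer allele
    n = n_het + n_homr + n_homc        # number of genotyped samples
    if n == 0:
        raise ValueError("no called genotypes")
    probs = [0.0] * (rare + 1)
    # Start at the most likely het count, which must have the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if rare % 2 != mid % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0
    # Walk down: each step turns two hets into one homozygote of each type.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Walk up from the mode.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2.0) * (hets + 1.0))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    probs = [x / total for x in probs]
    # Sum the probabilities of all configurations no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(x for x in probs if x <= p_obs))


if __name__ == "__main__":
    # Hypothetical (hom-ref, het, hom-alt) counts for 1000 samples.
    sites = {"in_hwe": (250, 500, 250), "excess_het": (740, 260, 0), "singleton": (999, 1, 0)}
    thresholds = [0.05, 1e-3, 1e-5, 1e-7]
    for name, counts in sites.items():
        p = hwe_exact_p(*counts)
        # VCFtools-style rule: a site is excluded when its p-value is below the threshold.
        decisions = ["keep" if p >= t else "drop" for t in thresholds]
        print(f"{name}: p = {p:.2e}, {decisions}")
```

The singleton site gets p = 1 at every threshold: with one copy of the alternate allele there is no configuration less likely than the observed one, which is why the HWE filter has little power for rare variants. For scale, among 1,000,000 sites truly in HWE, a 0.05 cutoff would be expected to remove about 50,000 by chance alone, versus about 0.1 at 10^-7; discrete exact tests are somewhat conservative, so real numbers tend to be lower.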
