Question

Why Is Gwas Underpowered To Detect Association In Rare Variants?

11.9 years ago

michealsmith ▴ 790

This may sound stupid, but I do have questions.

I can think of two explanations:

1. GWAS relies on linkage disequilibrium (LD) between common tag SNPs and potential causal alleles. Rare variants are usually not in strong LD with these tags. Is this correct?

Is it possible that rare variants are instead in LD with low-frequency tag SNPs? Or do SNP arrays include only common tags?
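One way to see why common tags rarely capture rare variants: for two biallelic SNPs, r² is bounded by their allele frequencies. If the rarer minor allele has frequency p and the other minor allele has frequency q (p ≤ q ≤ 0.5), the largest possible r² is p(1 - q) / ((1 - p) q), reached when every copy of the rarer minor allele sits on a haplotype carrying the other minor allele. A minimal Python sketch of that bound (the function name and example frequencies are illustrative, not from any tool):

```python
def max_r2(maf_a: float, maf_b: float) -> float:
    """Upper bound on r^2 between two biallelic SNPs given their minor allele frequencies.

    Assumes both MAFs lie in (0, 0.5] and the minor alleles are in coupling phase,
    i.e. every copy of the rarer minor allele is on a haplotype carrying the other
    minor allele (the configuration that maximises D).
    """
    p, q = sorted((maf_a, maf_b))  # p = rarer minor allele, q = more common one
    if not (0 < p <= q <= 0.5):
        raise ValueError("minor allele frequencies must satisfy 0 < maf <= 0.5")
    # D_max = p(1 - q); r^2 = D^2 / (p(1-p) q(1-q)) simplifies to:
    return p * (1 - q) / ((1 - p) * q)


if __name__ == "__main__":
    # Hypothetical rare variant at MAF 0.5% against tags of decreasing frequency
    for tag_maf in (0.30, 0.05, 0.01, 0.005):
        print(f"tag MAF {tag_maf:.1%}: r2 <= {max_r2(0.005, tag_maf):.3f}")
```

With a 0.5% variant, a 30% tag can reach at most r² ≈ 0.012, while a tag of matching frequency can in principle reach r² = 1. So rare variants can be in strong LD with similarly rare tags, but those tags then have to be on the array (or in the imputation panel) to begin with.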

2. Because a rare variant has a low frequency, detecting its effect needs a larger effect size or a larger sample size than for a common variant. Is this correct?

I think it is the difference in allele counts between cases and controls that determines the odds ratio. For some rare variants, the minor allele may be completely absent in controls while still rare in cases, which would still produce a large odds ratio.
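A sketch of the allelic odds ratio computed from a 2x2 table of allele counts, showing that a zero minor-allele count in controls gives an infinite estimate. The optional +0.5 per cell is the Haldane-Anscombe correction, one common way to get a finite estimate; using it here is my illustrative choice, and the counts below are made up.

```python
import math


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, correction=0.0):
    """Allelic odds ratio (a*d)/(b*c) from allele counts.

    a, b = minor/major allele counts in cases; c, d = minor/major in controls.
    correction: value added to every cell (0.5 = Haldane-Anscombe correction).
    """
    a, b, c, d = (x + correction for x in (case_minor, case_major, ctrl_minor, ctrl_major))
    if b == 0 or c == 0:
        # e.g. minor allele absent in controls: the ratio is unbounded
        return math.nan if (a == 0 or d == 0) else math.inf
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical: 1,000 cases and 1,000 controls (2,000 alleles each),
    # 6 minor alleles in cases and none in controls
    print(allelic_odds_ratio(6, 1994, 0, 2000))                  # infinite
    print(allelic_odds_ratio(6, 1994, 0, 2000, correction=0.5))  # finite, corrected
```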

Anyway, can GWAS detect rare variants?

Thanks

Tags: gwas, variant

Written 11.9 years ago by michealsmith ▴ 790; updated 11.9 years ago by Alex Paciorkowski 3.5k

I understand what you mean, but you should rephrase your question to be more precise: GWAS does not "detect variants"; it detects association between variant alleles and a phenotype. "Is GWAS underpowered to detect association in rare variants, and if yes, why?"

Reply by Michael 54k, 11.9 years ago

The case you mention, where the minor allele is completely absent in controls, does give a large estimated OR (infinite, in fact). However, it won't be significant unless the sample size is large enough; in particular, you need enough cases for the minor allele to be observed in appreciable numbers. Remember that the variance of log(OR) is asymptotically 1/a + 1/b + 1/c + 1/d, where a, b, c, d are the cell counts of the 2x2 allele-by-disease table, so a small count in any cell inflates the variance (and a zero count makes it infinite).

Reply by nnlnn ▴ 60, 11.5 years ago
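The variance formula in nnlnn's comment gives a Wald test for log(OR) = 0. A minimal sketch, assuming the usual large-sample normal approximation (which is itself unreliable for very small cells); the counts in the example are hypothetical and keep the same allele frequencies while scaling the sample size tenfold.

```python
import math
from statistics import NormalDist


def wald_test_log_or(a, b, c, d):
    """Wald test of log(OR) = 0 for a 2x2 allele table.

    a, b = minor/major allele counts in cases; c, d = minor/major in controls.
    Var(log OR) ~ 1/a + 1/b + 1/c + 1/d is a large-sample approximation.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes log(OR) and its variance infinite")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    return log_or, se, z, p


if __name__ == "__main__":
    for scale in (1, 10):
        a, c = 8 * scale, 2 * scale  # minor allele counts in cases / controls
        n_alleles = 2000 * scale     # alleles per group (1,000 * scale people)
        log_or, se, z, p = wald_test_log_or(a, n_alleles - a, c, n_alleles - c)
        print(f"{a} vs {c} minor alleles: OR={math.exp(log_or):.2f}, SE={se:.2f}, p={p:.2g}")
```

The estimated OR is identical in both runs; only the standard error shrinks as the minor-allele cells grow, which is the whole sample-size problem for rare variants.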

score 8 · Answer 1 · 2012-06-08

11.9 years ago

Michael 54k

You definitely have a point. I'll comment on your suggestions, but I am largely self-taught in this field, so take this with care. Geneticists, please correct me.

Regarding 1), I would put it like this: the design of common GWAS platforms, i.e. genotyping chips, is based on linkage disequilibrium. Genotypes for markers that were not directly genotyped can be imputed using a reference panel (e.g. 1000 Genomes, HapMap), but this depends on whether the marker has been seen in the reference panel: if the minor allele of a rare variant was never observed there, its genotype cannot be imputed. I don't know whether it is generally true that rare variants are not in LD with other markers, but there may be less data to support this.
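How likely a rare variant is to appear in a reference panel at all can be estimated roughly. Assuming the panel's haplotypes are independent random draws from the population (real panels are not), the number of copies of a variant with frequency p among n haplotypes is binomial, so the chance of seeing it at least once is 1 - (1 - p)^n. The sketch below uses a hypothetical function name and made-up panel sizes, and also asks for at least a few copies, since a handful of observed copies is presumably more useful for imputation than a single one.

```python
import math


def prob_seen_in_panel(maf: float, n_haplotypes: int, min_copies: int = 1) -> float:
    """P(minor allele appears at least `min_copies` times among `n_haplotypes` haplotypes).

    Assumes binomial sampling of haplotypes from a population with allele
    frequency `maf`; this is a rough illustration, not a model of any real panel.
    """
    # P(X >= k) = 1 - sum_{i < k} C(n, i) p^i (1 - p)^(n - i)
    below = sum(
        math.comb(n_haplotypes, i) * maf**i * (1 - maf) ** (n_haplotypes - i)
        for i in range(min_copies)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Hypothetical panel sizes (number of haplotypes), variant at MAF 0.1%
    for n_hap in (200, 2_000, 5_000):
        seen = prob_seen_in_panel(0.001, n_hap)
        seen5 = prob_seen_in_panel(0.001, n_hap, min_copies=5)
        print(f"{n_hap:>5} haplotypes: P(>=1 copy) = {seen:.2f}, P(>=5 copies) = {seen5:.2f}")
```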

With (exome) sequencing data, however, you can run association tests on any called variant, so it depends on the genotyping technology and on whether your sample size is large enough.

Regarding 2), yes, I think that is correct. Because the variant is rare, it is difficult to find enough cases carrying it, so its minor allele frequency will be very low in your sample (which is a tautology). Therefore, you need a larger effect size or a larger sample size to detect a significant association.
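The effect-size/sample-size trade-off can be sketched with a normal approximation to the allelic test. Assumptions, all mine for illustration: a per-allele (multiplicative) effect, alleles counted independently (two per person, as under Hardy-Weinberg equilibrium), expected cell counts plugged into Var(log OR) ≈ 1/a + 1/b + 1/c + 1/d, and a genome-wide α of 5e-8. The design numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided Wald test on the per-allele log(OR).

    maf_controls: minor allele frequency in controls.
    Expected allele counts are used as the 2x2 cells (2 alleles per person).
    """
    p0 = maf_controls
    odds_cases = odds_ratio * p0 / (1 - p0)   # case odds = OR * control odds
    p1 = odds_cases / (1 + odds_cases)        # minor allele frequency in cases
    a, b = 2 * n_cases * p1, 2 * n_cases * (1 - p1)
    c, d = 2 * n_controls * p0, 2 * n_controls * (1 - p0)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the statistic exceeds the critical value on the effect's side
    return NormalDist().cdf(abs(math.log(odds_ratio)) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical design: 2,000 cases, 2,000 controls, per-allele OR of 1.5
    for maf in (0.30, 0.05, 0.01, 0.001):
        power = allelic_test_power(maf, 1.5, 2000, 2000)
        print(f"control MAF {maf:>5}: power ~ {power:.3g}")
```

With the same OR and sample size, power collapses as the MAF drops, because the minor-allele cells a and c dominate the standard error.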

But also consider 3):

In my opinion, the power of GWAS studies in general is rather low, at least for a single-marker approach. From a statistical point of view a GWAS is pure madness (millions of hypotheses tested on a few hundred cases). Ask yourself this additional question: Do most GWAS studies correct their p-values for multiple testing, and if not, why not?
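For reference on the multiple-testing question: many GWAS use a genome-wide significance threshold of 5e-8, which is what a Bonferroni correction gives for about one million independent tests at a family-wise α of 0.05. A minimal sketch of that arithmetic (the function name is mine), also showing the matching two-sided critical |z| under a normal approximation:

```python
from statistics import NormalDist


def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold controlling the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


if __name__ == "__main__":
    for n_tests in (1, 1_000, 1_000_000):
        thr = bonferroni_threshold(0.05, n_tests)
        z = NormalDist().inv_cdf(1 - thr / 2)  # two-sided critical |z|
        print(f"{n_tests:>9,} tests: p < {thr:.1e}  (|z| > {z:.2f})")
```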

Thanks, very helpful.

Reply by michealsmith ▴ 790, 11.9 years ago

Nice, Michael. I'm self-taught on this too, although I work with biologists who are very strict about choosing methods when designing an experiment. I should say that when they choose GWAS, it is not because they think it is perfect; it's just that, until sequencing costs dropped as they did, it was the most powerful tool available to test the common-allele/common-disease hypothesis. They are all slowly moving to NGS, and although they are having trouble dealing with false-positive rates, the potential statistical power of the technique is much better than that of GWAS.

Reply by Jorge Amigo 14k, 11.9 years ago

Just curious: do SNP chips contain tags only for common SNPs? Or does the reference panel contain only common tags?

Reply by michealsmith ▴ 790, 11.9 years ago

Kind of rare, but not really rare...

"The HumanOmni2.5-8 (Omni2.5) BeadChip offers the most optimal and comprehensive set of both common and rare SNP content from the 1kGP (MAF>2.5%) for diverse world populations."

http://www.illumina.com/products/humanomni25-8beadchipkits.ilmn

Reply by Jeremy Leipzig 22k, 11.9 years ago

Maybe MAF > 2.5% is the threshold they consider reliable enough to exclude false positives due to sequencing errors? Or it is simply the cutoff that conveniently adds up to 4M markers. Anyway, simply increasing the number of hypotheses tested is not going to cut it. Having a reliable genotype for a sample is only the first hurdle.

Reply by Michael 54k, 11.9 years ago

A little bit of this, a little bit of that, I guess. NGS false positives definitely have to be taken into account when using NGS data for chip design or any other kind of downstream analysis. Using MAF as a validation method goes in the direction of dbSNP's "by frequency" validation, which I think is completely logical.

Reply by Jorge Amigo 14k, 11.9 years ago

SNP chips are commercial products. By common sense, if the SNPs on a chip are rare in the population, the chip's power is limited too, and so are its scope and potential commercial use. As Jeremy pointed out, SNP selection on a chip is mainly based on population MAF, and denser chips, with more SNPs, can be expected to reach lower MAFs. If you find a SNP chip advertising that it covers rare variation, keep in mind that the rareness of those variants is also limited.

Reply by Jorge Amigo 14k, 11.9 years ago
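MAF-based selection, as discussed in this thread, boils down to computing each SNP's minor allele frequency from genotype counts and keeping those above a cutoff. A minimal sketch; the data class, SNP identifiers and genotype counts are hypothetical, and the 2.5% default mirrors the figure in the Illumina quote above rather than any documented selection pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    snp_id: str     # hypothetical identifier
    n_ref_hom: int  # individuals homozygous for the reference allele
    n_het: int      # heterozygous individuals
    n_alt_hom: int  # individuals homozygous for the alternate allele

    def maf(self) -> float:
        n_alleles = 2 * (self.n_ref_hom + self.n_het + self.n_alt_hom)
        if n_alleles == 0:
            raise ValueError(f"no genotypes for {self.snp_id}")
        alt_freq = (self.n_het + 2 * self.n_alt_hom) / n_alleles
        return min(alt_freq, 1 - alt_freq)  # the minor allele may be either allele


def select_by_maf(snps, maf_cutoff=0.025):
    """Keep SNPs with MAF strictly above maf_cutoff."""
    return [s for s in snps if s.maf() > maf_cutoff]


if __name__ == "__main__":
    candidates = [  # made-up genotype counts for 1,000 individuals each
        SnpCounts("snp_A", 500, 400, 100),
        SnpCounts("snp_B", 990, 10, 0),
        SnpCounts("snp_C", 20, 160, 820),
    ]
    for s in select_by_maf(candidates):
        print(s.snp_id, round(s.maf(), 3))
```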

score 1 · Answer 2 · 2012-06-08

11.9 years ago

Alex Paciorkowski 3.5k

If you are looking for associations between phenotypes and rare variants, nothing quite beats ascertaining a (large) group of subjects with that phenotype and looking directly at the sequence. GWAS was way underpowered because, when those studies were being funded, the technology could do little more than guess at some SNPs. Now, with next-gen technologies, we are going to see a new generation of GWAS-type studies (first "EWAS", exome-wide...) coming down the pike. But there is a difference between saying "polymorphisms at these alleles are associated with this disease" (the GWAS hypothesis) and "if you have a mutation in this gene you get this disease... over and over again". But that's just my two cents.
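The "mutation in this gene, over and over again" idea is often tested with a gene-level collapsing (burden) test: each person counts as a carrier if they have at least one qualifying rare variant in the gene, and carrier counts are compared between cases and controls. This is a technique I am swapping in for illustration, not something the answer specifies. The sketch uses a one-sided Fisher exact test built from hypergeometric probabilities; function names and genotypes are hypothetical.

```python
from math import comb


def is_carrier(minor_allele_counts):
    """True if a person has at least one minor allele at any qualifying rare variant.

    minor_allele_counts: 0/1/2 per variant in the gene (hypothetical input format).
    """
    return any(g > 0 for g in minor_allele_counts)


def fisher_greater(case_carriers, n_cases, ctrl_carriers, n_controls):
    """One-sided Fisher exact p-value for carrier enrichment in cases.

    Conditional on the total number of carriers K, the number of case carriers
    is hypergeometric under the null of no association.
    """
    n_total = n_cases + n_controls
    k_total = case_carriers + ctrl_carriers
    upper = min(k_total, n_cases)
    tail = sum(comb(n_cases, x) * comb(n_controls, k_total - x)
               for x in range(case_carriers, upper + 1))
    return tail / comb(n_total, k_total)


if __name__ == "__main__":
    # Made-up genotypes: each inner list is one person's counts at 3 rare variants
    cases = [[0, 1, 0]] * 12 + [[0, 0, 0]] * 988
    controls = [[0, 0, 1]] * 2 + [[0, 0, 0]] * 998
    k_cases = sum(is_carrier(g) for g in cases)
    k_ctrls = sum(is_carrier(g) for g in controls)
    p = fisher_greater(k_cases, len(cases), k_ctrls, len(controls))
    print(f"carriers: {k_cases}/{len(cases)} cases vs {k_ctrls}/{len(controls)} controls, p = {p:.2g}")
```

Collapsing pools many individually too-rare alleles into one carrier count, which is one way sequencing-based studies try to recover the power that single-variant tests lack.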

That's precisely the "common disease-common variant" (CD-CV) hypothesis that GWAS try to test, versus the "common disease-rare variant" (CD-RV) hypothesis that we're moving into thanks to NGS and upcoming sequencing techniques. We can't forget that with this technology change we are also moving from one hypothesis to another, trying to cover as much as possible of the same problem (which is, and has always been, the common disease).

Reply by Jorge Amigo 14k, 11.9 years ago