major allele

Question

Definition of minor and major allele and connection with risk, effect, wildtype and reference allele

17

6.0 years ago

m98 ▴ 420

In the context of genotype data and/or NGS datasets, could someone provide a clear definition and differences between minor, major, risk, reference, wildtype and effect alleles? I find terms are very often interchanged without a clear definition in various software.

My current understanding is as follows:

major allele: the most common allele for a given SNP
minor allele: the less common allele for a SNP. The MAF is therefore the minor allele frequency. This measure can be used to get a rough idea of the variation of genotypes for a given SNP in a given population; in other words, it tells you how common this SNP is.
risk allele: in the context of a disease, this is the allele that confers a risk of developing the disease. Most of the time, risk allele = minor allele, as most people will not carry the risk allele. However, in some cases, the risk allele can in fact be the major allele.
effect allele: ??
reference allele: ?? Is this the major allele, i.e., the most common allele?
wildtype allele: ?? Is this the same as the reference allele?

Apologies if this is a really basic question, but after encountering all the various terms in different places, I am quite confused and in need of precise definitions. Many thanks.

risk minor major reference alleles • 49k views

ADD COMMENT • link written 6.0 years ago by m98 ▴ 420

score 45 · Accepted Answer · 2018-04-23

6.0 years ago

Kevin Blighe 87k

major allele

"the most common allele for a given SNP"... in the cohort in question. The cohort may be just 10 people, though, or it could be 2,504 like in 1000 Genomes Phase III. In addition, the major allele, by definition, could have a frequency of 50.5%, in which case, although it is more frequent, it is only marginally so (50.5% versus 49.5% at a bi-allelic site). The point that I want to make is that the major allele only makes sense when you understand the cohort in which it is the major allele, and also the size of that cohort.

minor allele

As above but, yes, the reverse, in that it is the less frequent allele. Also, yes, the MAF is the frequency of the minor allele and, from the MAF, one can infer the frequency of the major allele if it is a bi-allelic site (some sites understandably are tri- or quad-allelic).
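As a rough illustration of the two definitions above, here is a minimal Python sketch that counts alleles in a small, made-up cohort and reports the major and minor allele. The genotype strings and function names are hypothetical, and treating the second most common allele as "the" minor allele at multi-allelic sites is an assumed convention, not something every tool follows.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes written like 'A/G'."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype.split("/"))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def major_and_minor(genotypes):
    """Return ((major_allele, freq), (minor_allele, freq)) for one site.

    Assumption: at sites with more than two alleles, the second most
    common allele is reported as the minor allele.
    """
    freqs = allele_frequencies(genotypes)
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    major = ranked[0]
    minor = ranked[1] if len(ranked) > 1 else (None, 0.0)
    return major, minor


# Hypothetical cohort of five people genotyped at one bi-allelic SNP.
cohort = ["A/A", "A/G", "G/G", "A/A", "A/G"]
major, minor = major_and_minor(cohort)
print("major:", major, "minor:", minor)
```

With only five people, adding or removing a single sample could swap which allele is called major, which is why the cohort and its size matter.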

On what you said about the "variation of genotypes": if a site has a very low MAF in a global cohort (i.e., samples from various parts of the world), it may imply that the major allele is conserved and is 'fixed' in the human genome, but not necessarily. A very rare allele at such a site may, thus, be under selective pressure if it reflects positive gain of function, or it could be deleterious and more likely to be eliminated from the human lineage.

risk allele

What you said is correct. The risk allele is statistically significantly associated with risk of having a disease under study. Such an allele should have genome-wide significance and have an odds ratio > 1.0. A situation in which a major allele may be seen as the 'risk allele' is where the minor allele is found to be protective against disease by having an odds ratio < 1.0, coupled with a statistically significant p-value. However, such a situation is not usually interpreted from the context of the major allele being the risk allele.
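To make the odds ratio point concrete, here is a small Python sketch using an allele-count 2x2 table. The counts are invented purely for illustration, the function name is hypothetical, and no significance test is performed. It shows that a protective minor allele (odds ratio < 1.0) corresponds to an odds ratio > 1.0 for the major allele at the same bi-allelic site, because the two are reciprocals.

```python
def allelic_odds_ratio(case_allele, case_other, control_allele, control_other):
    """Odds ratio for one allele from allele counts in cases and controls."""
    return (case_allele * control_other) / (case_other * control_allele)


# Hypothetical allele counts at one bi-allelic SNP.
cases = {"minor": 30, "major": 170}
controls = {"minor": 60, "major": 140}

or_minor = allelic_odds_ratio(
    cases["minor"], cases["major"], controls["minor"], controls["major"]
)
or_major = allelic_odds_ratio(
    cases["major"], cases["minor"], controls["major"], controls["minor"]
)

print(f"OR for minor allele: {or_minor:.3f}")
print(f"OR for major allele: {or_major:.3f}")
```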

You may have been thinking about rare and common (MAF>5%) variants. For example, it is accepted (by those who actually think) that common alleles have roles in disease. One example is the set of variants in the CCND1 locus, which have MAFs of ~15% in Caucasians but which confer increased risk of ER+ breast cancer. See "Rare and common variants: twenty arguments" for further reading.

I should add that many rare variants may be functionless, but that they can still accumulate in the human genome and eventually become functional if combined with other nearby variants. For example, variants accumulated over time eventually form novel TSS sites, TF binding sites, histone binding sites, protein binding sites, etc.

------------------------------------

In relation to the above 3, you may enjoy reading a recent answer that I gave: A: SNP dataset and Z Score

effect allele

This isn't used that much. It is essentially the allele whose effects in relation to disease are being studied. The effect allele is therefore often the minor allele, although it can be either the minor or the major allele (see the comments below).
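Whichever allele a study labels as the effect allele, the reported effect can be re-expressed relative to the other allele. The following Python sketch shows this for a single bi-allelic association record; the `Association` class and its field names are hypothetical and do not follow any particular summary statistics format.

```python
import math
from dataclasses import dataclass


@dataclass
class Association:
    effect_allele: str
    other_allele: str
    beta: float  # per-allele effect, e.g. a log odds ratio
    eaf: float   # frequency of the effect allele in the study cohort


def flip_effect_allele(assoc):
    """Re-express the same association with the other allele as effect allele."""
    return Association(
        effect_allele=assoc.other_allele,
        other_allele=assoc.effect_allele,
        beta=-assoc.beta,      # the sign of the effect reverses
        eaf=1.0 - assoc.eaf,   # the frequency becomes that of the other allele
    )


# Hypothetical record: the minor allele T (15%) is reported as protective.
reported = Association(effect_allele="T", other_allele="C", beta=-0.2, eaf=0.15)
flipped = flip_effect_allele(reported)

print(flipped)
print("odds ratio for C:", round(math.exp(flipped.beta), 3))
```

After flipping, the major allele C is the effect allele and carries an odds ratio above 1.0, so which allele is called the effect allele is a reporting choice rather than a biological property.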

reference allele

If you hear this term, exercise caution. The best way to view it is as the allele that is in a particular reference build, e.g., GRCh37 / hg19, GRCh38 / hg38, etc. In some cases, however, the reference allele can be a risk allele. Read here for further information: A: Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy.
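One way to see why the reference allele and the major allele are different concepts is to compare the reference and alternate alleles against a cohort frequency. The Python sketch below assumes a bi-allelic site and an alternate allele frequency taken from whatever cohort you trust; the function name and the example values are hypothetical.

```python
def describe_site(ref, alt, alt_freq):
    """Say which of REF and ALT is the major allele in a given cohort.

    Assumptions: the site is bi-allelic and alt_freq is the frequency of
    the alternate allele in the cohort of interest.
    """
    ref_freq = 1.0 - alt_freq
    if alt_freq > ref_freq:
        major, minor = alt, ref
    elif alt_freq < ref_freq:
        major, minor = ref, alt
    else:
        major = minor = None  # exact tie: neither allele is major
    return {
        "major": major,
        "minor": minor,
        "maf": min(ref_freq, alt_freq),
        "ref_is_major": major == ref,
    }


# Hypothetical site where the reference build carries the rarer allele.
site = describe_site(ref="G", alt="A", alt_freq=0.8)
print(site)
```

Here the reference allele G is the minor allele in the cohort, which is the situation described in the linked answer.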

wildtype allele

Not the same as the reference allele. A wildtype allele is specific to your case-control study and is merely the allele that is present in your wild-type samples. This could feasibly be a minor allele, or anything else - it's specific to your study and what you view as the wild-type condition.

Thank you.

Kevin

ADD COMMENT • link 5.1 years ago by Kevin Blighe 87k

1

This saved me a lot of time! Well explained. Thanks a lot.

ADD REPLY • link 4.7 years ago by shreyajha ▴ 30

1

well explained and comprehensive

ADD REPLY • link 3.6 years ago by wintermelontea ▴ 10

0

Thank you so much! I should have added "ancestral" allele in my question. Am I correct in saying the ancestral allele is the major allele, given a large reference population?

ADD REPLY • link 6.0 years ago by m98 ▴ 420

5

Yes, the ancestral allele would be the major allele. Again, however, due to the 'quirks' of the reference genome builds, the ancestral allele is not always the allele that appears in the reference genome. The reference genome has many thousands of rare alleles, at least in the case of hg19.

Edit: note that 'ancestral' will be interpreted differently depending on who you talk to. Here is another definition that more or less refers to the major allele: https://biology.stackexchange.com/questions/19159/ancestral-allele-explanation

A situation could arise, though, where a rare allele could confer gain of function and, therefore, it would eventually become more frequent than the ancestral allele. This is obviously over many many generations, though, and is more in the realm of evolution.

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

1

Thank you so much for clarifying all this for me!

ADD REPLY • link 6.0 years ago by m98 ▴ 420

0

I still feel a bit confused. What's the difference between the alternative allele and the effect allele? Is the effect allele the risk allele?

ADD REPLY • link 4.3 years ago by dandanli0365 • 0

0

There may be no difference. It is study-dependent.

ADD REPLY • link 4.3 years ago by Kevin Blighe 87k

0

Thanks for the detailed explanation.

ADD REPLY • link 3.2 years ago by solomoncharles77 ▴ 90

0

You are very welcome.

ADD REPLY • link 3.2 years ago by Kevin Blighe 87k

0

I have some points on the answer that I am not sure about. First, the effect allele is used everywhere in GWAS studies. Second, why must the minor allele be the effect allele? Can't the effect allele just be the major allele, with the study examining the relation between it and the trait? It could even have a protective effect, which I think is found a lot in summary statistics files.

Thanks a lot

ADD REPLY • link 2.9 years ago by tito • 0

1

Hi, yes, the effect allele can be the minor or the major allele.

ADD REPLY • link 2.9 years ago by Kevin Blighe 87k