Hi all,
I am a computer science student. I need some clarification on my understanding below.
- Two alleles represent a gene. Genes are pieces of DNA. DNA makes up chromosomes. Chromosomes are found inside the nucleus of cells. In a diploid organism, the two corresponding genes in a chromosome pair (homologous chromosomes) are referred to as alleles. Each parent contributes one allele in each pair, so those two alleles might be identical or might have different base sequences.
1) Does allele refer to the entire gene sequence or one position of the gene sequence?
2) Is the meaning of "allele" different in case 1 and case 2?
Case 1: Consider, for example, the gene for height. Tt (tall) is the genotype, where T is one allele and t is the other. The entire gene has two alleles.
Case 2:
I have taken a screenshot from a GATK VCF file. For example, at chr1 position 762,589 the reference sequence has the G allele and the father and son have the C allele. So the genotype for this position is G/C (heterozygous). Similarly, at other positions,
- C/G genotype, C allele in reference, G allele in father and son
- T/C genotype, T allele in reference, C allele in father and son
- T/C genotype, T allele in reference, C allele in father and son
- T/A genotype, T allele in reference, A allele in father and son
Here we refer to an allele as one base of a gene. Therefore we have multiple alleles in the single gene "GLA".
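To make case 2 concrete, here is a minimal Python sketch of how the allele indices in a VCF genotype map back to bases. The record is reconstructed by hand from the description above (chr1, position 762589, REF G, ALT C, both samples heterozygous), not copied from the real file, and `decode_genotype` is a hypothetical helper, not part of GATK or any library. It assumes tab-separated fields in the standard VCF column order with a GT entry in the FORMAT column.

```python
def decode_genotype(vcf_line, sample_names):
    """Translate the GT indices of one VCF data line into bases.

    Index 0 means the REF allele; 1, 2, ... index into the
    comma-separated ALT alleles.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    alleles = [ref] + alt.split(",")

    # FORMAT (column 9) names the per-sample sub-fields; find GT.
    gt_index = fields[8].split(":").index("GT")

    genotypes = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        gt = sample_field.split(":")[gt_index]
        sep = "|" if "|" in gt else "/"  # "|" = phased, "/" = unphased
        calls = [alleles[int(i)] if i != "." else "." for i in gt.split(sep)]
        genotypes[name] = sep.join(calls)
    return chrom, int(pos), genotypes


# Illustrative record built from the description in the post.
record = "chr1\t762589\t.\tG\tC\t.\t.\t.\tGT\t0/1\t0/1"
chrom, pos, genotypes = decode_genotype(record, ["father", "son"])
print(chrom, pos, genotypes)
```

In this usage "allele" is simply one of the entries in the REF/ALT list at a single site, which is the sense case 2 relies on.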
BSc Genetics, PhD Molecular Biology here. The definition of alleles as versions of a gene became obsolete when the molecular basis of genetics was discovered. Alleles refer to the possible bases of a variant.
I was a little too hasty in my reply. On closer reflection, I don't actually define an allele as a whole gene. I think my definition is more like "sequence variants in a population within a given window (>1bp) of DNA sequence".
I personally think it's silly to refer to a single base pair as an allele (as in the 2nd example above).
1) Because we already have a term which is widely understood (SNP).
2) Because it fails to take into account linkage and recombination.
Two (or more) SNPs located close together will almost always be inherited together, and quite often become fixed in a population together. For this reason, I think it's a mistake to annotate individual SNPs as completely separate "alleles". Why should each one be designated as an "allele" if some combinations are always found together?
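One common way to put a number on "inherited together" is the r^2 measure of linkage disequilibrium between two biallelic sites. The sketch below is only an illustration: the haplotype lists are made up, `r_squared` is a hypothetical helper, and it assumes phased two-site haplotypes coded as "0" (reference) and "1" (alternate).

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of 2-character strings, one character per site,
    "0" = reference allele, "1" = alternate allele.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n   # alt frequency, site 1
    p_b = sum(h[1] == "1" for h in haplotypes) / n   # alt frequency, site 2
    p_ab = sum(h == "11" for h in haplotypes) / n    # both alts on one haplotype

    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Made-up data: the two sites always travel together...
linked = ["00", "00", "11", "11", "00", "11"]
# ...versus every combination being equally common.
mixed = ["00", "01", "10", "11"]

print(r_squared(linked), r_squared(mixed))
```

A value near 1 means knowing the base at one site tells you the base at the other, which is the situation described above; a value near 0 means the sites behave independently.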
Word definition can be tricky. Experts reading the word in context will usually understand what the writer is trying to convey but that only comes from a lot of exposure. Hence, a course in genetics.
But if experts cannot agree about definition of a technical term, it may be time to retire the term.
Firstly, the term "SNP" refers to the locus. Secondly, not all variants are SNPs, some are indels and some are structural variants.
A set of alleles of variants in LD with each other is referred to as a haplotype.
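As a rough illustration of the allele/haplotype distinction, the Python sketch below strings the per-site alleles of one diploid sample into its two haplotypes. The REF/ALT bases are taken from the first three positions listed in the question, but the phasing ("0|1" at every site) is an assumption made purely for the example, since the genotypes described there are unphased. `haplotypes_from_phased` is a hypothetical helper.

```python
def haplotypes_from_phased(sites):
    """Build the two haplotypes of one diploid sample.

    sites: list of (ref, alt, phased_gt) tuples in genomic order,
    e.g. ("G", "C", "0|1"). The left index goes to the first
    haplotype and the right index to the second.
    """
    hap1, hap2 = [], []
    for ref, alt, gt in sites:
        if "|" not in gt:
            raise ValueError(
                f"genotype {gt!r} is unphased; haplotypes cannot be read off directly"
            )
        alleles = [ref] + alt.split(",")
        first, second = (alleles[int(i)] for i in gt.split("|"))
        hap1.append(first)
        hap2.append(second)
    return "-".join(hap1), "-".join(hap2)


# REF/ALT from the question; phasing is assumed for illustration only.
sites = [("G", "C", "0|1"), ("C", "G", "0|1"), ("T", "C", "0|1")]
print(haplotypes_from_phased(sites))
```

Each site still has its own two alleles, while each string of bases along one chromosome copy is a haplotype.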
The fact is, you can argue until you're blue in the face about how a definition should be used in a certain way, and about how all the people using it in another way are wrong, but if the majority are using it that way then you just have to go with the flow.
I don't want to argue, and I don't even necessarily disagree with you but:
"The fact is, you can argue until you're blue in the face about how a definition should be used in a certain way, and about how all the people using it in another way are wrong, but if the majority are using it that way then you just have to go with the flow."
The first google hit for "Allele" is a Nature education website where an allele is described as a "variant form of a gene."
If we work on a 'definition by majority', then an allele is a genic variant.
In any case, if we are all using the same word to mean different things, do you think there might be a problem? If a "bioinformatician" uses a word to refer to one thing, and a "biochemist" understands a different thing, I don't think anybody wins. Just like no one wins by continuing this discussion :)
I agree. Sort of.
For example, a term that I don't think really makes sense these days in animal genetics is "gene". Our modern understanding of inheritance is complex enough that genes as originally defined don't exist. There are things we refer to as genes today, but they aren't the same as what the word originally meant, and what counts as a gene is pretty arbitrary. Sometimes overlapping transcripts are called a gene, but this counts only exonic overlap, and which parts of a gene are exons really is just a matter of how hard you look. Definitions involving proteins don't work, because what about lincRNAs? And why are enhancers not called genes? They are stretches of DNA that code for phenotypes and are inherited independently.
But "gene" is a useful casual term, even if it's hard to define in a strict sense. I think the same is probably true of the word "allele".
In Ensembl we use gene to refer to a genomic locus where transcription occurs. If the transcripts share exons, they are part of the same gene (although there are counter-examples such as readthrough transcripts where we would not define them as the same gene). This refers to transcription only, and not translation, so UTR regions and non-coding transcripts are included in this definition.
You're right, the old-school definition is a unit of inheritance. This is again obsolete (i.e. used by people who don't really work in the field). You also have the terrible lay-person definition whereby a gene is a thing that causes a disease (e.g. the Daily Mail headline "Gene for obesity found", where in the text we see "people with the gene are fat").
To complicate things: a SNP has two alleles. So it's not exclusively on the gene level.
I fully agree with the second part of your answer. Asking a question is no substitute for picking up a few biology and genetics books. An understanding of these principles is crucial in bioinformatics.