Phased and Unphased Genotypes in VCF Files: Does the Order of Alleles Matter?
13.2 years ago
Chronos ▴ 610

As this page explains, phased genotypes are sensitive to the order of their alleles.

I assume that the order of alleles in phased VCF genotypes (such as 0|1 and 1|0) matters as well, but I could not find any confirmation of this in the format description.

Or is order-sensitive allele listing such a common convention that it doesn't need an explicit description? (I'm new to the field.)

13.0 years ago

The phase status of an allele records on which chromosome of the pair it was found. As far as I know, the main reason to use allele phasing information is to improve the accuracy of the haplotypes, and of the haplotype blocks inferred from them. Once you know which allele of each pair sits on which chromosome, it makes sense to list every allele pair in the same order: with the information sorted this way, you can easily build haplotypes by going through the first allele of every pair only, and then through the second allele only.

Trying to be a little more visual (and simplistic too, so my apologies in advance to all geneticists), take the table from the webpage you mentioned:

IND, id1, id2, id3, id4, id5
rs1, AT, TT, ??, AT, AA
rs2, GC, CC, GG, CC, CG
rs3, CC, ??, ??, CG, GG
rs4, AC, CC, AA, AC, AA

If you look at individual 1 (id1), you get two different haplotypes: AGCA (from the first chromosome of the pair) and TCCC (from the second). This information would not be known if the genotypes were unphased; in that case, a haplotype-inference (phasing) algorithm would have to be applied instead.
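A minimal Python sketch of the idea above, assuming (as the answer does) that the genotypes in the table are phased, i.e. the first letter is the allele on one chromosome and the second letter the allele on the other. The helper names `parse_table` and `haplotypes` are illustrative only and do not come from any library.

```python
# Genotypes from the table above, assumed phased: each two-letter string lists
# the allele on the first chromosome, then the allele on the second chromosome.
# "??" marks a missing genotype.
TABLE = """\
IND, id1, id2, id3, id4, id5
rs1, AT, TT, ??, AT, AA
rs2, GC, CC, GG, CC, CG
rs3, CC, ??, ??, CG, GG
rs4, AC, CC, AA, AC, AA
"""


def parse_table(text):
    """Return (individual ids, list of (marker, genotypes) rows)."""
    lines = [line.split(",") for line in text.strip().splitlines()]
    header = [field.strip() for field in lines[0][1:]]
    rows = [(fields[0].strip(), [g.strip() for g in fields[1:]]) for fields in lines[1:]]
    return header, rows


def haplotypes(ind, text=TABLE):
    """Build the two haplotypes of one individual by reading the first
    allele of every marker, then the second allele of every marker."""
    ids, rows = parse_table(text)
    col = ids.index(ind)
    genotypes = [gts[col] for _, gts in rows]
    first = "".join(g[0] for g in genotypes)
    second = "".join(g[1] for g in genotypes)
    return first, second


print(haplotypes("id1"))  # ('AGCA', 'TCCC')
```

Missing calls simply show up as "?" in both haplotypes; for example, `haplotypes("id2")` gives `('TC?C', 'TC?C')`.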


I know this post is "old", but it was helpful to me as a springboard for further reading on the subject. If it was helpful to me now, it will definitely be helpful to others "tomorrow". Below is an excerpt (copy-paste) from The Variant Call Format and VCFtools, Danecek et al. (2011):

GT, genotype, encodes alleles as numbers: 0 for the reference allele, 1 for the first allele listed in ALT column, 2 for the second allele listed in ALT and so on. The number of alleles suggests ploidy of the sample and the separator indicates whether the alleles are phased ("|") or unphased ("/") with respect to other data lines (Figure 1).
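To make the GT encoding quoted above concrete, here is a small Python sketch that decodes a GT string into alleles. It is a simplification, not a full VCF parser: it assumes the GT value has already been extracted from the sample column, treats a genotype as phased only when every separator is "|", and reports haploid calls such as `1` (which have no separator) as unphased. `parse_gt` is a hypothetical helper name.

```python
import re


def parse_gt(gt, ref, alts):
    """Decode a VCF GT string such as '0|1' or '1/2'.

    Returns (alleles, ploidy, phased):
      alleles - allele strings: index 0 = REF, 1 = first ALT, 2 = second ALT, ...;
                None for a missing call ('.')
      ploidy  - number of alleles listed
      phased  - True only if there is at least one separator and all are '|'
    """
    alleles_by_index = [ref] + list(alts)
    indices = re.split(r"[|/]", gt)
    alleles = [None if i == "." else alleles_by_index[int(i)] for i in indices]
    separators = re.findall(r"[|/]", gt)
    phased = bool(separators) and all(s == "|" for s in separators)
    return alleles, len(alleles), phased
```

For example, `parse_gt("0|1", "A", ["T"])` returns `(['A', 'T'], 2, True)`, while `parse_gt("1|0", "A", ["T"])` returns `(['T', 'A'], 2, True)`: the same two alleles listed in opposite order, which is exactly the information phasing adds.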


This was useful, but it still left the meaning of the allele order ambiguous for me, i.e. which alleles are on the same chromosome (in the same phase). A look at Fig. 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original paper you referenced confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied if no PS field is present).
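The rule from Fig. 1 can be sketched in Python as follows: within one phase set, the alleles to the left of "|" form one haplotype and the alleles to the right form the other. This is an illustrative sketch under simplifying assumptions: one diploid sample on one chromosome, records already parsed into hypothetical `(pos, ref, alts, gt, ps)` tuples, and `ps` set to `None` for records without a PS field (all such phased records are treated as one phase set). `build_haplotypes` is not from any library.

```python
from collections import defaultdict


def build_haplotypes(records):
    """Group one diploid sample's phased calls into haplotypes per phase set.

    records: iterable of (pos, ref, alts, gt, ps) tuples for one chromosome,
    where ps is the PS value, or None when the record has no PS field.
    Returns {phase_set: (haplotype_1, haplotype_2)}, where each haplotype
    is a list of (pos, allele) pairs.
    """
    sets = defaultdict(lambda: ([], []))
    for pos, ref, alts, gt, ps in records:
        parts = gt.split("|")
        if "/" in gt or len(parts) != 2:
            continue  # unphased, haploid or polyploid call: skipped in this sketch
        if "." in parts:
            continue  # missing allele: nothing to place on a haplotype
        alleles = [ref] + list(alts)
        hap1, hap2 = sets[ps]
        hap1.append((pos, alleles[int(parts[0])]))  # left of '|' -> first haplotype
        hap2.append((pos, alleles[int(parts[1])]))  # right of '|' -> second haplotype
    return dict(sets)


# Hypothetical records for one sample; none carries a PS field.
records = [
    (100, "A", ["T"], "0|1", None),
    (200, "G", ["C"], "1|0", None),
    (300, "C", ["G"], "0/1", None),  # unphased: not placed on either haplotype
]
print(build_haplotypes(records))
# {None: ([(100, 'A'), (200, 'C')], [(100, 'T'), (200, 'G')])}
```

Haplotypes from different phase sets are kept apart on purpose: allele order only relates genotypes within the same phase set, so the "first" haplotype of one set need not lie on the same chromosome as the "first" haplotype of another.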


So when a VCF has 0|1 or 1|0, is it safe to assume that the first column (before |) always represents one haplotype and the second column (after |) always represents the other?


More or less: within the same phase set, you will be able to build one haplotype from the alleles in the first column and another from the alleles in the second column.


Thanks, this is what I wanted to be sure of.

11.0 years ago
sliders ▴ 80

Fig. 1 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137218/figure/F1/) in the original VCF/VCFtools paper referenced above confirms that, for phased data, alleles from different variants that occupy the same position in the GT field are on the same chromosome (provided they are in the same phase set, which is implied if no PS field is present).
