Question: VCF format for phased data

Despite the detailed explanation of VCF format on the 1000Genomes site, it is still not clear to me how the data should be interpreted with respect to sample results.

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00002
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 1|0:48:8:51,51
20 1230237 . T C 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|1:3:5:65,3
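In a VCF data line, the FORMAT column (here GT:GQ:DP:HQ) names the per-sample fields, and each sample column holds the values in the same colon-separated order. A minimal Python sketch, assuming whitespace-separated columns like the two records above; `sample_fields` is a hypothetical helper, not part of any VCF library.

```python
def sample_fields(line: str, sample_index: int = 0) -> dict:
    """Map FORMAT keys to the values of one sample column in a VCF data line."""
    cols = line.split()                          # real VCFs are tab-delimited; split() also handles the spaces used above
    format_keys = cols[8].split(":")             # FORMAT is the 9th column (0-based index 8)
    values = cols[9 + sample_index].split(":")   # sample columns start at index 9
    return dict(zip(format_keys, values))

rec = "20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 1|0:48:8:51,51"
print(sample_fields(rec))
# {'GT': '1|0', 'GQ': '48', 'DP': '8', 'HQ': '51,51'}
```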

For individual NA00002, the vertical bar "|" between the two allele digits indicates that the genotype is phased. But is there any significance to which side of the bar each digit appears on?

E.g., for position 14370, does the first digit "1" in "1|0" (the ALT allele, A) relate to a particular parent, mother or father? And does the second digit, "0" to the right of the bar (the REF allele, G), indicate the base from the other parent? Similarly, at position 1230237 the first digit is "0" (REF, T) and the second digit, to the right of the bar, is "1" (ALT, C).

If so, then the left chromosome would read AT and the right chromosome GC. Correct? Or is it impossible to tell from the order of the alleles relative to the vertical bar?
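The reading proposed in the question can be spelled out in a few lines of Python. This is a minimal sketch, assuming diploid genotypes that are all phased with "|" and no missing alleles; `haplotypes` is a hypothetical helper. It only groups alleles by which side of the bar they sit on; it says nothing about which parent each side came from.

```python
# (REF, ALT, GT) for the two NA00002 records in the question
rows = [
    ("G", "A", "1|0"),  # position 14370
    ("T", "C", "0|1"),  # position 1230237
]

def haplotypes(rows):
    """Concatenate the left-of-bar alleles and the right-of-bar alleles separately."""
    left, right = [], []
    for ref, alt, gt in rows:
        alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT allele(s)
        a, b = gt.split("|")              # assumes every genotype here is phased
        left.append(alleles[int(a)])
        right.append(alleles[int(b)])
    return "".join(left), "".join(right)

print(haplotypes(rows))  # ('AT', 'GC')
```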

Thank you in advance.

Asked 5.5 years ago by nschaefer (20); updated 2.8 years ago by Biostar

The "|" just indicates that the genotype call is part of a phased block. In general, it does not mean the allele is from the mother or the father (though it's possible to know that for trios). It just means that the relative origin of the variants in the same block can be inferred.

So, in VCF, a block starts with "/" and continues as long as the following lines are "|". For example:

REF ALT GT1 GT2
A   T   0/1 0/1
C   G   1|0 0|1
G   A   1|0 1|0
G   T   1|0 0|1

This gives the haplotypes AGAT/TCGG for sample 1 and ACAG/TGGT for sample 2. But we don't know which parent either haplotype came from.
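The haplotypes in this answer can be checked mechanically. The Python sketch below follows the convention as stated here ("/" opens a block, "|" extends it) and assumes diploid genotypes with no missing alleles; the alleles of the block's opening "/" genotype are taken in the order written, which matches the reconstruction above. `phased_blocks` is a hypothetical helper name.

```python
def phased_blocks(ref_alt, gts):
    """Return one 'left/right' haplotype pair per block ('/' opens a block, '|' extends it)."""
    blocks = []
    for (ref, alt), gt in zip(ref_alt, gts):
        sep = "|" if "|" in gt else "/"
        a, b = (int(x) for x in gt.split(sep))
        alleles = [ref] + alt.split(",")      # index 0 = REF, 1.. = ALT allele(s)
        if sep == "/" or not blocks:
            blocks.append(["", ""])           # start a new block
        blocks[-1][0] += alleles[a]           # left-of-separator haplotype
        blocks[-1][1] += alleles[b]           # right-of-separator haplotype
    return ["/".join(block) for block in blocks]

ref_alt = [("A", "T"), ("C", "G"), ("G", "A"), ("G", "T")]
gt1 = ["0/1", "1|0", "1|0", "1|0"]
gt2 = ["0/1", "0|1", "1|0", "0|1"]
print(phased_blocks(ref_alt, gt1))  # ['AGAT/TCGG']
print(phased_blocks(ref_alt, gt2))  # ['ACAG/TGGT']
```

If every genotype were written with "/", each line would come out as its own one-site block.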

Answered 5.5 years ago by brentp (23k)

From what you're saying, the position of the value with respect to "|" is significant. So for sample 1 in your example, the values to the left of "|" in the last three positions are ALT and on the same chromosome, whilst the values to the right of "|" are REF and on the other chromosome. However, as the first position is not phased, is it possible to associate either of its alleles with those below? I.e., doesn't the block (chromosome segment) actually start with line 2?

From 1000Genomes: "The meanings of the separators are as follows (see the PS field below for more details on incorporating phasing information into the genotypes):

  • / : genotype unphased
  • | : genotype phased"
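The two separators quoted above are all a parser needs to tell phased from unphased calls. A minimal Python sketch for diploid GT values; `parse_gt` is a hypothetical name, and mapping "." to a missing allele follows the VCF convention for no-calls (e.g. "./.").

```python
def parse_gt(gt: str):
    """Return (allele_indices, is_phased) for a diploid GT value such as '1|0' or '0/1'."""
    phased = "|" in gt
    alleles = gt.split("|" if phased else "/")
    # '.' marks a missing allele; keep it as None rather than an allele index
    return tuple(None if a == "." else int(a) for a in alleles), phased

print(parse_gt("1|0"))  # ((1, 0), True)
print(parse_gt("0/1"))  # ((0, 1), False)
```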

I did try to sort this out myself by looking at the gene NPC1 on chr18, but in all the supposed family trios the child had been redacted, so it was not possible to check the phased formatting.

Thanks, brentp.

Reply posted 5.5 years ago by nschaefer (20)

"/" indicates that it is not phased with anything before it.

"|" indicates that it is phased with (at least) the line before it.

So a block starts with "/" and ends one line before the next "/".

So if all your genotypes are unphased ("/"), each line is the start and end of its own block.

So, to answer your first question: yes, all four variants, even the first, are phased together.
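The block rule in this reply amounts to a running counter: a "/" genotype opens a new block and a "|" genotype joins the current one. A minimal Python sketch of that convention only; `block_ids` is a hypothetical helper, and files that record blocks with the PS (phase set) FORMAT field mentioned in the quoted spec text would need that field instead.

```python
def block_ids(gts):
    """Assign a block number to each genotype: '/' opens a new block, '|' joins the current one."""
    ids, current = [], -1
    for gt in gts:
        if "|" not in gt or current < 0:
            current += 1  # unphased call (or a leading '|' with no block yet) starts a block
        ids.append(current)
    return ids

print(block_ids(["0/1", "1|0", "1|0", "1|0"]))  # [0, 0, 0, 0]  one block, first line included
print(block_ids(["0/1", "0/1", "0/1"]))         # [0, 1, 2]     all unphased: one block per line
print(block_ids(["0/1", "1|0", "0/1", "0|1"]))  # [0, 0, 1, 1]
```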

Reply posted 5.5 years ago by brentp (23k)

OK, so a "|" genotype is phased with the line(s) before it rather than after it. I wish that had been made explicit on the 1000Genomes page.

Thanks again.

Reply posted 5.5 years ago by nschaefer (20)

This is not really my field of expertise, but you might want to read this: http://www.nature.com/nrg/journal/v12/n10/full/nrg3054.html

Answered 5.5 years ago by Chris Evelo (10.0k)

Thanks, Chris, for the link. Whilst it may not have answered the specific question, it was very interesting for my broader goal, viz. imputing/interpolating values for all alleles in my 23andMe phased results (1M positions) using larger databases such as 1000Genomes and beyond.

Reply posted 5.5 years ago by nschaefer (20)
