Haplotype annotation in VCF files of the 1000 Genomes Project phase 3
5.3 years ago
caggtaagtat ★ 1.9k

Hi there,

I'm new to VCF file analysis and would like to download a large database of human SNPs with information about their location and sequence variation, and whether they can occur in a homozygous state.

So far I have found this directory of files from the 1000 Genomes Project, from which I think I can download the relevant data. However, I'm not sure whether I'm looking at the right columns.

The data looks like this:

22      16050654        esv3647175;esv3647176;esv3647177;esv3647178     A       <CN0>,<CN2>,<CN3>,<CN4> 100     PASS    AC=9,87,599,20;AF=0.00179712,0.0173722,0.119609,0.00399361;AN=5008;CS=DUP_gs;END=16063474;NS=2504;SVTYPE=CNV;DP=22545;EAS_AF=0.001,0.0169,0.2361,0.0099;AMR_AF=0,0.0101,0.219,0.0072;AFR_AF=0.0061,0.0363,0.0053,0;EUR_AF=0,0.007,0.0944,0.003;SAS_AF=0,0.0082,0.1094,0.002;VT=SV       GT      3|0     0|0     0|0     0|0     0|0     0|0     0|4     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0         0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     3|0     0|0     3|0     0|0     0|0     3|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|3     0|0     0|4     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|3     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|0     0|0     3|0     0|0     0|0     0|3     3|0     0|3     2|0     0|0     0|0     ...

Other entries only show 0|0, 0|1 or 1|0, so I initially thought the numbers indicated the haplotype of the SNP in different individuals. However, in that case I don't understand the difference between 0|2 and 3|0.

Edit: I should add that these columns are not documented in the VCF file header.

SNP VCF Haplotype • 1.2k views
5.3 years ago

Hello,

The numbers describe which of the REF or ALT alleles are present in the sample. 0 means the REF allele, and values greater than 0 give the position of the allele in the ALT column.

So a sample with genotype 0|0 is homozygous for the reference allele. A sample with 0|2 has one reference allele, and its second allele corresponds to the second value in the ALT column. A sample with 0|3 has one reference allele, and its second allele corresponds to the third value in the ALT column.

The | indicates that the genotypes are phased. So all alleles on the same chromosome that are written in front of the | are located on the same haplotype, and those behind it are on the other. If the phasing is unknown, the delimiter would be /.
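As a rough illustration of the indexing described above, here is a minimal Python sketch that turns a GT string into the alleles it points to. The function name `decode_genotype` is made up for this example, and the sketch assumes a single GT value (no other FORMAT fields attached) and a comma-separated ALT column as in the line shown in the question.

```python
def decode_genotype(gt, ref, alt):
    """Map a VCF GT string such as '3|0' to its alleles and a phased flag.

    Index 0 is the REF allele; index n > 0 is the n-th entry of ALT.
    A missing allele ('.') is returned as None.
    """
    alleles = [ref] + alt.split(",")
    phased = "|" in gt
    separator = "|" if phased else "/"
    decoded = []
    for index in gt.split(separator):
        decoded.append(None if index == "." else alleles[int(index)])
    return decoded, phased


# REF and ALT taken from the example record in the question
ref = "A"
alt = "<CN0>,<CN2>,<CN3>,<CN4>"
for gt in ("0|0", "3|0", "0|4", "2|0"):
    print(gt, decode_genotype(gt, ref, alt))
```

For the record in the question, 3|0 therefore means that the first haplotype carries `<CN3>` (the third ALT entry) and the second haplotype carries the reference allele.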

fin swimmer


Thank you very much! That helps a lot. So when I'm looking for SNPs which can occur homozygous, I would check for at least one entry with n|n or n/n with n > 0?


So when I'm looking for SNPs which can occur homozygous, I would check for at least one entry with n|n or n/n with n > 0

Yes.
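A minimal sketch of that check in Python, for anyone who wants to script it. The helper names `is_hom_alt` and `has_hom_alt` are invented for this example. It assumes tab-separated VCF data lines with the FORMAT column at position 9 and the samples after it, and diploid genotypes. Note that a genotype like 1|2 carries two different ALT alleles and is not counted as homozygous here.

```python
import re


def is_hom_alt(gt):
    """True for genotypes of the form n|n or n/n with n > 0."""
    parts = re.split(r"[|/]", gt)
    return (
        len(parts) == 2
        and parts[0] == parts[1]
        and parts[0] not in ("0", ".")
    )


def has_hom_alt(vcf_line):
    """True if at least one sample on this data line is homozygous non-reference."""
    fields = vcf_line.rstrip("\n").split("\t")
    # FORMAT may list several keys (e.g. GT:DP); find where GT sits
    gt_position = fields[8].split(":").index("GT")
    return any(
        is_hom_alt(sample.split(":")[gt_position]) for sample in fields[9:]
    )


# Toy data lines (not real 1000 Genomes records)
line_a = "\t".join(["22", "100", "rsX", "A", "G", "100", "PASS", ".", "GT", "0|0", "0|1", "1|1"])
line_b = "\t".join(["22", "200", "rsY", "C", "T", "100", "PASS", ".", "GT", "0|0", "1|0", "0|1"])
print(has_hom_alt(line_a), has_hom_alt(line_b))
```

To scan a whole file, one would skip the header lines starting with `#` and apply `has_hom_alt` to each remaining line.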
