GT and GL fields in VCF file
9.4 years ago
ChIP ▴ 600

Hi!

This might sound like a completely stupid, lazy, and silly question, but please help me understand these acronyms in a VCF file. I did have a look at the VCF format specification PDF and got even more confused. So, I have my data in a VCF file, version 4.2:

chr1    10043    .    T    C    21    PASS    3    GT:DP:FT:GQ:GL    1/0:8:PASS:20:-2.0739,-0.00379451,-27.336
chr1    10055    .    T    G    4    PASS    3    GT:DP:FT:GQ:GL    0/0:13:PASS:31:-0.00031839,-3.6121,-54.5012
chr1    10105    .    A    C    7    PASS    3    GT:DP:FT:GQ:GL    1/0:45:PASS:13:-1.58548,-0.0193946,-153.729

Now, what does GT represent here? I know it stands for Genotype, but what do the different values I have for GT mean, and what consequence do they have for a SNP call? Can I also determine whether the mutation is in one copy or in both copies of the gene? If yes, how?

For GQ, I guess it is a Phred-scaled score, -10 log10(p), where higher is better?

The genome used is hg19.

Please help.

Thank you

ADD COMMENT
9.4 years ago
iraun 6.2k

If you read the VCF format specification PDF (http://samtools.github.io/hts-specs/VCFv4.1.pdf) you'll find the answer. In summary, GT represents the genotype, encoded as allele values separated by / (unphased) or | (phased). An allele value of 0 means the allele equals the reference (the REF field), 1 means it equals the first allele listed in ALT, 2 means it equals the second allele listed in ALT (if it exists), and so on.

So a SNP with GT = 1/1 is homozygous for the ALT allele, 1/0 is heterozygous, and 0/0 is homozygous for the reference. In other words, 1/0 means the variant is on one copy and 1/1 means it is on both copies.
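The rule above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `classify_gt` and the labels it returns are made up for this example, and any genotype containing `.` is simply reported as missing.

```python
import re

def classify_gt(gt: str) -> str:
    """Classify a VCF GT string such as '0/0', '1/0' or '1|2'."""
    # Alleles are separated by '/' (unphased) or '|' (phased).
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"      # at least one allele was not called
    if len(set(alleles)) > 1:
        return "het"          # differing allele indices
    # All indices are equal: reference if they are 0, otherwise an ALT allele.
    return "hom_ref" if alleles[0] == "0" else "hom_alt"

# GT values from the three records in the question, plus 1/1.
for gt in ("1/0", "0/0", "1/0", "1/1"):
    print(gt, classify_gt(gt))
```

Note that 1/2 comes out as "het" here, even though neither allele is the reference.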

To determine whether a call is homozygous or heterozygous, I suggest reading this thread: How To Distinguish Heterozygotes And Homozygotes From Variants In Vcf Format?. It can be done according to different criteria.

ADD COMMENT

Hi,

Thank you for the answer. I am making rules to identify homozygous and heterozygous calls in my data, and I have the following combinations in my GT field:

0/0
1/0
1/1
2/0
2/1
2/2
3/0
3/1
3/2
3/3

I know the first is a homozygous call and the second is a heterozygous call, but what about the others? Could you please help?

ADD REPLY

Hi,

Some of my GT values are ./. Do you know what this means? I am assuming no reads align to this position, so we do not know what the genotype is there? Best, Thanks. C.

ADD REPLY

./. means that there is not enough information to call a genotype. It depends on the thresholds you set when you call variants: say you require a coverage of at least 10 to call a variant and you have only 8 reads, all on one strand. The algorithm cannot easily judge this, so it places a ./. there (or at least this is my understanding of the issue).

I remember it's written somewhere here: https://samtools.github.io/hts-specs/VCFv4.2.pdf
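In the VCF specification a `.` in GT marks an allele that was not called. Why it was not called depends on the caller and its thresholds. If you want to find such calls in your own data, here is a rough Python sketch. The names `parse_gt` and `call_status` and the three status labels are invented for this example.

```python
import re
from typing import List, Optional

def parse_gt(gt: str) -> List[Optional[int]]:
    """Split a GT string into allele indices; '.' becomes None."""
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]

def call_status(gt: str) -> str:
    """Report whether all, some, or none of the alleles were called."""
    alleles = parse_gt(gt)
    called = [a for a in alleles if a is not None]
    if not called:
        return "no_call"        # e.g. ./.
    if len(called) < len(alleles):
        return "partial_call"   # e.g. 1/. or ./0
    return "called"

for gt in ("./.", "1/.", "0/1"):
    print(gt, call_status(gt))
```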

ADD REPLY

Hi, sometimes I see 1/. or ./1 or 0/. or ./0. What does that mean?

ADD REPLY
  • 0 = reference sequence allele
  • 1 = first variant allele
  • 2 = second variant allele
  • ...
  • n = nth variant allele

In your case you called variants for 3 different lines on 1 reference, thus you have numbers ranging from 0 to 3. Whenever the numbers are equal you have a homozygous call, whenever they're not you have a heterozygous call.
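To see which bases the numbers stand for, you can look them up in REF and ALT. A small Python sketch of that lookup follows. The helper name `gt_to_bases` is made up, and the second call uses a hypothetical site with three ALT alleles, not one from the question.

```python
import re

def gt_to_bases(ref: str, alt: str, gt: str) -> list:
    """Translate GT allele indices into the actual allele strings."""
    # Index 0 is REF; indices 1..n follow the comma-separated ALT column.
    alleles = [ref] + alt.split(",")
    return ["." if i == "." else alleles[int(i)]
            for i in re.split(r"[/|]", gt)]

print(gt_to_bases("T", "C", "1/0"))      # first record in the question -> ['C', 'T']
print(gt_to_bases("A", "C,G,T", "3/1"))  # hypothetical site -> ['T', 'C']
```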

ADD REPLY

If the type is GT, does that mean it's only SNPs? How can I know that?

ADD REPLY

If you called the variants, you will see which kind of variant that is by looking in the field number 8 of your VCF file (INFO field).
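What ends up in INFO depends on the caller, so another option is to compare the lengths of REF and ALT directly. The Python sketch below takes that length-based route instead of reading INFO. It assumes plain base strings (no symbolic alleles such as `<DEL>`) and handles one ALT allele at a time. The name `variant_type` is hypothetical.

```python
def variant_type(ref: str, alt: str) -> str:
    """Rough classification of one REF/ALT pair by allele length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"    # single base replaced by a single base
    if len(ref) == len(alt):
        return "MNP"    # several adjacent bases replaced
    return "indel"      # lengths differ: insertion or deletion

print(variant_type("T", "C"))    # first record in the question
print(variant_type("TA", "T"))   # hypothetical deletion
```

For a site with several ALT alleles, split the ALT column on commas and classify each allele separately.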

ADD REPLY
9.4 years ago
Cytosine ▴ 460

AFAIK the GT field tells you if a call for a given variant is a:

  • 0 - reference call
  • 1 - alternative call 1
  • 2 - alternative call 2
  • ...

In your case the first and the third variants are considered heterozygous, as both the reference allele and an alternative allele appear in their pileups. The second variant is considered a homozygous call, with only the reference allele present.
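The GL field points the same way. As far as I understand the spec, for a site with one ALT allele GL holds three log10-scaled genotype likelihoods in the order 0/0, 0/1, 1/1, so the value closest to zero marks the most likely genotype. A minimal Python check on the three records from the question (the function name `best_genotype` is made up):

```python
def best_genotype(gl: str) -> str:
    """Return the genotype with the highest log10 likelihood (one ALT allele)."""
    # Assumed GL order for a biallelic diploid site: 0/0, 0/1, 1/1.
    labels = ["0/0", "0/1", "1/1"]
    values = [float(x) for x in gl.split(",")]
    return labels[values.index(max(values))]

for gl in ("-2.0739,-0.00379451,-27.336",
           "-0.00031839,-3.6121,-54.5012",
           "-1.58548,-0.0193946,-153.729"):
    print(best_genotype(gl))   # prints 0/1, 0/0, 0/1
```

This agrees with the GT values 1/0, 0/0 and 1/0, since an unphased 1/0 is the same genotype as 0/1.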

Yes, the GQ field is a phred score with higher values being better.
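To get a feel for the numbers: with GQ = -10 log10(p), where p is the probability that the genotype call is wrong, you can turn a GQ back into p. A quick Python sketch (the function name is made up for this example):

```python
def gq_to_error_prob(gq: float) -> float:
    """Invert GQ = -10 * log10(p) to recover the error probability p."""
    return 10 ** (-gq / 10)

# GQ values from the three records in the question.
for gq in (20, 31, 13):
    print(gq, gq_to_error_prob(gq))
```

So GQ 20 corresponds to roughly a 1 in 100 chance that the genotype is wrong, and GQ 13 to about 1 in 20.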

ADD COMMENT

Hi,

Thanks for explaining. One more thing: if GT is 2/0, is the call also heterozygous? Since I am writing a small awk one-liner to filter these homozygous and heterozygous calls, can you please confirm these:

0/1, 1/1, 1/0 and 0/0 -- hetero, homo, hetero and homo calls, respectively? Or can you suggest any other rule to recognise the possible homo and hetero calls?

ADD REPLY

Sorry for the late reply.

Just about any combination of differing alleles means heterozygosity (0/1, 1/0, 1/2, 2/3, 3/1, 2/0,...)

I use this one-liner to separate homozygous and heterozygous calls. It reads the genotype from the last column, so it assumes a single sample, and it prints only the GT values. Swap '==' for '!=' to get the heterozygous calls instead.

grep -v "^#" my-favorite-snps.vcf | awk '{print $NF}' | awk -F ":" '{print $1}' | awk -F "[/|]" '$1==$2 {print}'
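
If you would rather keep the whole VCF record than just the GT value, the same idea can be sketched in Python. This is only an illustration under a few assumptions: columns are whitespace-separated, only the first sample column is looked at, and calls containing `.` are left out of both lists. The function name `split_by_zygosity` is made up.

```python
import re

def split_by_zygosity(lines):
    """Return (homozygous, heterozygous) lists of VCF data lines."""
    hom, het = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                       # skip header and blank lines
        cols = line.split()
        keys = cols[8].split(":")          # FORMAT column, e.g. GT:DP:FT:GQ:GL
        sample = dict(zip(keys, cols[9].split(":")))  # first sample only
        alleles = re.split(r"[/|]", sample["GT"])
        if "." in alleles:
            continue                       # leave uncalled genotypes out
        (hom if len(set(alleles)) == 1 else het).append(line)
    return hom, het

records = [
    "chr1 10043 . T C 21 PASS 3 GT:DP:FT:GQ:GL 1/0:8:PASS:20:-2.0739,-0.00379451,-27.336",
    "chr1 10055 . T G 4 PASS 3 GT:DP:FT:GQ:GL 0/0:13:PASS:31:-0.00031839,-3.6121,-54.5012",
]
hom, het = split_by_zygosity(records)
print(len(hom), len(het))   # 1 1
```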
ADD REPLY

How can I add missing fields? My fields are: I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;MQ0F=0;MQSB=1;SGB=-0.379885;RPB=1;MQB=1;BQB=1;DP=25;QS=1,0,0.5

ADD REPLY
