Interpreting Gaps at Pos 0 in Terms of VCF
9.8 years ago
pld 5.1k

I'm writing a Python script to convert Clustal-formatted alignments into VCF files. I'm stuck on one thing: how to interpret a gap at the start of an alignment, as in:

ENG1-REF-K      ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGCAGAACTTTGATTTT
MERS_EMC_V      ---------------------------------------------CAGAACTTTGATTTT
                                                             ***************

The VCF format seems to assume that there is a base upstream of the deletion. E.g. if I have ACGT and A-GT, the VCF record should be REF: AC, ALT: A. The deleted base is at position 2, but the record's POS is 1, the position of the upstream base that anchors the event.
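To make the ACGT / A-GT case concrete, here is a minimal sketch of how an interior deletion could be turned into POS/REF/ALT. The function name `interior_deletion_record` is made up for illustration, and it assumes the reference row has no gaps (so an alignment column maps directly to a reference coordinate) and that the base before the gap is the same in both rows.

```python
def interior_deletion_record(ref_aln, alt_aln):
    """Return (POS, REF, ALT) for the first gap run in alt_aln.

    Hypothetical helper. Assumes ref_aln contains no gaps and the
    base just before the gap is identical in both aligned rows.
    """
    start = alt_aln.index("-")  # 0-based column of the first gap
    if start == 0:
        # No upstream base exists, so the usual anchoring does not apply.
        raise ValueError("leading gap: no upstream base to anchor on")
    end = start
    while end < len(alt_aln) and alt_aln[end] == "-":
        end += 1
    anchor = ref_aln[start - 1]          # base before the event
    ref = anchor + ref_aln[start:end]    # anchor plus the deleted bases
    pos = start                          # 1-based position of the anchor base
    return pos, ref, anchor


print(interior_deletion_record("ACGT", "A-GT"))  # (1, 'AC', 'A')
```

The sketch deliberately refuses a gap at column 0, which is exactly the case I'm unsure about.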

http://samtools.github.io/hts-specs/VCFv4.2.pdf

How are terminal deletions considered in VCF?

msa alignment clustal vcf • 2.5k views
9.8 years ago
Zhaorong ★ 1.4k

From the VCF (Variant Call Format) version 4.1 specification (and also the 4.2):

the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event.

So in your example:

POS = 1
REF = ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGC
ALT = C
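In a conversion script, the position-1 rule could be handled roughly as below. This is only a sketch: `leading_deletion_record` is a made-up name, and it assumes the reference row has no gaps and that the first base after the gap run is the same in both rows (otherwise the event is more than a plain deletion).

```python
def leading_deletion_record(ref_aln, alt_aln):
    """Return (POS, REF, ALT) for a gap run at the start of alt_aln.

    Hypothetical helper. Returns None when alt_aln has no leading gap.
    Assumes ref_aln contains no gaps.
    """
    n_gaps = len(alt_aln) - len(alt_aln.lstrip("-"))
    if n_gaps == 0:
        return None
    if n_gaps >= len(ref_aln):
        raise ValueError("alignment row is entirely gaps")
    base_after = ref_aln[n_gaps]   # base after the event (no base before it)
    ref = ref_aln[:n_gaps + 1]     # deleted bases plus the base after
    return 1, ref, base_after      # POS is 1 for an event at the contig start


ref_row = "ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGCAGAACTTTGATTTT"
alt_row = "-" * 45 + "CAGAACTTTGATTTT"
print(leading_deletion_record(ref_row, alt_row))
```

For the two rows in the question this gives POS 1, the 46-base REF shown above, and ALT C.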