Question

How to extract both start and end positions from VCF files

7.9 years ago

ShirleyDai ▴ 50

Hello, I have some VCF files generated by GATK Mutect2. I can use GATK VariantsToTable to extract the start position of each variant. Is there an easy way to extract both the start and end positions from my VCF files? Thanks.

next-gen vcf • 4.9k views

ADD COMMENT • link updated 7.9 years ago by MAPK ★ 2.1k • written 7.9 years ago by ShirleyDai ▴ 50

score 2 · Answer 1 · 2016-05-24

7.9 years ago

venu 7.1k

Do you mean the start and end positions of the variants? If so, the following should work:

(Updated)

vcf-annotate --fill-type Sample1.vcf | grep '^chr' | awk '{
    if ($8 ~ /snp/)      print $1"\t"$2"\t"$2"\t"$4"\t"$5;
    else if ($8 ~ /del/) print $1"\t"$2"\t"$2"\t"$4"\t""-";
    else if ($8 ~ /ins/) print $1"\t"$2"\t"$2+(length($5))"\t"$4"\t"$5
}' > Result.txt

ADD COMMENT • link 7.9 years ago by venu 7.1k

No. I need to extract SNPs and indels (some are longer than 10 bases) in the following format:

Single nucleotide variants

chr4 150 150 A T

Insertions

Use ‘-’ in the reference_allele field; the start/end coordinates must indicate the two adjacent bases between which the insertion occurs.

chr4 150 151 - T

Deletions

Use ‘-’ in the observed_allele field to denote deletion of the given reference allele.

chr4 150 150 A -
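The three cases above can be sketched in Python roughly as follows. This is only a minimal sketch, not a tested pipeline: it assumes indels are written the usual VCF way (REF and ALT share a leading anchor base), it splits multi-allelic ALT fields on commas, and it skips symbolic alleles such as `<DEL>` or `*`. The function names `vcf_record_to_row` and `vcf_to_rows` are made up for illustration.

```python
def vcf_record_to_row(chrom, pos, ref, alt):
    """Convert one VCF allele pair to (chrom, start, end, ref, alt).

    Assumes the usual left-anchored VCF representation of indels.
    """
    # Strip the leading bases shared by REF and ALT (the indel anchor base).
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    if not ref:
        # Insertion: start/end are the two bases flanking the inserted sequence.
        return (chrom, pos - 1, pos, "-", alt)
    if not alt:
        # Deletion: start/end span the deleted reference bases.
        return (chrom, pos, pos + len(ref) - 1, ref, "-")
    # SNV (or multi-base substitution): span the replaced reference bases.
    return (chrom, pos, pos + len(ref) - 1, ref, alt)


def vcf_to_rows(lines):
    """Yield one row per ALT allele from an iterable of VCF text lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            if alt in (".", "*") or alt.startswith("<"):
                continue  # no ALT call or symbolic allele: not handled here
            yield vcf_record_to_row(chrom, pos, ref, alt)
```

To write a table, loop over `vcf_to_rows(open("Sample1.vcf"))` and print each row with `print(*row, sep="\t")`. Stripping the shared prefix rather than testing an INFO tag means the sketch does not depend on any annotation step, but records that are not left-anchored may need normalizing first.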

ADD REPLY • link 7.9 years ago by ShirleyDai ▴ 50

I've updated the answer.

ADD REPLY • link 7.9 years ago by venu 7.1k

Cool! Many thanks.

ADD REPLY • link 7.9 years ago by ShirleyDai ▴ 50

score 0 · Answer 2 · 2016-05-24

7.9 years ago

MAPK ★ 2.1k

Why not use GenomicRanges with a custom R script?

ADD COMMENT • link 7.9 years ago by MAPK ★ 2.1k