A tool to annotate one VCF file with INFO records of another VCF taking SNP into account?
9.0 years ago

I would like to annotate records in one VCF file (input.vcf) with some of the INFO fields of the corresponding records from a database (db.vcf), but only if the recorded mutation matches exactly in the input and in the database. For example, suppose I have three very simple VCF files:

input.vcf

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    A=3.0

db1.vcf

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    B=4.0

db2.vcf

##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       T       100     PASS    B=4.0

Note that db1 and db2 describe different SNPs at the same locus: the SNP in db1.vcf matches the one in input.vcf, but the SNP in db2.vcf does not. I need a tool that can distinguish these cases and annotate the input record with information from the database only if the mutations match. Is there a tool that does this?

I tried GATK's VariantAnnotator and vcflib's vcfaddinfo; unfortunately, both ignore the alleles and add the B=4.0 annotation in both cases.

Just to clarify, this is what I want in the case described:

$ some_tool input.vcf db1.vcf # SNP in input and database matches
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    A=3.0;B=4.0
$ some_tool input.vcf db2.vcf # SNP in input and database do not match
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    A=3.0

Try bcftools annotate:

bgzip -c input.vcf > input.vcf.gz; tabix input.vcf.gz
bgzip -c db.vcf > db.vcf.gz; tabix db.vcf.gz

bcftools annotate -a db.vcf.gz -c CHROM,POS,REF,ALT,INFO/B input.vcf.gz > output.vcf

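This fills in INFO/B from db.vcf.gz only for records where CHROM, POS, REF and ALT all match. Note that bcftools expects the B tag to be declared in an ##INFO header line of db.vcf.gz, which the minimal example files in the question omit.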
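
To make the matching rule concrete, here is a minimal sketch in plain sh and awk of what "all of CHROM, POS, REF and ALT match" means for the three example files from the question. It is not how bcftools is implemented and not a replacement for it: the helper names annotate_exact and make_vcf are hypothetical, the whole database INFO column is appended rather than a selected tag, and multiallelic records, header updates and compressed input are ignored.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): annotate_exact INPUT.vcf DB.vcf
# Appends the INFO column of a DB record to the INPUT record, but only
# when CHROM, POS, REF and ALT are all identical.
annotate_exact() {
    awk 'BEGIN { FS = OFS = "\t" }
         # First file (database): store INFO keyed by CHROM, POS, REF, ALT.
         NR == FNR { if ($0 !~ /^#/) db[$1, $2, $4, $5] = $8; next }
         # Second file (input): pass header lines through unchanged.
         /^#/ { print; next }
         # Data line: append the database INFO only on an exact key match.
         {
             key = $1 SUBSEP $2 SUBSEP $4 SUBSEP $5
             if (key in db) $8 = $8 ";" db[key]
             print
         }' "$2" "$1"
}

# Hypothetical helper: make_vcf ALT INFO FILE
# Writes a one-record VCF like the ones shown in the question.
make_vcf() {
    printf '##fileformat=VCFv4.1\n' > "$3"
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$3"
    printf 'chr1\t878638\t.\tG\t%s\t100\tPASS\t%s\n' "$1" "$2" >> "$3"
}

tmp=$(mktemp -d)
make_vcf A A=3.0 "$tmp/input.vcf"
make_vcf A B=4.0 "$tmp/db1.vcf"   # same ALT allele as the input
make_vcf T B=4.0 "$tmp/db2.vcf"   # different ALT allele at the same locus

annotate_exact "$tmp/input.vcf" "$tmp/db1.vcf"
annotate_exact "$tmp/input.vcf" "$tmp/db2.vcf"
```

With db1.vcf the INFO column of the record becomes A=3.0;B=4.0, and with db2.vcf it stays A=3.0, which is the behaviour asked for in the question. For real data, bcftools annotate remains the right tool.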


Just tested, and it does precisely what I want. Thank you so much!

I also see that you are a maintainer and developer of bcftools, so double thanks for both the tool and your answer :-)


Thank you so much, Shane McCarthy. I was stuck on this for a while, and this does exactly what I was looking for!
