Merge multiple VCF files (same variants, same sample) into one VCF file
Eleanore, 6.7 years ago

Dear all,

I have a problem regarding the manipulation of multiple VCF files (containing the same variants and referring to the same sample): I want to merge their INFO fields.

The context.

Say I have the following VCF file (headers not included):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    .   GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

Now, I create two copies of the same VCF file and annotate each of them with a different annotation source. The first one becomes:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    CustomOne=1 GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

while the second one becomes:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    CustomTwo=2 GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

I would now like to merge the two copies to obtain:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    CustomOne=1;CustomTwo=2 GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

Basically, the result should keep the same #CHROM, POS, REF, ALT, QUAL, FILTER, FORMAT and sample columns, and merge the contents of the INFO column found in each copy.
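
A minimal Python sketch of the INFO-merging rule described above, for illustration only: it takes the INFO column strings of the same record from each copy and joins their key=value entries in first-seen order. The function name merge_info is hypothetical, and the handling of repeated keys is an assumption: because a record should not carry the same INFO key twice, a key found in several copies is kept once when the values agree and rejected otherwise.

```python
def merge_info(columns):
    """Merge INFO column strings taken from several copies of one record.

    "." (the VCF placeholder for an empty INFO column) is skipped.
    Assumption: a key present in more than one copy is kept once if the
    entries are identical; differing values raise ValueError.
    """
    merged = {}  # key -> full entry ("key=value" or bare flag), insertion-ordered
    for column in columns:
        if column == ".":
            continue
        for entry in column.split(";"):
            if not entry:  # tolerate stray or trailing semicolons
                continue
            key = entry.split("=", 1)[0]
            if merged.setdefault(key, entry) != entry:
                raise ValueError(f"conflicting values for INFO/{key}")
    return ";".join(merged.values()) or "."


print(merge_info(["CustomOne=1", "CustomTwo=2"]))  # CustomOne=1;CustomTwo=2
```

Real VCF data lines are tab-separated, so the INFO column would be the eighth tab-separated field of each line.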

The solution I tried.

I tried several options, unsuccessfully:

  • bcftools merge, but this is meant for merging different samples, while I am working with a single sample
  • bcftools concat, but this concatenates VCF files (appending their records) instead of merging the INFO fields of matching records
  • SnpSift annotate, but it accepts only two files (the annotation source and the input), so I cannot use it in a single run when there are more than two copies to merge

My question!

Can you suggest how to proceed?

Thank you for your help.

trausch, 6.7 years ago

Two INFO fields with the same name "Custom" are not allowed, but I think recent bcftools versions can relabel INFO fields:

bcftools annotate -a custom1.vcf.gz -c INFO/CustomImported:=INFO/Custom custom2.vcf.gz

Yes, sorry, my example was wrong. I will edit the question to use two different INFO fields. Does this command accept multiple annotation files too?

Maybe there is a more elegant solution, but pipes should work:

zcat custom1.vcf.gz | bcftools annotate -a custom2.vcf.gz -c INFO/CustomTwo - | bcftools annotate -a custom3.vcf.gz -c INFO/CustomThree -
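
The pipe above has to be written out by hand for each number of copies. A small Python helper can generate the same kind of chain for any list of annotation files; build_annotate_pipeline is a hypothetical name, not part of bcftools. Two assumptions differ from the command above: each stage uses -c INFO, which the bcftools annotate documentation describes as carrying over all INFO fields (so the tag names need not be listed per file), and the last stage writes bgzip-compressed output with -O z -o. bcftools annotate expects each -a file to be bgzip-compressed and tabix-indexed. The generated pipeline still starts one bcftools process per additional copy (N-1 for N copies), although they run concurrently.

```python
import shlex


def build_annotate_pipeline(base, annotation_files, output):
    """Build a shell pipeline chaining one `bcftools annotate` per file.

    Hypothetical helper: it only builds the command string, it does not run it.
    Every stage reads the VCF stream from stdin ("-") and copies all INFO
    fields (-c INFO) from its annotation file.
    """
    if not annotation_files:
        raise ValueError("need at least one annotation file")
    stages = [f"zcat {shlex.quote(base)}"]
    for i, path in enumerate(annotation_files):
        opts = f"-a {shlex.quote(path)} -c INFO"
        if i == len(annotation_files) - 1:
            # only the final stage writes bgzip-compressed VCF to disk
            opts += f" -O z -o {shlex.quote(output)}"
        stages.append(f"bcftools annotate {opts} -")
    return " | ".join(stages)


print(build_annotate_pipeline("custom1.vcf.gz",
                              ["custom2.vcf.gz", "custom3.vcf.gz"],
                              "merged.vcf.gz"))
```

The returned string could then be run with subprocess.run(cmd, shell=True, check=True), assuming bcftools is on the PATH.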

This is the solution I applied at first, but it does not scale: it launches a new annotation process for every additional copy (N-1 processes for N copies). Isn't there a tool that does this operation in one go, without launching several annotation processes?
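
If one process per copy is the bottleneck, the copies can in principle be merged in a single pass. Below is a rough Python sketch using only the standard library; merge_vcf_copies and the helper names are hypothetical, not an existing tool. It assumes every copy was derived from the same VCF and therefore lists the same records in the same order, and it raises an error instead of guessing when records do not line up. It keeps the meta lines of the first copy, appends any ##INFO header lines the other copies introduce, and merges the INFO column record by record. Inputs may be plain or gzip/bgzip-compressed; the output is uncompressed VCF.

```python
import gzip
import itertools

INFO_PREFIX = "##INFO=<ID="


def open_vcf(path):
    # gzip.open can also read bgzip output, since BGZF is multi-member gzip
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_header(handle):
    """Return all header lines up to and including the #CHROM line."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.startswith("#CHROM"):
            return lines
    raise ValueError("no #CHROM line found")


def info_id(line):
    return line[len(INFO_PREFIX):].split(",", 1)[0]


def merge_info(columns):
    merged = {}  # key -> full entry, first-seen order
    for column in columns:
        if column == ".":
            continue
        for entry in column.split(";"):
            if not entry:
                continue
            key = entry.split("=", 1)[0]
            if merged.setdefault(key, entry) != entry:
                raise ValueError(f"conflicting values for INFO/{key}")
    return ";".join(merged.values()) or "."


def merge_vcf_copies(paths, out):
    handles = [open_vcf(p) for p in paths]
    try:
        headers = [read_header(h) for h in handles]
        meta, chrom_line = headers[0][:-1], headers[0][-1]
        known = {info_id(l) for l in meta if l.startswith(INFO_PREFIX)}
        for header in headers[1:]:
            for line in header[:-1]:
                if line.startswith(INFO_PREFIX) and info_id(line) not in known:
                    known.add(info_id(line))
                    meta.append(line)
        out.writelines(meta + [chrom_line])
        # Walk all copies in lockstep: one data line from each per step.
        for lines in itertools.zip_longest(*handles):
            if None in lines:
                raise ValueError("copies have different numbers of records")
            rows = [l.rstrip("\n").split("\t") for l in lines]
            first = rows[0]
            for row in rows[1:]:
                if row[:7] != first[:7] or row[8:] != first[8:]:
                    raise ValueError(f"records differ outside INFO at {first[0]}:{first[1]}")
            info = merge_info(row[7] for row in rows)
            out.write("\t".join(first[:7] + [info] + first[8:]) + "\n")
    finally:
        for h in handles:
            h.close()
```

For example, merge_vcf_copies(["copy1.vcf.gz", "copy2.vcf.gz", "copy3.vcf.gz"], sys.stdout) (after import sys) would write the merged VCF to standard output, from where it could be piped to bgzip. The sketch does not check INFO values against their header definitions.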

thondeboer, 6.5 years ago

I think GATK's CombineVariants can do this. I have the same issue but have not confirmed it yet: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php

And now that GATK is going open source, you should be able to use it freely. It is in beta 5 now and won't be officially released until January 9th, 2018, but the beta should work. I always loved the VCF manipulation you could do with GATK back in the 2.4 days, so I can only imagine it has gotten better.

It seems that CombineVariants is no longer part of GATK4. The closest tool in GATK4 is MergeVcfs, but it is not smart: it simply creates duplicate lines and does not merge the annotations. Sorry...
