about merging VCF files
15 months ago
Bogdan • 780
Palo Alto, CA, USA

Dear all,

We have a large number of VCF files, and I am attempting to merge all of them with VCFtools in the following way:

for f in *.vcf; do
    bgzip -c "$f" > "$f.gz"
    tabix -p vcf "$f.gz"
done

and:

vcf-merge *.vcf.gz

However, during the vcf-merge step, I get this error:

Use of uninitialized value in hash element at /usr/local/share/perl/5.18.2/Vcf.pm line 1720, <__ANONIO__> line 1158.
Use of uninitialized value in hash element at /usr/local/share/perl/5.18.2/Vcf.pm line 1720, <__ANONIO__> line 1158.

Any advice on why we get this error? Many thanks!

-- bogdan

vcf SNP • 2.2k views

you could try vcflib's merger.


Thank you, Ram, for your suggestion. A simple question, though: since echo only prints the command, how do I execute the java command that echo prints?

echo -n "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta -o combined_output.vcf -genotypemergeOptions UNIQUIFY" && for vcf_file in folderWith100VcfFiles/*.vcf; do echo -n " --variant ${vcf_file}"; done


Please do not add your comment as an answer. Move this to a reply to this comment: C: about merging VCF files

To do that, copy the contents of your comment above, click on "Add Reply" on my comment and paste what you copied. Then, hit Add comment. Once you do that, I will answer your question.

15 months ago
United States

GATK also has one, which I've used recently for thousands of VCFs.


Thanks, Zev. A question, though, about merging with GATK tools: is there any way to specify tens of VCF files without having to input them one by one with the "-L" option? Thanks ;)


CombineVariants does not need the -L option for that; you just need a --variant before each VCF file name. The question is whether CombineVariants works with .vcf.gz. IIRC, it should.


Thanks, Ram. I am looking for a way to combine more than 100 VCF files (from a folder) automatically from a script, without having to write the name of each file after --variant. Is there any way to do that? Many thanks ;)


use a list:

find path -type f -name "*.vcf" > input.list

and then use GATK with --variant input.list
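
A minimal sketch of the list-file approach, assuming all VCFs sit under one directory. The helper name build_vcf_list is hypothetical, and the GATK call in the comment only reuses the options quoted elsewhere in this thread. As far as I recall, GATK 3 treats a file as a list of inputs only when its name ends in .list, which is why the output is named input.list.

```shell
# Hypothetical helper: write one VCF path per line into a list file.
#   $1 = directory to search (recursively), $2 = list file to create
build_vcf_list() {
    # sort makes the order of the inputs reproducible between runs
    find "$1" -type f -name '*.vcf' | sort > "$2"
}

# Usage (directory and file names are placeholders):
#   build_vcf_list folderWith100VcfFiles input.list
#   java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta \
#       --variant input.list -o combined_output.vcf -genotypemergeOptions UNIQUIFY
```

It is worth checking input.list by eye before the merge, since find descends into subdirectories and may pick up VCFs you did not intend to include.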


There will be someone that recommends make here, but I'm not good at that, so I'd say generate the GATK command with a bunch of echos. Something like:

# Lines are joined with backslash continuations, so this can be pasted as is
echo -n "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta \
-o combined_output.vcf -genotypemergeOptions UNIQUIFY" \
&& for vcf_file in folderWith100VcfFiles/*.vcf; do echo -n " --variant ${vcf_file}"; done

This echoes the static part first and then, for each VCF file in the directory folderWith100VcfFiles, a properly formatted --variant argument.
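
If you would rather not go through printed text at all, the same per-file --variant arguments can be appended to a command and run in one go. This is only a sketch under assumptions: with_variants is a hypothetical helper name, and it expects the directory first, followed by the command to run.

```shell
# Hypothetical helper: run a command with "--variant <file>" appended
# for every *.vcf file in a directory.
#   $1 = directory holding the VCFs, remaining arguments = command to run
# The body runs in a subshell, so its variables do not leak into the caller.
with_variants() (
    vcf_dir=$1
    shift
    for vcf_file in "$vcf_dir"/*.vcf; do
        # If nothing matches, the pattern stays literal; skip it
        [ -e "$vcf_file" ] || continue
        # Append to the positional parameters so paths with spaces stay intact
        set -- "$@" --variant "$vcf_file"
    done
    "$@"
)

# Usage (paths are placeholders):
#   with_variants folderWith100VcfFiles java -jar GenomeAnalysisTK.jar \
#       -T CombineVariants -R reference.fasta -o combined_output.vcf \
#       -genotypemergeOptions UNIQUIFY
```

Globbing the directory instead of parsing ls output avoids word-splitting problems with unusual file names.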


That looks great! Thanks a lot, I will use it as soon as I arrive in the lab. BTW, could you recommend any good book on shell scripting/programming? Thanks ;)


Sorry, I learnt shell programming through trial and error (and a LOT of Google) - I'm not aware of any book, although I bet there are a whole lot of useful books.

Tackle man pages one at a time and you will get there :)


Thanks a lot, Ram! BTW, I thought I would ask: are you guys doing a lot of somatic mutation calling? If so, what algorithms/software do you use?


Please open a new question.


Thanks, Ram, for keeping the conversation organized. To reiterate my previous question: how do I execute the java command that echo only prints? :)

echo -n "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta -o combined_output.vcf -genotypemergeOptions UNIQUIFY" && for vcf_file in folderWith100VcfFiles/*.vcf; do echo -n " --variant ${vcf_file}"; done


You can either copy-paste the echo'd command to the command prompt and then hit return to run it, or redirect the output of the echos to a shell script file, chmod it and run the file. Better, use Pierre's solution - a list file is so much easier to handle.
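
The redirect-to-a-script route could look roughly like this. It is a sketch, not a tested GATK recipe: print_combine_command and combine.sh are hypothetical names, and the java options are simply the ones quoted in this thread.

```shell
# Hypothetical helper: print a small shell script whose single command
# names every *.vcf file in the given directory via --variant.
# The body runs in a subshell, so its loop variable does not leak out.
print_combine_command() (
    printf '%s\n' '#!/bin/sh'
    printf '%s' "java -jar GenomeAnalysisTK.jar -T CombineVariants -R reference.fasta -o combined_output.vcf -genotypemergeOptions UNIQUIFY"
    for vcf_file in "$1"/*.vcf; do
        # If nothing matches, the pattern stays literal; skip it
        [ -e "$vcf_file" ] || continue
        # Single quotes protect spaces; a path containing a single quote would still break
        printf " --variant '%s'" "$vcf_file"
    done
    printf '\n'
)

# Usage (directory and script name are placeholders):
#   print_combine_command folderWith100VcfFiles > combine.sh
#   chmod +x combine.sh
#   ./combine.sh
```

Keeping the generated script around has the side benefit of recording exactly which files went into the merge.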


Thank you, Ram and Pierre!
