Entering edit mode
5.5 years ago
biogirl
▴
210
Hi,
I've never had this problem before using vcftools v0.1.12 vcf-merge, but now it throws up this error:
Could not parse gtype string [fail] .. VIII:214 WGS
at /apps/vcftools/0.1.12/lib/perl5/site_perl/Vcf.pm line 172, <__ANONIO__> line 2.
Vcf::throw('Vcf4_2=HASH(0x1bc4b28)', 'Could not parse gtype string [fail] .. VIII:214 WGS\x{a}') called at /apps/vcftools/0.1.12/lib/perl5/site_perl/Vcf.pm line 1922
VcfReader::parse_haplotype('Vcf4_2=HASH(0x1bc4b28)', 'HASH(0x28be3a0)', 'WGS') called at /apps/vcftools/0.1.12/bin/vcf-merge line 477
main::merge_vcf_files('HASH(0xca8250)') called at /apps/vcftools/0.1.12/bin/vcf-merge line 12
I have logged a ticket with vcftools on sourceforge but I've never had much luck with getting a reply on there, hence the double posting on there. I've recreated my vcf files to check there was nothing wrong with them, and still get the same error. Can someone please advise what the error is referring to, how to fix, or alternatively, provide something else to merge 400+ vcfs?
Thanks
Hey biogirl. Can you retry with BCFtools? BCFtools has long superceded VCFtools. Whilst the 'project' is still referred to the archaic name of VCFtools, the current implementation of that project is BCFtools, which is where active development is taking place.
Hi - I did not know that! Thanks! Ok, I've tried bcftools but still get a segmentation fault:
I'm assuming this isn't a vcftools/bcftools issue?
Yes, there is something peculiar about your VCF. Can you confirm the source of your VCF? Also, have you gzipped and tab-indexed it?
Yes, everything has been gzipped and tab indexed. I've noticed I also get an error when using GATK CombineVariants about the VCFs. These were generated using GATK HaplotypeCaller. Never had this issue before, nothing has changed in the pipeline - the only difference is the source of the data (HiSeq X rather than HiSeq2500). Any ideas on how to get around this?
It's not uncommon for a GATK-produced VCF to throw an error in SAMtools or BCFtools. In this case, I would search for the offending line and take a closer look. If your VCF is not huge, then you could scan through it with
bcftools view
. Unfortunately, instruments (sequencers) can also produce corrupt VCFs when they don't adhere to the VCF specification.It seems there are a couple of offending positions in each file (96 of them). Is there any way to force through these positions, or do I remove them?
Can you paste the VCF header where it has FORMAT ?
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
I hate to drag this on, but it's important to diagnose what's going on. If you look at the problematic positions, how does the full VCF record actually appear?
I think I've sorted it - I've just grepped out any of the offending characters in the GT field.
what is the output of
?