Biostar Beta. Not for public use.
vcftools vcf-merge error
0
Entering edit mode
23 months ago
biogirl • 170
European Union

Hi,

I've never had this problem before using vcftools v0.1.12 vcf-merge, but now it throws up this error:

Could not parse gtype string [fail] .. VIII:214 WGS

at /apps/vcftools/0.1.12/lib/perl5/site_perl/Vcf.pm line 172, <__ANONIO__> line 2.
Vcf::throw('Vcf4_2=HASH(0x1bc4b28)', 'Could not parse gtype string [fail] .. VIII:214 WGS\x{a}') called at /apps/vcftools/0.1.12/lib/perl5/site_perl/Vcf.pm line 1922
VcfReader::parse_haplotype('Vcf4_2=HASH(0x1bc4b28)', 'HASH(0x28be3a0)', 'WGS') called at /apps/vcftools/0.1.12/bin/vcf-merge line 477
main::merge_vcf_files('HASH(0xca8250)') called at /apps/vcftools/0.1.12/bin/vcf-merge line 12


I have logged a ticket with vcftools on sourceforge but I've never had much luck with getting a reply on there, hence the double posting on there. I've recreated my vcf files to check there was nothing wrong with them, and still get the same error. Can someone please advise what the error is referring to, how to fix, or alternatively, provide something else to merge 400+ vcfs?

Thanks

0
Entering edit mode

Hey biogirl. Can you retry with BCFtools? BCFtools has long superceded VCFtools. Whilst the 'project' is still referred to the archaic name of VCFtools, the current implementation of that project is BCFtools, which is where active development is taking place.

0
Entering edit mode

Hi - I did not know that! Thanks! Ok, I've tried bcftools but still get a segmentation fault:

[E::vcf_parse_format] Invalid character 'f' in 'GT' FORMAT field


I'm assuming this isn't a vcftools/bcftools issue?

0
Entering edit mode

Yes, there is something peculiar about your VCF. Can you confirm the source of your VCF? Also, have you gzipped and tab-indexed it?

bgzip My.vcf
tabix -p vcf My.vcf.gz

0
Entering edit mode

Yes, everything has been gzipped and tab indexed. I've noticed I also get an error when using GATK CombineVariants about the VCFs. These were generated using GATK HaplotypeCaller. Never had this issue before, nothing has changed in the pipeline - the only difference is the source of the data (HiSeq X rather than HiSeq2500). Any ideas on how to get around this?

0
Entering edit mode

It's not uncommon for a GATK-produced VCF to throw an error in SAMtools or BCFtools. In this case, I would search for the offending line and take a closer look. If your VCF is not huge, then you could scan through it with bcftools view. Unfortunately, instruments (sequencers) can also produce corrupt VCFs when they don't adhere to the VCF specification.

0
Entering edit mode

It seems there are a couple of offending positions in each file (96 of them). Is there any way to force through these positions, or do I remove them?

0
Entering edit mode

Can you paste the VCF header where it has FORMAT ?

0
Entering edit mode

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

0
Entering edit mode

I hate to drag this on, but it's important to diagnose what's going on. If you look at the problematic positions, how does the full VCF record actually appear?

1
Entering edit mode

I think I've sorted it - I've just grepped out any of the offending characters in the GT field.

0
Entering edit mode

what is the output of

gunzip -c your.vcf.gz  | grep -v "#" | cut -f 10- | tr "\t" "\n" | cut -d ':' -f 1 | sort | uniq


?