vcftools vcf-merge error
0
0
Entering edit mode
5.5 years ago
biogirl ▴ 210

Hi,

I've never had this problem before using vcftools v0.1.12 vcf-merge, but now it throws up this error:

Could not parse gtype string [fail] .. VIII:214 WGS

 at /apps/vcftools/0.1.12/lib/perl5/site_perl/Vcf.pm line 172, <__ANONIO__> line 2.
    Vcf::throw('Vcf4_2=HASH(0x1bc4b28)', 'Could not parse gtype string [fail] .. VIII:214 WGS\x{a}') called at /apps/vcftools/0.1.12/lib/perl5/site_perl/Vcf.pm line 1922
    VcfReader::parse_haplotype('Vcf4_2=HASH(0x1bc4b28)', 'HASH(0x28be3a0)', 'WGS') called at /apps/vcftools/0.1.12/bin/vcf-merge line 477
    main::merge_vcf_files('HASH(0xca8250)') called at /apps/vcftools/0.1.12/bin/vcf-merge line 12

I have logged a ticket with vcftools on sourceforge but I've never had much luck with getting a reply on there, hence the double posting on there. I've recreated my vcf files to check there was nothing wrong with them, and still get the same error. Can someone please advise what the error is referring to, how to fix, or alternatively, provide something else to merge 400+ vcfs?

Thanks

vcftools vcf-merge • 2.8k views
ADD COMMENT
0
Entering edit mode

Hey biogirl. Can you retry with BCFtools? BCFtools has long superceded VCFtools. Whilst the 'project' is still referred to the archaic name of VCFtools, the current implementation of that project is BCFtools, which is where active development is taking place.

ADD REPLY
0
Entering edit mode

Hi - I did not know that! Thanks! Ok, I've tried bcftools but still get a segmentation fault:

[E::vcf_parse_format] Invalid character 'f' in 'GT' FORMAT field

I'm assuming this isn't a vcftools/bcftools issue?

ADD REPLY
0
Entering edit mode

Yes, there is something peculiar about your VCF. Can you confirm the source of your VCF? Also, have you gzipped and tab-indexed it?

bgzip My.vcf
tabix -p vcf My.vcf.gz
ADD REPLY
0
Entering edit mode

Yes, everything has been gzipped and tab indexed. I've noticed I also get an error when using GATK CombineVariants about the VCFs. These were generated using GATK HaplotypeCaller. Never had this issue before, nothing has changed in the pipeline - the only difference is the source of the data (HiSeq X rather than HiSeq2500). Any ideas on how to get around this?

ADD REPLY
0
Entering edit mode

It's not uncommon for a GATK-produced VCF to throw an error in SAMtools or BCFtools. In this case, I would search for the offending line and take a closer look. If your VCF is not huge, then you could scan through it with bcftools view. Unfortunately, instruments (sequencers) can also produce corrupt VCFs when they don't adhere to the VCF specification.

ADD REPLY
0
Entering edit mode

It seems there are a couple of offending positions in each file (96 of them). Is there any way to force through these positions, or do I remove them?

ADD REPLY
0
Entering edit mode

Can you paste the VCF header where it has FORMAT ?

ADD REPLY
0
Entering edit mode

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

ADD REPLY
0
Entering edit mode

I hate to drag this on, but it's important to diagnose what's going on. If you look at the problematic positions, how does the full VCF record actually appear?

ADD REPLY
1
Entering edit mode

I think I've sorted it - I've just grepped out any of the offending characters in the GT field.

ADD REPLY
0
Entering edit mode

what is the output of

gunzip -c your.vcf.gz  | grep -v "#" | cut -f 10- | tr "\t" "\n" | cut -d ':' -f 1 | sort | uniq

?

ADD REPLY

Login before adding your answer.

Traffic: 2669 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6