Ensembl variant effect predictor fails if REF or ALT allele has a length around 0.5 million
5.0 years ago
mo.imranshah ▴ 10

I have been using VEP (variant effect predictor) from Ensembl for annotating VCFs produced by GATK's haplotype caller and PINDEL. The VEP is failing for some of the VCFs with the following error:

> -------------------- EXCEPTION --------------------
> MSG: 
> ERROR: Forked process(es) died: read-through of cross-process communication detected
>
> STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:554
> STACK Bio::EnsEMBL::VEP::Runner::next_output_line vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:360
> STACK Bio::EnsEMBL::VEP::Runner::run vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:202
> STACK toplevel vep/version95/vep:225
> Date (localtime)    = Thu May  9 13:25:54 2019
> Ensembl API version = 95 
> ---------------------------------------------------

It took me weeks to identify the actual cause of this error, as I could not find a solution on any forum. I tried adjusting the --buffer_size and --fork parameters, as suggested on several forums, without success. The problem turned out to be the size of the REF and ALT alleles of some variants: once I excluded the records with a REF or ALT allele longer than 1000 bases, VEP ran without any error.
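For anyone who wants to reproduce this workaround, here is a minimal sketch (not the exact script I used) that splits a VCF into records that stay under the length threshold and records that exceed it. The function names and the 1000 threshold are illustrative assumptions; symbolic alleles such as `<DEL>` are measured by their string length only.

```python
MAX_ALLELE_LEN = 1000  # assumed threshold, taken from the description above


def max_allele_length(vcf_line):
    """Return the length of the longest REF or ALT allele in a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # ALT may hold several comma-separated alleles.
    alleles = [ref] + alt.split(",")
    return max(len(allele) for allele in alleles)


def split_vcf(lines, short_out, long_out, max_len=MAX_ALLELE_LEN):
    """Copy header lines to both outputs and route each data line by allele length."""
    for line in lines:
        if line.startswith("#"):
            # Keep the header in both files so each remains a valid VCF.
            short_out.write(line)
            long_out.write(line)
        elif max_allele_length(line) <= max_len:
            short_out.write(line)
        else:
            long_out.write(line)
```

It can be called with three open file handles, for example `split_vcf(src, short_vcf, long_vcf)`, after which only the file of short records is passed to VEP. The file of long records is kept so that those variants can be examined separately.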

The offline VEP command used is:

vep --buffer_size 1000 --offline -i dataset_22336.dat -o dataset_22337.dat --cache --dir vep/database/ --force_overwrite --merged --cache_version 95 --assembly GRCh38 --fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa --fork 32 --everything --vcf

What could be a possible solution for running VEP on records whose REF/ALT alleles are 0.5 to 2 million bases long? Any help would be much appreciated.

Thanks in advance. Tagging @Emily_Ensembl

vcf vep vcf annotation ensembl • 2.0k views

Do you really want to annotate a variant of this length?


Pierre Lindenbaum, could you please suggest an optimal length cutoff to use, so that I can exclude insignificant variants?

Thanks.

5.0 years ago
Ben_Ensembl ★ 2.4k

Hi mo.imranshah,

VEP has difficulty handling variants with long allele strings (>1000bp), because it fetches everything that overlaps the allele string; this is probably what led to the fork failing.

We plan to look into this further to work out exactly what the 'upper limit' is and how to handle these cases better.

However, it may be more efficient to upload your data into the Ensembl browser to visualise the genomic regions of interest: http://www.ensembl.org/info/website/upload/index.html

or to use BioMart to retrieve the list of genes in the genomic regions of interest: http://www.ensembl.org/biomart/martview/8c4102e5d689e604e174715a45c6f340
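Both routes work on genomic regions rather than allele strings, so the long records first need to be turned into intervals. The following is a rough sketch of one way to do that, assuming a standard tab-separated VCF and BED-style output; the function names and the 1000 cutoff are illustrative, and you should check which interval format the upload page or BioMart expects before using the result. Note that the interval covers only the REF footprint, so a long insertion is reduced to its anchor position, and symbolic alleles with an END value in INFO are not handled.

```python
def vcf_record_to_bed(vcf_line):
    """Convert one VCF data line to a (chrom, start, end) BED-style interval.

    VCF POS is 1-based, whereas BED start is 0-based and end is exclusive,
    so the REF allele covers [POS - 1, POS - 1 + len(REF)).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    start = pos - 1
    end = start + len(ref)
    return chrom, start, end


def long_variants_to_bed(lines, min_len=1000):
    """Yield BED lines for records whose REF or any ALT allele exceeds min_len."""
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        alleles = [fields[3]] + fields[4].split(",")
        if max(len(allele) for allele in alleles) > min_len:
            chrom, start, end = vcf_record_to_bed(line)
            yield f"{chrom}\t{start}\t{end}"
```

Writing the yielded lines to a file gives one region per long variant, which can then be used to look up overlapping genes.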

Best wishes

Ben Ensembl Helpdesk


Thanks, Ben, for the quick reply. I hope to see VEP perform better in such cases. Meanwhile, I will go with your suggestions.

Best, Imran
