Hi everyone, I have gotten many answers from this site without ever asking a question myself, so this is my first query. I am working with Ensembl VEP to annotate and filter a VCF file. With this command I can annotate and filter the variants:
vep --species homo_sapiens --assembly GRCh37 --offline --xref_refseq --failed 1 --check_existing --no_escape --filter_common --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz --vcf --input_file AML1.vcf -o AML1_vep.vcf
With this command, around 40 variants with population frequencies greater than 1% are removed; however, 7 variants with frequencies greater than 1% are not. I can see that these variants have no data in the AF column, which is the column the filter (--filter_common) acts on. I also tried removing --filter_common and using --filter "AFR_AF < 0.01 or not AFR_AF" instead, but got the same result: the same variants as with --filter_common. Could someone tell me what I'm doing wrong? Thank you very much!
Hi Ben, Thank you very much for your answer!
When I add --freq_pop to the VEP command, nothing changes.
If I run the VEP filter after VEP, nothing changes either.
I tried different options with --filter, and still nothing.
No problem, Gonzalo. When using the --freq_pop filter in the VEP query itself, you need to use the --check_frequency flag as well as specifying the population and the value that the frequency filter applies to, e.g.:
--check_frequency --freq_pop 1KG_AFR --freq_freq 0.1
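For anyone following along, here is a rough sketch of how the full set of frequency flags might be combined in one call. As far as I know, the VEP documentation says --check_frequency expects all of the --freq_* flags to be set. The function name `vep_exclude_common` and its positional arguments are made up for illustration, the defaults are only an assumption about what --filter_common is described as doing, and the cache is assumed to be in the default location.

```shell
# Hypothetical helper (not part of VEP): annotate a VCF and exclude variants
# whose co-located known variant has a frequency greater than a threshold
# in the chosen population.
# Usage: vep_exclude_common INPUT.vcf OUTPUT.vcf [POPULATION] [THRESHOLD]
vep_exclude_common() {
    in_vcf=$1
    out_vcf=$2
    pop=${3:-1KG_ALL}        # assumed default: global 1000 Genomes frequency
    max_freq=${4:-0.01}      # assumed default: 1%
    vep --offline --assembly GRCh37 --vcf \
        --input_file "$in_vcf" --output_file "$out_vcf" \
        --check_existing \
        --check_frequency \
        --freq_pop "$pop" \
        --freq_freq "$max_freq" \
        --freq_gt_lt gt \
        --freq_filter exclude
}
```

It could then be called as `vep_exclude_common AML1.vcf AML1_vep.vcf 1KG_AMR 0.01`.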
When using the Filter VEP script with a VCF file as the input, filter_vep by default expects to find VEP annotations encoded in the CSQ INFO key. You can use the --vcf_info_field option to change the INFO key that it decodes, depending on your VCF input file.
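As a minimal sketch of the post-annotation route, the wrapper below runs filter_vep on an already annotated VCF and keeps records that are rare or have no frequency at all. The function name `filter_rare` and its arguments are hypothetical, and it assumes the chosen field (AF by default here) is actually present in the CSQ annotations of the input.

```shell
# Hypothetical helper (not part of VEP): keep variants whose frequency field
# is below a threshold, or that carry no value for that field.
# Usage: filter_rare ANNOTATED.vcf FILTERED.vcf [FIELD] [THRESHOLD]
filter_rare() {
    in_vcf=$1
    out_vcf=$2
    field=${3:-AF}           # assumed CSQ field name; e.g. AMR_AF if annotated
    max_freq=${4:-0.01}
    filter_vep --format vcf \
        --input_file "$in_vcf" --output_file "$out_vcf" \
        --filter "$field < $max_freq or not $field"
}
```

For example, `filter_rare AML1_vep.vcf AML1_vep_filtered.vcf AMR_AF 0.01` (the output file name is just an illustration). If the annotations live under a different INFO key, --vcf_info_field would need to be added to the call.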
Hi Ben,
Thank you very much for your comments, they were very helpful. I was able to filter most of the variants with population frequencies less than 1% by adding "--check_frequency --freq_pop 1KG_AMR --freq_freq 0.01"; however, one variant was not removed (yellow in the image). When I changed --freq_freq from 0.01 to 0.001 and added "--freq_gt_lt gt --freq_filter exclude", I could see that two variants with frequencies less than 0.001 were removed (green in the image), but not this one. I can't figure out what the problem is.
The variant in question was removed using Filter VEP, as you suggested.
Best wishes
Hi Gonzalo,
No problem, very happy to help. I'm not sure why this variant is not being filtered using the flags you describe. Could you share the ID of the variant in question?
In any case, I'm glad that you were able to use the Filter VEP to achieve the filtering you needed.
Hi Ben, thanks for your reply.
Here I copy part of the row with the information.
The reference genome is GRCh37.
Best,
Please use ADD REPLY when responding to existing comments/posts to keep threads logically organized.
Hi Gonzalo,
The variant you are trying to filter out has two alt alleles: one allele has 1KG_AMR = 0.1427, while the other has no frequency from the 1000 Genomes project. This may be the reason for the behaviour of the filter flags.
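To check whether other records in a file are in the same situation, a small sketch like the one below lists every site whose ALT column contains more than one allele. The function name `list_multiallelic` is hypothetical, and it assumes a plain (uncompressed), tab-delimited VCF.

```shell
# Hypothetical helper: print CHROM, POS, REF and ALT for every record whose
# ALT column (5th field) lists more than one comma-separated allele.
# Header lines (starting with '#') are skipped.
list_multiallelic() {
    awk -F '\t' '!/^#/ && $5 ~ /,/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' "$1"
}
```

Running `list_multiallelic AML1.vcf` would then show which sites might be worth inspecting by hand.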
Could you please share your complete VEP input, query and output for this variant?
Hi Ben,
Thank you for the reply. Do you mean the VCF files? Yes, of course.
In the link you will find the input.vcf file, the output_vep.vcf and also the output_vep_vcf.maf generated with the vcf2maf scripts.
vcf files
Best wishes
Hi Gonzalo,
Thanks for providing the input data. In our hands, using the filtering flags in the VEP query itself also produces the same behaviour that you have observed. We will look into this, although we're not sure what is causing this to happen.
As I've said previously (and as you've done), we always advise people to use the Filter VEP script for filtering. I believe the Filter VEP script has performed as expected for you.
Hi Ben,
Thank you very much for your valuable response and for your time. It is good to know that it is not a bug in my script. As you say, with Filter VEP it is possible to filter and customize the analysis, but I was interested in having the statistics that Ensembl VEP produces (for example, how many variants were removed), which is why I wanted to do it that way.
I have also tried running Ensembl VEP on the output of Filter VEP; this way it is possible to obtain the statistics, predictions and graphs provided by VEP. It's great!
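In case it is useful to others, the "how many variants were removed" number can also be approximated outside VEP by counting records before and after filtering. This is only a sketch: the function names `count_variants` and `count_removed` are hypothetical, and it assumes uncompressed VCF files with one record per non-header line.

```shell
# Hypothetical helper: count the non-header, non-empty lines of a VCF.
count_variants() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: number of records present in the first VCF but no
# longer present in the second (by simple subtraction of record counts).
count_removed() {
    before=$(count_variants "$1")
    after=$(count_variants "$2")
    echo $((before - after))
}
```

For example, `count_removed AML1_vep.vcf AML1_vep_filtered.vcf`, where the second file name stands for whatever the filtered output is called.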
Best wishes,
Hi Gonzalo,
Just to let you know that we'll include a fix in Ensembl 106, due to be released in 2022, to correct the behaviour you observed when using the filters in the VEP query.
Great! Thank you for the comment. Best wishes