Dear Biostars Community,
I can't find a way to read my annotated VCF file with VEP from command line, with R. I tried with vcfR and ensemblVEP, but I can't find the right way to deal with the VEP.vcf file.
Thank you in advance, AY
Dear Biostars Community,
I can't find a way to read my annotated VCF file with VEP from command line, with R. I tried with vcfR and ensemblVEP, but I can't find the right way to deal with the VEP.vcf file.
Thank you in advance, AY
This is not pure R solution but it may be ok. Assuming the VEP annotation is the INFO tag CSQ and it uses |
as separator (the default):
bcftools query -f '%CHROM|%POS|%ID|%REF|%ALT|%QUAL|%FILTER|%INFO/CSQ\n' vep.vcf > vep.txt
Now vep.txt
is a |
separated table with columns CHROM, POS, etc. The meaning of the VEP columns is in the VCF header you should be able to extract it with e.g.
bcftools view -h vep.vcf | grep '##INFO=<ID=CSQ,'
To read it in R you can use read.table('vep.txt', sep= '|', ...)
In addition to the above answer, I would like to add that bcftools has a VEP-specific plugin which can also be used. More information can be found in: https://samtools.github.io/bcftools/howtos/plugin.split-vep.html
And of course, the resulted output file can be then easily imported in R.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Thank you dariober, however this didn't function. I can't understand why I found an higher number of "|" in my file than expected by "INFO-ID=CSQ"... Who is this possible?? Make me really confused