How to retrieve the genes associated with a VCF file?
5.2 years ago
Davide Chicco ▴ 120

Hi all

I recently started working with VCF (Variant Call Format) files, and I am not very familiar with them. The task I have to solve right now is to retrieve all the genes associated with each VCF file. How can I do it?

To read the VCF files, I used the vcfR R package:

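library("vcfR")
fileName <- "CODE.gatk.snp.indel.vcf"
# Read the VCF into a vcfR object (slots: meta, fix, gt)
vcf <- read.vcfR(fileName)
print(head(vcf))
# The fixed columns (CHROM, POS, ID, REF, ALT, ...) live in the "fix" slot
print(colnames(vcf@fix))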

Other than that, I don't know how to proceed. Can someone suggest how to move forward?

Should I convert this file to a BED file and then intersect it with the human genome reference file using bedtools intersect?

Something else?

vcf genes R • 2.8k views

If you want to stay within R, there is a Bioconductor package for this: ensemblVEP.

5.2 years ago

It looks like you are looking for the gene annotation of your variants. It might be possible to do that in R, but that's probably not optimal. Tools like SnpEff, VEP and ANNOVAR would be more suitable, but those are not R-based.


Thanks Wouter for your reply. Sure, I can use other tools. Can you provide me with instructions for the tools you mentioned?


Those tools are well-documented, so you should be able to find instructions online. Let us know if you get stuck.

5.2 years ago
bernatgel ★ 3.4k

If you want to know which genes are affected by the variants in the VCF file, what you should do is annotate the VCF. There are many tools for that, and I wouldn't recommend doing it with your own code.

For example

and many others

5.2 years ago
paolo002 ▴ 160

Does your VCF file contain SNP IDs in the form of rs numbers (e.g. rs876643) and positions? If so, you could use the GenomicRanges package from Bioconductor to map the positions of the SNPs to the locations of the genes and retrieve the gene names.
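
To make the position-to-gene mapping concrete, here is a minimal sketch of the idea in base R, so that it runs without Bioconductor installed. It is not the GenomicRanges API itself: with GenomicRanges, the same overlap would normally be computed with findOverlaps() on two GRanges objects, which scales far better to whole-genome data. Every variant, gene name and coordinate below is made up for illustration, and genes_for_variants() is a hypothetical helper, not a function from any package.

```r
# Toy variant table shaped like the CHROM/POS/ID columns of a VCF.
# All values are invented for illustration.
variants <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  pos   = c(150L, 900L, 420L),
  id    = c("rsA", "rsB", "rsC"),
  stringsAsFactors = FALSE
)

# Toy gene table. Real coordinates would come from a gene annotation
# (for example a GTF/GFF file) built on the same genome assembly as the VCF.
genes <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(100L, 500L, 400L),
  end   = c(200L, 800L, 450L),
  gene  = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair each variant with every gene on the same
# chromosome, then keep the pairs where the position lies in the gene interval.
genes_for_variants <- function(variants, genes) {
  pairs <- merge(variants, genes, by = "chrom")
  inside <- pairs$pos >= pairs$start & pairs$pos <= pairs$end
  hits <- pairs[inside, c("chrom", "pos", "id", "gene")]
  hits <- hits[order(hits$chrom, hits$pos), ]
  rownames(hits) <- NULL
  hits
}

hits <- genes_for_variants(variants, genes)
print(hits)               # one row per (variant, overlapping gene) pair
print(unique(hits$gene))  # the set of genes hit by at least one variant
```

With a vcfR object, the chrom and pos columns could be filled from getCHROM(vcf) and getPOS(vcf). Two things to check before trusting the result: the chromosome names must use the same convention in both tables ("chr1" versus "1"), and the annotation must match the assembly the variants were called against. Note also that this only finds variants lying inside a gene interval; it says nothing about the effect of the variant, which is what annotation tools report.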
