How to retrieve the genes associated to a VCF file?
14 months ago
Canada

Hi all

I've started working with VCF (Variant Call Format) files recently, and I am not very familiar with them yet. The task I have to solve right now is to retrieve all the genes associated with the variants in each VCF file. How can I do it?

To read the VCF files, I used the read.vcfR() function from the vcfR R package:

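library("vcfR")                       # CRAN package for reading and manipulating VCF files
fileName <- "CODE.gatk.snp.indel.vcf"
vcf <- read.vcfR(fileName)            # returns a vcfR object (meta, fix and gt slots)
print(head(vcf))                      # summary of the first variants
print(colnames(getFIX(vcf)))          # fixed-field columns: CHROM, POS, ID, REF, ALT, ...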

Other than that, I don't know how to proceed. Can someone suggest how to move forward?

Should I convert this file to a BED file and then intersect it with the human genome reference file using bedtools intersect?

Something else?
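A minimal sketch of the BED conversion proposed above, in base R. It assumes the chromosome, position and reference-allele columns have already been pulled out of the vcfR object (for example with vcfR's getCHROM(), getPOS() and getREF()); the helper name vcf_to_bed, the toy values and the output path variants.bed are hypothetical. The key detail is the coordinate shift: VCF positions are 1-based, while BED intervals are 0-based and half-open.

```r
# Hypothetical helper: convert VCF-style coordinates to BED intervals.
# VCF POS is 1-based; BED start is 0-based and BED end is exclusive,
# so a REF allele of nchar(ref) bases at POS maps to [pos - 1, pos - 1 + nchar(ref)).
vcf_to_bed <- function(chrom, pos, ref) {
  data.frame(
    chrom = chrom,
    start = pos - 1L,
    end   = pos - 1L + nchar(ref),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy variants standing in for getCHROM(vcf), getPOS(vcf), getREF(vcf)
bed <- vcf_to_bed(
  chrom = c("chr1", "chr1", "chr2"),
  pos   = c(100L, 250L, 5000L),
  ref   = c("A", "GTT", "C")
)
print(bed)

# Write a tab-separated file without header or row names, as BED tools expect
write.table(bed, "variants.bed", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The resulting file could then be passed to bedtools intersect as -a, with a gene annotation in BED format (for example a hypothetical genes.bed) as -b; adding -wa -wb reports each variant next to the overlapping gene record.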

vcf genes R • 291 views

If you want to stay within R, there is a Bioconductor package, ensemblVEP.

14 months ago
Belgium

It looks like you are looking for the gene annotation of your variants. It might be possible to do that in R, but that's probably not optimal. More suitable would be tools like SnpEff, VEP and Annovar, but those are not R based.


Thanks, Wouter, for your reply. Sure, I can use other tools. Could you point me to instructions for the tools you mentioned?


Those tools are well documented, so you should be able to find instructions online. Let us know if you get stuck.

18 months ago
bernatgel ♦ 1.9k
Barcelona, Spain

If you want to know which genes are affected by the variants in the VCF file, what you should do is annotate the VCF. There are many tools for that, and I wouldn't recommend doing it with your own code.

For example

and many others

22 months ago
paolo002 • 140

Does your VCF file contain SNP IDs in the form of rs numbers (e.g. rs876643) and positions? If so, you should use the GenomicRanges package from Bioconductor to map the positions of the SNPs onto the locations of the genes and retrieve the gene names.
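A minimal sketch of the GenomicRanges approach, using small hypothetical variant and gene tables in place of real data. With a vcfR object, the variant chromosomes and positions could come from getCHROM(vcf) and getPOS(vcf); the gene coordinates would have to come from an annotation (for example a GTF file or a TxDb package) for the same genome build as the VCF. All IDs, gene names and coordinates below are made up for illustration.

```r
# Requires the Bioconductor package GenomicRanges
# (install with BiocManager::install("GenomicRanges")).
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical SNPs: one base each, with made-up rs IDs as a metadata column
variants <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(1500, 9000, 300), width = 1),
  snp_id   = c("rs0001", "rs0002", "rs0003")
)

# Hypothetical gene annotation; real coordinates must match the VCF's genome build
genes <- GRanges(
  seqnames  = c("chr1", "chr1", "chr2"),
  ranges    = IRanges(start = c(1000, 5000, 100), end = c(2000, 6000, 800)),
  gene_name = c("GENE_A", "GENE_B", "GENE_C")
)

# findOverlaps() returns (query, subject) index pairs for every variant-gene overlap
hits <- findOverlaps(variants, genes)

# Join SNP IDs to the names of the genes they fall in
annotated <- data.frame(
  snp_id    = variants$snp_id[queryHits(hits)],
  gene_name = genes$gene_name[subjectHits(hits)],
  stringsAsFactors = FALSE
)
print(annotated)
```

Variants that fall outside every gene (rs0002 here) have no hit, so they simply drop out of the result table.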
