Get gene names from rs SNP ids
4
6
Entering edit mode
9.6 years ago

I've got a list of rs SNPs I'd like to enter into the David functional annotation tool. Since it does not support rsids I need to get the refseq gene id (or something similar) of the genes these SNPs overlap with first.

Is there a simple way of getting a gene ID for a SNP? Solution in BioPython or R is fine.

Since I have 22k snps I need an automatic way. And if a text file that maps these values exist, a link to it would be enough.

Ps. preferably looking for a solution that does not use the position of the SNPs; I would be able to solve the problem this way myself.

SNP rs gene • 27k views
ADD COMMENT
2
Entering edit mode

I find this tool SNP-Nexus very effective. It runs with the rs IDs of the SNPs as well with batch queries. Please find the link to the page below

http://snp-nexus.org/about.html

I hope this tool will suffice your need a well

ADD REPLY
11
Entering edit mode
9.6 years ago
Neilfws 49k

It looks like DAVID can convert from Ensembl Gene IDs, so you can get to those from rs IDs using R/biomaRt, like this:

library(biomaRt)
mart.snp <- useMart("snp", "hsapiens_snp")

getENSG <- function(rs = "rs3043732", mart = mart.snp) {
  results <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                   filters    = "snp_filter", values = rs, mart = mart)
  return(results)
}

# default parameters
getENSG()
  refsnp_id ensembl_gene_stable_id
1 rs3043732        ENSG00000175445

# or supply rs ID
getENSG(rs = "rs224550")
  refsnp_id ensembl_gene_stable_id
1  rs224550        ENSG00000262304
2  rs224550        ENSG00000196689
ADD COMMENT
0
Entering edit mode

Is there any way to get NCBI Gene ID and/or HGNC id for each SNP id?

ADD REPLY
0
Entering edit mode

listAttributes(mart.snp) will show what can be returned; listFilters(mart.snp) will show what can be queried.

ADD REPLY
0
Entering edit mode

Hi, using biomaRt is there a way to set GRCh37 instead of default GRCh38 version? thank you in advance.

ADD REPLY
0
Entering edit mode

You can specify an Ensembl archive using the host = parameter of useMart. For example:

mart.hs <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "grch37.ensembl.org")
ADD REPLY
5
Entering edit mode
9.6 years ago
Emily 23k

Depending on what you want to know, you might also put them into the Ensembl VEP. You can use a list of rsIDs as input, and as output get all the genes they hit and what effects they have, including SIFT and Polyphen scores where relevant. You can use it as an online tool or run it as a Perl script (no perling on your part).

ADD COMMENT
0
Entering edit mode

Teensy perl code example appreciated.

ADD REPLY
0
Entering edit mode

Just go to the page I linked to and download. The whole script is written for you and there's full documentation on how to run it. You don't have to write anything.

ADD REPLY
2
Entering edit mode
2.6 years ago

As an alternative to Biomart you can use Enterz API. An example query to fetch the rs6311 would use the following URL:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=rs6311&retmode=xml

when retrieving many results at a time you may wish to use a script with proper identification to follow Entrez usage policy. I am using my own little Python wrapper for that, easy_entrez:

from easy_entrez import EntrezAPI

entrez_api = EntrezAPI('Project name', 'your@mail.com')
rs6311 = entrez_api.fetch(['rs6311'], max_results=1, database='snp').data
namespaces = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}
genes = [
    name.text
    for name in rs6311.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
]
print(genes)

which gives ['HTR2A'].

ADD COMMENT
1
Entering edit mode
9.6 years ago
Rm 8.3k

Gene to rs id

library(biomaRt)

## It might take long time to process if many genes (>50)  in the list.

## hgnc_gene_symbols.txt is the file that has the list of gene symbols one per line.
genes <- read.table("~/hgnc_gene_symbols.txt")

ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
dbsnp = useMart("snp", dataset = "hsapiens_snp")

getHGNC2ENSG = getBM(attributes=c('chromosome_name', 'start_position',
                                  'end_position', 'strand', 'ensembl_gene_id',
                                  'hgnc_symbol', 'refseq_mrna'),
                     filters ='hgnc_symbol', values = genes, mart = ensembl)

write.table(getHGNC2ENSG, file="~/hgnc_gene_symbols.txt.ensg.coord.tsv",
            sep="\t", col.names=T, row.names=T, append = F, quote=FALSE)

getRSid4ENSG <- getBM(c('refsnp_id', 'allele', 'snp', 'chr_name', 'chrom_start',
                        'chrom_strand', 'associated_gene', 'ensembl_gene_stable_id', 
                        'synonym_name', 'consequence_type_tv'), 
                      filters = 'ensembl_gene',  values = genes, mart = dbsnp)

write.table(getRSid4ENSG, file="~/hgnc_gene_symbols.txt.ensg.RSid.coord.tsv", 
            sep="\t", col.names=T, row.names=T, append = F, quote=FALSE)
ADD COMMENT

Login before adding your answer.

Traffic: 1430 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6