How To Programmatically Obtain More Information On A Gene By Using Its Gene Symbol
11.0 years ago
win ▴ 970

Hi all, I hope someone can help.

I have a list of gene symbols and I want to get more information about each gene, such as chromosome number, locus coordinates, alternative symbols, etc.

I want to do this programmatically.

Is there a commonly used flat file that I can query for this?

Thanks in advance.

annotation • 3.0k views

The answer to almost all questions of this type: "I have gene information A and require B, C, D..." is either (1) Biomart or (2) the UCSC genome/table browser. Please search this site for examples.

11.0 years ago
munch ▴ 310

If you want to do this with R, you can use biomaRt. http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html

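# install biomaRt (biocLite() has been retired; BiocManager is the current Bioconductor installer)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")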

library(biomaRt)
# define biomart object
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# read in the file (it needs a header row with a column named "hgnc")
genes <- read.csv("myfile.csv", stringsAsFactors = FALSE)
# query biomart
results <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "hgnc_symbol", values = genes$hgnc, mart = mart)

Here is a list of attributes that you can add to the attributes argument of getBM(): http://biocozy.blogspot.de/2010/08/ensembl-biomart-attributes_09.html
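For the details asked about in the question (chromosome and coordinates), a minimal sketch could look like the one below. The attribute names are assumptions based on current Ensembl marts and may differ between releases, so it is worth confirming them with listAttributes(mart). The names wanted_attributes and annotate_symbols are hypothetical helpers, not part of biomaRt. Alternative symbols may be exposed under an attribute such as external_synonym, but that is also something to check against listAttributes(mart) first.

```r
# Attribute names are assumptions; confirm them with listAttributes(mart).
wanted_attributes <- c("hgnc_symbol", "ensembl_gene_id", "chromosome_name",
                       "start_position", "end_position", "strand", "band")

# Hypothetical helper: look up location details for a vector of HGNC symbols.
annotate_symbols <- function(symbols, mart, attributes = wanted_attributes) {
  # getBM() needs a non-empty vector of values; drop duplicates first
  symbols <- unique(as.character(symbols))
  biomaRt::getBM(attributes = attributes, filters = "hgnc_symbol",
                 values = symbols, mart = mart)
}

# Usage (needs biomaRt and a network connection, so it is not run here):
# mart <- biomaRt::useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# annotate_symbols(c("ERBB2", "ESR1"), mart)
```

Keeping the attribute list in one variable makes it easy to add or remove columns without touching the query itself.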

To make sure that you are using the newest Ensembl release (71 at the time of writing), you can add the host to useMart() like this:

mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="www.ensembl.org", path="/biomart/martservice")

If you want to use Perl, see the post "Using the biomart perl api for simple queries".


Hi, I am trying to do as you suggest, but with no luck. I have the following in my CSV file: ERCC2 XPC LRRN6A ERBB2 ESR1 ACADS SLC24A5 MATP HCRTR2 MTCYB MTCYB MTTG FHL1 LOXL1 LOXL1 EXT2 TH MTCYB

I get the following error from R: Error in getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "hgnc_symbol", : Values argument contains no data.

If I type genes at the prompt, I can view the list of genes.


It works. You can try to pass your genes through as.character()

library(biomaRt)
genes <- c("ERCC2", "XPC", "LRRN6A", "ERBB2", "ESR1", "ACADS", "SLC24A5", "MATP", "HCRTR2", "MTCYB", "MTCYB", "MTTG", "FHL1", "LOXL1", "LOXL1", "EXT2", "TH", "MTCYB")
# define biomart object
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# query biomart
results <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "hgnc_symbol", values = genes, mart = mart)

> results
   ensembl_gene_id hgnc_symbol
1  ENSG00000151348        EXT2
2  ENSG00000180176          TH
3  ENSG00000137252      HCRTR2
4  ENSG00000122971       ACADS
5  ENSG00000022267        FHL1
6  ENSG00000104884       ERCC2
7  ENSG00000188467     SLC24A5
8  ENSG00000141736       ERBB2
9  ENSG00000091831        ESR1
10 ENSG00000129038       LOXL1
11 ENSG00000154767         XPC
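
One possible cause of the "Values argument contains no data" error is that genes$hgnc is NULL because the file has no column called hgnc. This is a guess, since the file itself is not shown. Here is a small sketch, in base R only, that reads symbols from a plain text file (separated by whitespace or commas) straight into a character vector that can be passed as values to getBM(). The name read_symbols is a hypothetical helper.

```r
# Hypothetical helper: read gene symbols from a file into a character vector.
read_symbols <- function(path) {
  # scan() splits on whitespace by default and ignores any header logic
  raw <- scan(path, what = character(), quiet = TRUE)
  # also split on commas, in case the file is comma-separated
  symbols <- as.character(unlist(strsplit(raw, ",", fixed = TRUE)))
  symbols <- trimws(symbols)
  # drop empty strings and duplicates, keeping first-seen order
  unique(symbols[nzchar(symbols)])
}

# Small self-contained demonstration with a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines("ERCC2 XPC LRRN6A ERBB2 MTCYB MTCYB", tmp)
genes <- read_symbols(tmp)
stopifnot(is.character(genes), length(genes) > 0)
```

Checking that the vector is a non-empty character vector before calling getBM() catches this kind of problem earlier than the query does.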
11.0 years ago
Bert Overduin ★ 3.7k

I'd also like to draw your attention to the Ensembl REST API that is currently under development, as this gives you the possibility to query the Ensembl databases using any programming language of your choice.

Any feedback / wishes with regard to the REST API are welcome at helpdesk@ensembl.org.
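
As an illustration, here is a sketch of a symbol lookup from R. It assumes the lookup/symbol endpoint at rest.ensembl.org as it is currently documented; the API was still under development when this was written, so the URL and the returned fields may differ. It also assumes the jsonlite package is installed. Both function names are hypothetical.

```r
# Build the URL for the lookup-by-symbol endpoint (host and path are assumptions).
ensembl_lookup_url <- function(symbol, species = "homo_sapiens") {
  paste0("https://rest.ensembl.org/lookup/symbol/", species, "/",
         utils::URLencode(symbol, reserved = TRUE),
         "?content-type=application/json")
}

# Fetch one gene record as a list; needs jsonlite and a network connection.
lookup_symbol <- function(symbol, species = "homo_sapiens") {
  jsonlite::fromJSON(ensembl_lookup_url(symbol, species))
}

# Usage (not run here, since it contacts the server):
# gene <- lookup_symbol("ERBB2")
# gene$seq_region_name; gene$start; gene$end
```

Separating URL construction from the request keeps the part that can be checked offline apart from the part that depends on the server.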
