Unable to query specific ensembl ID through biomaRt (R) or mygene (Python)
5.2 years ago

When querying for the gene symbol of an Ensembl ID, I often get unmatched results. However, if I search for these IDs on the Ensembl website (GRCh37 build), I can find them. Below is a list of Ensembl IDs with their gene symbols, which I could only retrieve via the website.

ENSG00000017373 SRCIN1
ENSG00000090920 FCGBP 
ENSG00000108264 TADA2A 
ENSG00000108272 DHRS11 
ENSG00000108278 ZNHIT3 
ENSG00000121848 RNF115 
ENSG00000135213 POM121C 
ENSG00000154768 C17orf50 
ENSG00000160828 STAG3L2 
ENSG00000163386 NBPF10 
ENSG00000163486 SRGAP2 
ENSG00000165388 ZNF488 
ENSG00000168274 HIST1H2AE 
ENSG00000168614 NBPF9 
ENSG00000170866 LILRA3

In biomaRt, I tried...

library(biomaRt)
gene_list = c('ENSG00000017373', 'ENSG00000090920', 'ENSG00000108264', 'ENSG00000108272')
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

results <- getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
     filters = "ensembl_gene_id",
     values = gene_list,
     mart = mart)

And in mygene, I tried...

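import mygene
gene_list = ['ENSG00000017373', 'ENSG00000090920', 'ENSG00000108264', 'ENSG00000108272']
mg = mygene.MyGeneInfo()
ginfo = mg.querymany(gene_list, scopes='ensembl.gene', fields='symbol', species='human')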

I have tried changing mygene's scopes and biomaRt's filters, but for whatever reason I can only obtain this information on the Ensembl website.

As I doubt the website would be more comprehensive than the actual databases, I am looking for feedback on where I may be going wrong.

ensembl mygene biomart gene symbols ensembl ID • 2.1k views
5.2 years ago

This is a version issue. You are currently querying GRCh38, whereas the IDs and symbols in the question are from GRCh37. Change your mart as follows:

> library(biomaRt)
> gene_list = c('ENSG00000017373', 'ENSG00000090920', 'ENSG00000108264', 'ENSG00000108272')

With the current Ensembl mart (GRCh38):

> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
> getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
+                  filters = "ensembl_gene_id",
+                  values = gene_list,
+                  mart = mart)
[1] ensembl_gene_id    external_gene_name
<0 rows> (or 0-length row.names)

With GRCh37:

> mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl",  host="grch37.ensembl.org")
> getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
+                filters = "ensembl_gene_id",
+                values = gene_list,
+                mart = mart)
  ensembl_gene_id external_gene_name
1 ENSG00000017373             SRCIN1
2 ENSG00000090920              FCGBP
3 ENSG00000108264             TADA2A
4 ENSG00000108272             DHRS11
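
For Python users, the same build-specific lookup can be sketched against Ensembl's GRCh37 REST server. This is only a minimal sketch and not part of the answer above: it assumes the batch POST /lookup/id endpoint on grch37.rest.ensembl.org and the display_name field in its JSON response, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ensembl's dedicated REST server for the GRCh37 build.
GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def build_lookup_request(ids, server=GRCH37_SERVER):
    """Build (but do not send) a batch POST request for /lookup/id."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        server + "/lookup/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def symbols_from_response(payload):
    """Map each queried ID to its display name, or None if no record came back."""
    return {
        gene_id: (record or {}).get("display_name")
        for gene_id, record in payload.items()
    }


def fetch_symbols(ids, server=GRCH37_SERVER):
    """Send the request and return {ensembl_id: symbol_or_None}. Needs network access."""
    with urllib.request.urlopen(build_lookup_request(ids, server)) as response:
        return symbols_from_response(json.load(response))
```

Calling fetch_symbols(gene_list) should then return a dictionary keyed by the queried IDs. Passing server="https://rest.ensembl.org" would query the current build instead, which makes it easy to compare the two.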

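Ensembl IDs for the corresponding gene symbols in the current build (this assumes that results holds the GRCh37 output above, that mart has been reset to the current Ensembl mart, and that dplyr is loaded):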

> getBM(attributes = c('ensembl_gene_id', 'external_gene_name'),
+       filters = "external_gene_name",
+       values = results$external_gene_name,
+       mart = mart)  %>% 
+     arrange(.,external_gene_name) %>% 
+     group_by(external_gene_name) %>%
+     summarise(external_gene_name_list = paste(ensembl_gene_id, collapse=", "))
# A tibble: 4 x 2
  external_gene_name external_gene_name_list         
  <chr>              <chr>                           
1 DHRS11             ENSG00000278535, ENSG00000275397
2 FCGBP              ENSG00000275395, ENSG00000281123
3 SRCIN1             ENSG00000277363, ENSG00000273608
4 TADA2A             ENSG00000276234, ENSG00000277104
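
The grouping step above (one row per symbol, with all matching IDs collapsed into a single string) can also be sketched in plain Python. This is an illustrative equivalent of the dplyr pipeline, not the code used in the answer; the function name is hypothetical and the input rows are copied from the output above.

```python
from collections import defaultdict


def collapse_ids_by_symbol(rows):
    """Group (ensembl_gene_id, external_gene_name) pairs by gene symbol."""
    grouped = defaultdict(list)
    for gene_id, symbol in rows:
        grouped[symbol].append(gene_id)
    # Sort by symbol (like arrange) and join IDs (like paste with collapse=", ").
    return {symbol: ", ".join(ids) for symbol, ids in sorted(grouped.items())}


# Rows as they would come back from a symbol-based query on the current build.
rows = [
    ("ENSG00000275395", "FCGBP"),
    ("ENSG00000281123", "FCGBP"),
    ("ENSG00000278535", "DHRS11"),
    ("ENSG00000275397", "DHRS11"),
]
collapsed = collapse_ids_by_symbol(rows)
for symbol, ids in collapsed.items():
    print(symbol, ids)
```

Multiple IDs per symbol are expected here, so collapsing them keeps the table at one row per gene without silently dropping any ID.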