Gene Id to Ensembl ID Conversion with LONG list
5.9 years ago

Hello everyone,

Is there an available resource that converts long lists of gene names to Ensembl IDs?

I CANNOT use BioMart, because the advised limit is 500 genes and I have several lists of more than 6000 gene names each. I also cannot use DAVID, because it has no input option that accepts regular gene names.

Thanks in advance!

Sincerely, Virlana.

gene • 4.4k views
5.9 years ago

If you have the mygene library installed in Python, you could use the following Python script:

#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

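# Read one gene symbol per line from standard input, skipping blank lines
genes = []
for line in sys.stdin:
    symbol = line.strip()
    if symbol:
        genes.append(symbol)

for gene in genes:
    result = mg.query(gene, scopes="symbol", fields=["ensembl"], species="human", verbose=False)
    hgnc_name = gene
    for hit in result["hits"]:
        # "ensembl" is a dict for one mapping, or a list of dicts for several
        ensembl = hit.get("ensembl", [])
        if isinstance(ensembl, dict):
            ensembl = [ensembl]
        for entry in ensembl:
            if "gene" in entry:
                sys.stdout.write("%s\t%s\n" % (hgnc_name, entry["gene"]))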

If you don't have mygene installed and you want to install it, you could run the following:

$ pip install mygene

As an example, here are HGNC names of genes in a file called "hgnc.txt":

DDX26B
CCDC83
MAST3
RPL11
ZDHHC20
LUC7L3
SNORD49A
CTSH
ACOT8

The above script would give the following output:

$ ./map_hgnc_to_ensg.py < hgnc.txt
DDX26B  ENSG00000225235
DDX26B  ENSG00000165359
CCDC83  ENSG00000150676
MAST3   ENSG00000099308
RPL11   ENSG00000142676
ZDHHC20 ENSG00000180776
ZDHHC20 ENSG00000236953
LUC7L3  ENSG00000108848
SNORD49A        ENSG00000277370
CTSH    ENSG00000103811
ACOT8   ENSG00000101473

You could write the output to a text file like so:

$ ./map_hgnc_to_ensg.py < hgnc.txt > hgnc_mapped_to_ensg.txt
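
For lists of several thousand symbols, one request per gene can be slow. Here is a rough sketch of a batched variant, assuming mygene's querymany method, which takes a whole list of terms in one call. The names flatten_hits and map_symbols are made up for illustration and are not part of the script above. The sketch also assumes that each returned record carries the input term under "query" and flags misses with "notfound".

```python
def flatten_hits(records):
    """Turn querymany-style records into (symbol, ensembl_gene_id) pairs.

    Assumption: each record has a "query" key, an optional "notfound" flag,
    and an "ensembl" field that is either a dict or a list of dicts.
    """
    pairs = []
    for rec in records:
        if rec.get("notfound"):
            continue  # symbol was not recognised; skip it
        ensembl = rec.get("ensembl", [])
        if isinstance(ensembl, dict):
            ensembl = [ensembl]
        for entry in ensembl:
            if "gene" in entry:
                pairs.append((rec["query"], entry["gene"]))
    return pairs


def map_symbols(symbols, client):
    """Query all symbols in one call; client is e.g. mygene.MyGeneInfo()."""
    records = client.querymany(
        symbols,
        scopes="symbol",
        fields="ensembl.gene",
        species="human",
        verbose=False,
    )
    return flatten_hits(records)


# Usage (needs network access and the mygene package):
#   import sys, mygene
#   symbols = [line.strip() for line in sys.stdin if line.strip()]
#   for symbol, ensg in map_symbols(symbols, mygene.MyGeneInfo()):
#       sys.stdout.write("%s\t%s\n" % (symbol, ensg))
```

Passing the client in as an argument keeps the parsing logic separate from the network call, so it can be checked against a small hand-made list of records before running it on a full gene list.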

Note there is not a 1-to-1 correspondence between HGNC and Ensembl IDs. See the following post from Emily_Ensembl for discussion: Why am I getting different ensembl gene ids for a given gene symbol?
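
To see which symbols are affected in your own results, you could group the two-column mapping output by symbol and list the ones with more than one Ensembl ID. This is only a sketch: the function names are illustrative, and the sample lines are copied from the example output in the answer above.

```python
from collections import defaultdict


def group_by_symbol(lines):
    """Collect Ensembl IDs per symbol from 'SYMBOL<whitespace>ENSG' lines."""
    mapping = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        symbol, ensg = fields
        if ensg not in mapping[symbol]:
            mapping[symbol].append(ensg)
    return dict(mapping)


def ambiguous(mapping):
    """Keep only the symbols that map to more than one Ensembl ID."""
    return {s: ids for s, ids in mapping.items() if len(ids) > 1}


sample = [
    "DDX26B\tENSG00000225235",
    "DDX26B\tENSG00000165359",
    "CCDC83\tENSG00000150676",
    "ZDHHC20\tENSG00000180776",
    "ZDHHC20\tENSG00000236953",
]

multi = ambiguous(group_by_symbol(sample))
for symbol in sorted(multi):
    print(symbol, ",".join(multi[symbol]))
```

With the sample lines, DDX26B and ZDHHC20 are reported, each with two IDs. Which ID to keep is a judgement call that depends on the annotation you are working with.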

5.9 years ago
Denise CS ★ 5.2k

There are other ways to use BioMart beyond its web user interface: the biomaRt Bioconductor R package, the BioMart Perl API, and BioMart RESTful access.
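
As a rough illustration of the RESTful route, a long list could be split into batches of at most 500 symbols (the advised limit mentioned in the question), with one BioMart XML query built per batch. Everything specific here is an assumption to check against the BioMart documentation: the dataset name hsapiens_gene_ensembl, the hgnc_symbol filter, the attribute names, and the martservice endpoint mentioned in the comments. The helper names are made up for this sketch, and no request is actually sent.

```python
from xml.sax.saxutils import quoteattr

BATCH_SIZE = 500  # advised limit quoted in the question


def chunked(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_query(symbols):
    """Build a BioMart XML query for one batch of gene symbols.

    Assumed names: dataset "hsapiens_gene_ensembl", filter "hgnc_symbol",
    attributes "hgnc_symbol" and "ensembl_gene_id".
    """
    value = quoteattr(",".join(symbols))  # adds quotes and escapes the value
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        '<Filter name="hgnc_symbol" value=%s/>'
        '<Attribute name="hgnc_symbol"/>'
        '<Attribute name="ensembl_gene_id"/>'
        '</Dataset>'
        '</Query>'
    ) % value


# Each query string would then be sent as the "query" parameter to the
# BioMart service (assumed: https://www.ensembl.org/biomart/martservice),
# and the tab-separated responses concatenated.
symbols = ["RPL11", "CTSH", "ACOT8"]
queries = [build_query(batch) for batch in chunked(symbols)]
```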


The asker has noted that they cannot use BioMart.


Well, that link lists about 4 approaches, plus a link to the most comprehensive post on ID conversion, https://www.biostars.org/p/22/:

  1. python
  2. R - 3 different ways (using mygene, ensembldb, pathview)
  3. User developed tool
  4. Biomart.

I guess even if BioMart is excluded, there are still 5 ways left, including the 3 R methods.
