Gene Id Conversion Tool
57
27 days ago
Renee • 570
@Renee16

Hey,

I was using DAVID (http://david.abcc.ncifcrf.gov/conversion.jsp) to do gene ID conversion, e.g. conversion between Agilent IDs, GenBank accession IDs and Entrez gene IDs, but I found that the DAVID database is not up to date. Does anyone know a better, more recently updated conversion tool for this job? Thanks!

mapping conversion • 237k views
0

How frequently do you need things updated? DAVID has had yearly releases so far, and their latest release is this month (March 2010). See the release announcement here: http://david.abcc.ncifcrf.gov/forum/cgi-bin/ikonboard.cgi?act=ST;f=10;t=25 It suggests that the underlying mapping framework will be updated along with the 6.7 beta, so the conversion tool should include more recent information.

0

Hi, I am facing the same problem. I did a differential gene expression analysis using the protocol "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". After running Ballgown I have a gene list file in which the gene IDs look like this:

MSTRG.28632
MSTRG.3615
MSTRG.7507
MSTRG.70532
MSTRG.49954
MSTRG.60656
MSTRG.34410

I want to perform gene ontology analysis next using the tool agriGO, but these gene IDs are not recognized in any database. I used bioDBnet to try to convert them into Ensembl gene IDs, but it returned no results.

0

Please do not use the answer field for comments. The search function will give you answers on how to deal with MSTRG IDs. Please use it.

49
9.8 years ago
@Casey Bergman314

bioDBnet and the Hyperlink Management System (HMS) both convert multiple sets of IDs to each other.

HMS is limited to three species (human, mouse, Ciona) and has fewer data sources (Agilent: no; GenBank and Entrez: yes).

The bioDBnet system appears to be species-neutral, and its network of linked databases includes Agilent, GenBank and Entrez, so it should fit your requirements: [image: bioDBnet network of linked databases]

32
6.4 years ago
adam.maikai • 480
@adam.maikai14264

MyGene.info is a web service that provides up-to-date annotations in several fields and is great for gene ID conversion. All species from NCBI and Ensembl are supported, and annotations are updated weekly. Both the Python and R/Bioconductor clients are easy to use.

http://bioconductor.org/packages/release/bioc/html/mygene.html

https://pypi.python.org/pypi/mygene

MyGene.info may not be able to solve your problem with Agilent IDs, but several other ID types (GenBank, UniProt, Ensembl, RefSeq) are all available. Also, from either client, you can query several thousand genes at once.

Here is some example syntax for ID conversion with the Python module:

>>> import mygene
>>> mg = mygene.MyGeneInfo()
>>> mg.metadata['available_fields']  # returns available query terms
[u'accession', u'alias', u'biocarta', u'chr', u'end', u'ensemblgene', u'ensemblprotein', u'ensembltranscript', u'entrezgene', u'exons', u'flybase', u'generif', u'go', u'hgnc', u'homologene', u'hprd', u'humancyc', u'interpro', u'ipi', u'kegg', u'mgi', u'mim', u'mirbase', u'mousecyc', u'name', u'netpath', u'pdb', u'pfam', u'pharmgkb', u'pid', u'pir', u'prosite', u'ratmap', u'reactome', u'reagent', u'refseq', u'reporter', u'retired', u'rgd', u'smpdb', u'start', u'strand', u'summary', u'symbol', u'tair', u'taxid', u'type_of_gene', u'unigene', u'uniprot', u'wikipathways', u'wormbase', u'xenbase', u'yeastcyc', u'zfin']

>>> xli = ['DDX26B', 'CCDC83', 'MAST3', 'RPL11', 'ZDHHC20', 'LUC7L3', 'SNORD49A', 'CTSH', 'ACOT8']
>>> mg.querymany(xli, scopes="symbol", fields=["uniprot", "ensembl.gene", "reporter"], species="human", as_dataframe=True)

A DataFrame is returned:

Finished.
             _id                        ensembl.gene  \
query
DDX26B    203522                     ENSG00000165359
CCDC83    220047                     ENSG00000150676
MAST3      23031                     ENSG00000099308
RPL11       6135                     ENSG00000142676
ZDHHC20   253832                     ENSG00000180776
LUC7L3     51747                     ENSG00000108848
SNORD49A   26800  [ENSG00000277370, ENSG00000175061]
CTSH        1512                     ENSG00000103811
ACOT8      10005                     ENSG00000101473

                                                   reporter  \
query
DDX26B    {u'HG-U95B': u'53886_at', u'GNF1H': u'gnf1h144...
CCDC83    {u'GNF1H': [u'gnf1h06565_at', u'gnf1h09743_at'...
MAST3     {u'HG-U133_Plus_2': u'213045_at', u'HG-U95Av2'...
RPL11     {u'GNF1H': u'200010_at', u'HG-U133_Plus_2': u'...
ZDHHC20   {u'HG-U133_Plus_2': [u'225365_at', u'243786_at']}
LUC7L3    {u'HG-U95B': [u'55032_at', u'57642_at'], u'HG-...
SNORD49A  {u'HG-U133_Plus_2': [u'225065_x_at', u'239754_...
CTSH      {u'HG-U133_Plus_2': u'202295_s_at', u'HG-U95Av...
ACOT8     {u'HG-U95B': u'47789_at', u'HG-U133_Plus_2': [...

                                                    uniprot
query
DDX26B                           {u'Swiss-Prot': u'Q5JSJ4'}
CCDC83     {u'Swiss-Prot': u'Q8IWF9', u'TrEMBL': u'H0YDV3'}
MAST3      {u'Swiss-Prot': u'O60307', u'TrEMBL': u'V9GYV0'}
RPL11     {u'Swiss-Prot': u'P62913', u'TrEMBL': [u'Q5VVC...
ZDHHC20    {u'Swiss-Prot': u'Q5W0Z9', u'TrEMBL': u'B4DRN8'}
LUC7L3    {u'Swiss-Prot': u'O95232', u'TrEMBL': [u'A8K3C...
SNORD49A                                                NaN
CTSH      {u'Swiss-Prot': u'P09668', u'TrEMBL': [u'E9PKT...
ACOT8     {u'Swiss-Prot': u'O14734', u'TrEMBL': [u'E9PIS...

And now for the Bioconductor package:

library(mygene)
xli  <-  c('DDX26B','CCDC83',  'MAST3', 'RPL11', 'ZDHHC20',  'LUC7L3',  'SNORD49A',  'CTSH', 'ACOT8')
queryMany(xli, scopes="symbol", fields=c("uniprot", "ensembl.gene", "reporter"), species="human")

This returns a DataFrame:

Finished
DataFrame with 9 rows and 5 columns
                     ensembl.gene         _id uniprot.Swiss-Prot uniprot.TrEMBL       query
                  <CharacterList> <character>        <character>         <List> <character>
1                 ENSG00000165359      203522             Q5JSJ4       ########      DDX26B
2                 ENSG00000150676      220047             Q8IWF9       ########      CCDC83
3                 ENSG00000099308       23031             O60307       ########       MAST3
4                 ENSG00000142676        6135             P62913       ########       RPL11
5                 ENSG00000180776      253832             Q5W0Z9       ########     ZDHHC20
6                 ENSG00000108848       51747             O95232       ########      LUC7L3
7 ENSG00000277370,ENSG00000175061       26800                 NA       ########    SNORD49A
8                 ENSG00000103811        1512             P09668       ########        CTSH
9                 ENSG00000101473       10005             O14734       ########       ACOT8
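
Without `as_dataframe=True`, the Python client returns a list of dicts, and a field such as `ensembl` can be a single dict or a list of dicts (as with SNORD49A above). Below is a minimal sketch of flattening such hits into a symbol-to-Ensembl mapping. The records in `sample_hits` are written by hand to mimic that shape rather than fetched from the service, and `ensembl_ids_by_query` is a made-up helper name.

```python
def ensembl_ids_by_query(hits):
    """Map each query term to a list of Ensembl gene IDs (hypothetical helper)."""
    mapping = {}
    for hit in hits:
        ensembl = hit.get("ensembl", [])
        # A single match arrives as a dict, multiple matches as a list of dicts.
        if isinstance(ensembl, dict):
            ensembl = [ensembl]
        ids = [entry["gene"] for entry in ensembl if "gene" in entry]
        mapping.setdefault(hit["query"], []).extend(ids)
    return mapping


# Hand-written records shaped like querymany() output; not fetched live.
sample_hits = [
    {"query": "RPL11", "_id": "6135", "ensembl": {"gene": "ENSG00000142676"}},
    {"query": "SNORD49A", "_id": "26800",
     "ensembl": [{"gene": "ENSG00000277370"}, {"gene": "ENSG00000175061"}]},
    {"query": "NOT_A_GENE", "notfound": True},
]

mapping = ensembl_ids_by_query(sample_hits)
print(mapping)
```

Queries without a match end up with an empty list, which makes unmapped symbols easy to spot afterwards.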
4

That's a pretty neat service. You should post this as a separate tool announcement.

1

There is already a request for including Agilent reporter IDs in MyGene.info:

https://bitbucket.org/sulab/mygene.info/issue/1/support-for-agilent-platform-reporters

Please leave a comment there if you need any specific platform to be included.

20
11.0 years ago
@Michael Dondrup55

BioMart has already been mentioned. It can do much more than ID conversion, but it is very useful for conversion purposes: it is regularly updated, and you can select different genome builds and all kinds of genomic features. It seems to me that you wish to retrieve GeneIDs linked to Affymetrix IDs. To select these attributes in BioMart, go to the Martview page and start a new BioMart query.

Select attributes on the attribute page: the Ensembl GeneIDs and Transcript IDs are selected by default. Ensembl GeneID and Affy IDs are under the "External" tab. Select your chip there. To limit the results to those genes which are on the chip, use the Filters->Gene menu. You can limit the genes to those present on various platforms or to your favourite set.

There is a URL button in BioMart that lets you retrieve a URL for your query and pass it on to others. Try this example:

BioMart URL, which should be a good starting point.

If you are interested in KEGG identifiers (Pathways, Genes), EC numbers, etc., the KEGG Identifier page could be handy, because the KEGG IDs are not in BioMart as far as I know.
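
A BioMart query of this kind can also be expressed as an XML document and sent programmatically. The sketch below only builds that XML and makes no network request. The dataset, filter and attribute names (`hsapiens_gene_ensembl`, `affy_hg_u133_plus_2`, `ensembl_gene_id`, `entrezgene_id`) are assumptions that should be checked against the filter and attribute lists of the mart you actually use, and `build_biomart_query` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filter_name, ids, attributes):
    """Build a BioMart XML query string (hypothetical helper, no network access)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # BioMart takes a list of filter values as one comma-separated string.
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": ",".join(ids)})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",       # assumed dataset name
    filter_name="affy_hg_u133_plus_2",     # assumed filter name
    # Affymetrix probe IDs that appear in the MyGene.info output earlier in this thread
    ids=["213045_at", "202295_s_at"],
    attributes=["affy_hg_u133_plus_2", "ensembl_gene_id", "entrezgene_id"],
)
print(xml_query)
```

The resulting string is what would typically be passed as the `query` parameter to a mart's `martservice` endpoint.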

0

Thank you, Michael, this is very helpful for me.

8
11.0 years ago
Perry • 290
@Perry92

BridgeDB provides a nice API and REST interface, so you can put ID mapping queries in your scripts.

0

BridgeDB is really a software framework that you can use in your own code, either directly (currently only in Java) or by calling it as a web service. It can use different, and even multiple stacked, mappings. By default these come from Ensembl (for gene products) and HMDB (for metabolites). Ongoing projects extend the available mappings with ChemSpider and SNP information. There is a short introduction available at Nature Precedings: http://precedings.nature.com/documents/5023/version/1 and a paper in BMC Bioinformatics: http://dx.doi.org/10.1186/1471-2105-11-5
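
As a rough sketch of what calling the web service from a script could look like: the code below builds a lookup URL and parses a tab-separated response. The base URL, the `/organism/xrefs/systemCode/identifier` path pattern, the system code `L` for Entrez Gene and the two-column response layout are all assumptions to verify against the BridgeDb web service documentation; `sample_response` is written by hand and is not real service output.

```python
from urllib.parse import quote

BASE_URL = "https://webservice.bridgedb.org"  # assumed public endpoint


def xrefs_url(organism, system_code, identifier):
    """Build an xrefs lookup URL (assumed pattern: /organism/xrefs/code/id)."""
    return "/".join([BASE_URL, quote(organism), "xrefs",
                     quote(system_code), quote(identifier)])


def parse_xrefs(tsv_text):
    """Group mapped identifiers by data source from tab-separated lines."""
    by_source = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        identifier, source = line.split("\t")[:2]
        by_source.setdefault(source, []).append(identifier)
    return by_source


url = xrefs_url("Human", "L", "1512")  # "L" assumed to be the Entrez Gene system code
# Hand-written response for illustration, not real service output.
sample_response = "ENSG00000103811\tEnsembl\nCTSH\tHGNC\n"
xrefs = parse_xrefs(sample_response)
print(url)
print(xrefs)
```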

7
11.1 years ago
@Giovanni M Dall'Olio23

You can also do it with the following services:

  • UniProt - click on 'Id Mapping' on the home page.
  • BioMart - choose a database and a version, then put the IDs you want to convert under Filters->Id List limit (select the proper input ID type in the menu), and select the output IDs under 'Attributes'. BioMart is a general tool that lets you extract a lot of different information from databases (sequences, ontologies, transcripts, homologues), but it may be a bit too complex for just converting gene IDs.
  • Galaxy - I can't help much with this one, but I am sure it has a function for doing that, and many other things.
7
11.0 years ago
@Madelaine Gogol74

If you have just a few, I just saw someone use the R package BioIDMapper and it seemed kind of neat. But it's slow.

0

Unfortunately, this link is now broken :/ ...

1

There is a more recent version at: http://sourceforge.net/projects/bioidmapper/

5
8.4 years ago
@Mohammed Islaih19

The following link has a list of ID conversion tools:

http://hum-molgen.org/NewsGen/08-2009/000020.html

4
11.0 years ago
@Daniel Swan59

http://idconverter.bioinfo.cnio.es/

This is another possible solution, although you might find that it is not as up to date as you would like either.

1

This application does not work

0

Ah well - 6.4 years for an online bioinformatics application isn't the worst lifespan..

0

Thanks. I knew about that. :) My intention was to let people skip the post without clicking the link.

0

This tool also converts HGNC IDs to Ensembl gene IDs (ENSG...), but I do not get a corresponding Ensembl ID for every HGNC ID I have. Is there any way to retrieve Ensembl IDs for as many of my HGNC gene IDs as possible?

3
5.3 years ago
grvpanchal • 30
@grvpanchal21328

Try out: http://mygene.info/v2/api#MyGene.info-gene-query-service

It's awesome!

2
4.6 years ago
Samuel Lampa ♦ 1.2k
@Samuel Lampa1296

Have a look at the (BETA stage) Ensembl REST API

For example, for converting from Ensembl Gene ids to Gene symbols, you could use a query like this one:

http://beta.rest.ensembl.org/xrefs/id/ENSG00000059804?content-type=application/json

... and then programmatically (some Python parsing should be rather straightforward) extract the "display_id" for the items that have "dbname" = "HGNC" or "EntrezGene".

For example, the following PHP code did the trick for me:

# Test the new Ensembl REST API with an example gene
$ensemblID = "ENSG00000157764";
$url = "http://beta.rest.ensembl.org/xrefs/id/$ensemblID?content-type=application/json";
$ensemblResultJson = file_get_contents($url);
$ensemblResult = json_decode($ensemblResultJson, true);

# Print out each found Gene symbol on a separate row:
echo "<ul>";
foreach ($ensemblResult as $mapping) {
    if (in_array($mapping['dbname'], array("EntrezGene", "HGNC"))) {
        echo "<li>Found Gene symbol: " . $mapping['display_id'] . "</li>\n";
    }
}
echo "</ul>";
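
The same filtering can be sketched in Python. The function below works on an already decoded JSON list, so it runs without network access. `sample_json` is a hand-written stand-in for a response that assumes only the `dbname` and `display_id` keys mentioned above, and `symbols_from_xrefs` is a made-up name.

```python
import json


def symbols_from_xrefs(xrefs, wanted=("HGNC", "EntrezGene")):
    """Return the display_id of every xref whose dbname is in `wanted`."""
    return [x["display_id"] for x in xrefs if x.get("dbname") in wanted]


# Hand-written stand-in for the JSON returned by the xrefs/id endpoint.
sample_json = """[
  {"dbname": "HGNC", "display_id": "BRAF"},
  {"dbname": "EntrezGene", "display_id": "BRAF"},
  {"dbname": "SomeOtherDB", "display_id": "ignored"}
]"""
sample_xrefs = json.loads(sample_json)
symbols = symbols_from_xrefs(sample_xrefs)
print(sorted(set(symbols)))
```

For a live query, the decoded list would come from fetching the URL shown above (for example with `urllib.request.urlopen`) and passing the body to `json.loads`.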

2
4.2 years ago
Shicheng Guo ♦ 7.5k
@Shicheng Guo19400

The easiest way is the following:

http://www.genenames.org/cgi-bin/download
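
A table downloaded this way can be turned into a lookup in a few lines. This is a sketch under assumptions: the column headers `HGNC ID` and `Ensembl gene ID` and the sample rows are written by hand to resemble a tab-separated download, so check them against the header of your own file. `hgnc_to_ensembl` is a made-up helper name.

```python
import csv
import io


def hgnc_to_ensembl(handle, hgnc_col="HGNC ID", ensembl_col="Ensembl gene ID"):
    """Read a tab-separated table and map HGNC IDs to Ensembl gene IDs."""
    mapping, unmapped = {}, []
    for row in csv.DictReader(handle, delimiter="\t"):
        ensembl_id = (row.get(ensembl_col) or "").strip()
        if ensembl_id:
            mapping[row[hgnc_col]] = ensembl_id
        else:
            # Keep track of entries without an Ensembl ID instead of dropping them.
            unmapped.append(row[hgnc_col])
    return mapping, unmapped


# Hand-written sample standing in for a downloaded file.
sample = io.StringIO(
    "HGNC ID\tApproved symbol\tEnsembl gene ID\n"
    "HGNC:1097\tBRAF\tENSG00000157764\n"
    "HGNC:0000\tEXAMPLE\t\n"
)
mapping, unmapped = hgnc_to_ensembl(sample)
print(mapping, unmapped)
```

Collecting the unmapped IDs separately shows which genes have no Ensembl gene ID in the table at all, as opposed to being lost by a conversion tool.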

1
11.4 years ago
@ialbert

I don't know of a direct solution myself, but this is a topic that may be of interest for the biological data analysis class that I am teaching.

If you specify the organism and genomic builds that you are interested in, we may be able to generate a full translation list as an in-class example or as homework. I was planning on covering an Affymetrix ID to GenBank example anyhow.

0

Thanks! That's great! But I'm not a student there... Can I get access to that anyway? I am using the human whole genome Agilent array. Thank you so much.

0

missed this comment, sorry about it!

1
7.2 years ago
aheinzel • 110
@aheinzel9622

Not sure what your background is, but we recently started to develop an ID mapper/converter for experimentalists who prefer organizing their data in Excel, so the client integrates directly into MS Excel.

Currently, we provide mapping from various IDs to Ensembl and back. The mapping data were extracted from Ensembl 73 (released on 4.9.2013). If you need mappings for any additional ID types available from the Ensembl database, we will be happy to add them (please just tell us via our feedback form).

0

@aheinzel

Is it possible to use this tool to generate Ensembl gene IDs (ENSG) from HGNC gene IDs for human? Also, does it work on Mac or is it just for Windows?

0

I don't really understand the need for this. Many identity mappers offer web services, and if needed these can be installed locally. That is definitely true for our own BridgeDb. Is there any reason you could not just call these services from Excel? (And yes, that would allow mapping from Ensembl gene ID to HGNC or from probeset IDs.)

1
5.7 years ago
s.birch • 10
@s.birch19031

If you have just a few, I just saw someone use the R package BioIDMapper and it seemed kind of a good thing to use. But it's slow.
