Most efficient strategy to convert from Ensembl protein IDs (ENSP) to Entrez Gene Symbols?
5.4 years ago
sam237337 ▴ 70

I have a list of Ensembl protein IDs (ENSP) that I need to convert to Entrez-formatted gene symbols. So far, I haven't identified a straightforward method to convert between these two formats, as I'm not seeing a platform that will permit this. This is my current tentative strategy:

Step 1: Convert ENSP protein IDs to HGNC gene symbols via the R package EnsDb.Hsapiens.v86

Step 2: Convert HGNC gene symbols to UniProtKB format via the UniProt Protein Conversion tool ( https://www.uniprot.org/uploadlists/ ). For some reason, UniProtKB is the only format that is available when converting from HGNC format.

Step 3: Convert UniProtKB protein IDs to Entrez Gene ID numbers via the UniProt Protein Conversion tool ( https://www.uniprot.org/uploadlists/ ); this platform offers conversion to Entrez Gene ID numbers, but not Entrez gene symbols...

Step 4: Convert Entrez Gene ID Numbers to Entrez Gene Symbols via the R package org.Hs.eg.db, with reference to this thread: Gene symbol convert to Entrez ID

Strategies reviewed:

I reviewed the biomaRt platform, but am not seeing relevant ID conversion tools (e.g. going to http://biomart.org/ --> Tools --> ID Conversion takes me to a general notice that the community portal is unavailable).

Referencing this related thread: Make List Of All Human Gene Ids (Ens, Hgnc, Entrez) To Ease Conversion Of Ids

...The International Protein Index (IPI) platform provides an ipi.HUMAN.xrefs file: ftp://ftp.ebi.ac.uk/pub/databases/IPI/last_release/current/

...with the initial content:

Protein cross-references file for IPI human release 3.87

SP A0A183 IPI00807623 ENSP00000411070; VALIDATED:NP_001122072; HIT000394684; ABJ55982; 31824,LCE6A; 448835,LCE6A; UPI0000D83229 Hs.62927; CCDS44227.1; GI:190610047; OTTHUMP00000210240;

However, the columns in this file aren't labeled, and I don't know the format of each column, or whether Entrez-format gene symbols are present. The ReadMe file does not provide this information.

With reference to this thread: Gene Id Conversion Tool

...I tried to use DAVID: http://david.abcc.ncifcrf.gov/conversion.jsp ... but it doesn't seem to recognize ENSP-formatted inputs, as testing examples generates error messages.

bioDBnet ( https://biodbnet-abcc.ncifcrf.gov/db/db2db.php ) doesn't permit ENSP conversion to Entrez format.

The biodb.jp Hyperlink Management System ( http://biodb.jp/ ) has tools related to Ensembl protein IDs, but I don't see tools for converting to Entrez format.

If there is any way to simplify the intended 4-step processing strategy described above, I will appreciate any suggestions. Thanks in advance for your input.


Why not just download the mapping from Biomart? That'd be a single step and vastly simpler.

5.4 years ago
Emily 23k

It's very easy using BioMart. You just filter by the list of ENSPs and get the NCBI gene IDs as output (note: they used to be called Entrez Gene IDs; they're now called NCBI gene IDs).
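For anyone who prefers to script this rather than use the web form, here is a minimal sketch of what such a BioMart request could look like. It only builds the query XML and URL; it does not contact the server. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the filter/attribute names (`ensembl_peptide_id`, `entrezgene_id`, `external_gene_name`) are my assumptions about the Ensembl mart and should be checked against the current BioMart interface. The helper names are made up for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed Ensembl BioMart endpoint; verify before use.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(ensp_ids):
    """Return BioMart query XML filtering on a list of ENSP IDs.

    Dataset, filter, and attribute names are assumptions, not taken
    from the thread.
    """
    id_list = ",".join(ensp_ids)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" '
        'header="1" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        # quoteattr adds the surrounding quotes and escapes the value
        f'<Filter name="ensembl_peptide_id" value={quoteattr(id_list)}/>'
        '<Attribute name="ensembl_peptide_id"/>'
        '<Attribute name="entrezgene_id"/>'
        '<Attribute name="external_gene_name"/>'
        "</Dataset></Query>"
    )


def build_biomart_url(ensp_ids):
    """Return a GET URL carrying the query XML as the 'query' parameter."""
    return MART_URL + "?" + urlencode({"query": build_biomart_query(ensp_ids)})


query_xml = build_biomart_query(["ENSP00000411070"])
query_url = build_biomart_url(["ENSP00000411070"])
```

The resulting URL can be fetched with any HTTP client; for long ID lists it is probably safer to send the XML in a POST body or to split the list into batches.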


Thanks for your clarification, Emily; I will try BioMart again, since my first attempt with that platform was unsuccessful.

I think that it is frowned upon in forum etiquette to reply to each individual response with a thank-you message, so I will let this response serve as my thank-you to all who responded; I will be testing these various strategies in the near future, and will post an update once I determine what works.

5.4 years ago

Your best bet is to convert ENSP -> Entrez ID using StringDB (https://string-db.org). You can download all the associations from here: https://string-db.org/mapping_files/entrez_mappings/entrez_gene_id.vs.string.v10.28042015.tsv

From EntrezID, it is simple to convert to GeneName or any other ID.
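As a rough sketch of how the downloaded mapping could be turned into an ENSP -> Entrez lookup: the code below assumes the file is tab-separated, that header lines start with `#`, that the first column is the Entrez Gene ID, and that the second column is a STRING ID of the form `<taxid>.<ENSP>`. I have not verified that layout against the file itself, and the IDs in the sample are made up. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def ensp_to_entrez(lines, taxid="9606"):
    """Build {ENSP: set of Entrez IDs} from a STRING-style mapping.

    Assumed layout: Entrez ID <TAB> '<taxid>.<ENSP>'; '#' marks headers.
    """
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#") or len(row) < 2:
            continue
        entrez_id, string_id = row[0], row[1]
        # Split '9606.ENSP...' into the taxon prefix and the protein ID
        prefix, _, ensp = string_id.partition(".")
        if prefix == taxid and ensp.startswith("ENSP"):
            mapping[ensp].add(entrez_id)
    return dict(mapping)


# Made-up rows, only to show the assumed shape of the file
sample = io.StringIO(
    "#Entrez_Gene_ID\tSTRING_Locus_ID\n"
    "111\t9606.ENSP00000000001\n"
    "222\t9606.ENSP00000000002\n"
    "333\t10090.ENSMUSP00000000001\n"
)
lookup = ensp_to_entrez(sample)
```

Values are kept as sets because one protein ID may map to more than one Entrez ID; to read the real file, pass an open file handle instead of the `StringIO` sample.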


Is this possible in R? I can only find the "map" function, which converts from Entrez ID -> ENSP, but nothing in the package that does the reverse.

5.4 years ago
vkkodali_ncbi ★ 3.7k

You can use the file gene2ensembl.gz from this NCBI FTP path. Keep the rows with 9606 in the first column (the tax_id for human); the second column is the Entrez GeneID, and the last column holds the ENSP accession (where applicable).
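A minimal sketch of that filtering step, relying only on the layout described above (tax_id first, GeneID second, ENSP last). Two things are my assumptions rather than part of the answer: that missing protein IDs appear as `-`, and that ENSP accessions may carry a version suffix such as `.2`, which is stripped here. The sample rows and both function names are made up for illustration.

```python
import gzip


def load_ensp_to_geneid(lines, tax_id="9606"):
    """Map ENSP accession -> GeneID from gene2ensembl-style lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != tax_id:
            continue
        # Drop an optional version suffix (assumption: 'ENSP....N')
        ensp = fields[-1].split(".")[0]
        if ensp.startswith("ENSP"):  # skips '-' placeholders
            mapping.setdefault(ensp, fields[1])
    return mapping


def load_from_gzip(path):
    """Read the mapping directly from a local gene2ensembl.gz copy."""
    with gzip.open(path, "rt") as handle:
        return load_ensp_to_geneid(handle)


# Made-up rows, only to illustrate the assumed column layout
sample_lines = [
    "#tax_id\tGeneID\tEnsembl_gene_identifier\tRNA_nucleotide_accession.version"
    "\tEnsembl_rna_identifier\tprotein_accession.version"
    "\tEnsembl_protein_identifier\n",
    "9606\t111\tENSG00000000001\tNM_000001.1\tENST00000000001.1"
    "\tNP_000001.1\tENSP00000000001.2\n",
    "9606\t222\tENSG00000000002\tNR_000002.1\tENST00000000002.1\t-\t-\n",
    "10090\t333\tENSMUSG00000000003\tNM_000003.1\tENSMUST00000000003.1"
    "\tNP_000003.1\tENSMUSP00000000003.1\n",
]
ensp_to_geneid = load_ensp_to_geneid(sample_lines)
```

From the GeneID, the gene symbol can then be looked up as in Step 4 of the question (for example with org.Hs.eg.db).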
