Question

Converting between drug identifier formats

6.5 years ago

EverInEarnest ▴ 40

I have two CSV files of drug-related data. The first identifies drugs by CHEMBL IDs, whereas the second uses DrugBank and PubChem IDs. I need to compare the two files for overlap in their drug contents. Both files also contain drug names as strings, but matching on names is tricky: a single row/drug often lists several synonyms, and the two files are unlikely to contain the same synonyms for a given drug.

I'm looking for a simple way (e.g. an existing function or website) to convert between the CHEMBL IDs in the first file and the DrugBank and PubChem IDs in the second. I have searched fairly extensively, and am surprised not to find an R or Python function, or a web-based tool, that does this. This site (http://cts.fiehnlab.ucdavis.edu/conversion/batch) is similar to what I need, with lots of options for the "From" format but no useful options for the "To" format. I also found this Jupyter Notebook (http://nbviewer.jupyter.org/url/git.dhimmel.com/drugbank/unichem-map.ipynb) that matches DrugBank compounds to external resources using UniChem, but it seems far too complex for the simple conversion I'm seeking.

Any suggestions about resources that might assist with this drug ID conversion task will be much appreciated. Thanks!!

conversion database drug • 8.1k views

score 3 · Answer 1 · 2017-10-28

6.5 years ago

Zhilong Jia ★ 2.2k

Convert the PubChem IDs to CHEMBL IDs via https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi (in the Output IDs section, choose Registry IDs, then CHEMBL).
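
Once the converted IDs are downloaded, the overlap check itself takes only a few lines of Python. The following is a minimal sketch, assuming the download is a tab-separated text file with the input CID in the first column and the CHEMBL ID in the second (check this against your actual file). The function names are hypothetical, and apart from aspirin (CID 2244, CHEMBL25) the identifiers are made-up placeholders.

```python
import csv
import io


def load_cid_to_chembl(handle):
    """Read tab-separated 'CID<TAB>CHEMBL ID' lines into a dict of sets."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and CIDs for which no CHEMBL ID was returned.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        mapping.setdefault(row[0].strip(), set()).add(row[1].strip())
    return mapping


def chembl_overlap(chembl_ids, pubchem_cids, cid_to_chembl):
    """Return the CHEMBL IDs that occur in both files."""
    translated = set()
    for cid in pubchem_cids:
        translated |= cid_to_chembl.get(str(cid), set())
    return set(chembl_ids) & translated


# Stand-in for the downloaded file (assumed layout, see above).
# Only the aspirin row is real; the second CID is a placeholder with no match.
exchange_output = io.StringIO("2244\tCHEMBL25\n999999901\t\n")
mapping = load_cid_to_chembl(exchange_output)

file_one_ids = ["CHEMBL25", "CHEMBL_PLACEHOLDER_1"]  # IDs from the first CSV
file_two_cids = [2244, 999999901]                    # CIDs from the second CSV

overlap = chembl_overlap(file_one_ids, file_two_cids, mapping)
print(overlap)
```

Keeping the mapping as a dict of sets means a CID that maps to more than one CHEMBL ID is still handled.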

Many thanks, Zhilong! That is exactly what I needed!

6.5 years ago by EverInEarnest ▴ 40

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

6.1 years ago by Pierre Lindenbaum 161k

score 1 · Answer 2 · 2017-10-28

This is easily done with the Cactvs Cheminformatics Toolkit (free academic packages are available at www.xemistry.com/academic; they include both a loadable Python module and a stand-alone Python interpreter with chemistry extensions). The toolkit can decode the three IDs you are using (and many more) into structure objects, and the fastest way to compare these is by computing a structure hashcode. No name or synonym matching is involved: the comparison works purely on structural connectivity.

Here are some interactive commands in the Python version, comparing aspirin via its different database IDs, and also directly computing the database IDs for structures from a different source:

cspy
pycactvs>e1=Ens('CID:2244')
pycactvs>e2=Ens('CHEMBL:25')
pycactvs>e3=Ens('DRUGBANK:DB00945')
pycactvs>e1.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e2.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e3.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e1.E_CHEMBL_ID
'CHEMBL:25'
pycactvs>e1.E_DRUGBANK_ID
'DB00945'
pycactvs>e2.E_CID
2244
pycactvs>e1.E_SMILES
'CC(=O)OC1=CC=CC=C1C(=O)O'

There is a chemistry-aware table object which helps you with the processing of table data files. I'd be surprised if this required more than 10 lines of script code.
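
To illustrate the matching step on its own, here is a rough sketch in plain Python. It does not use the Cactvs table object or any Cactvs call. It assumes each row of both files has already been given a structure hash (for example the E_ISOTOPE_STEREO_HASH128 value from the session above) and then simply pairs up rows that share a hash. The function names and the placeholder rows are hypothetical; the aspirin hash and IDs are copied from the session.

```python
def index_by_hash(records):
    """Map each structure hash to the list of identifiers that carry it."""
    index = {}
    for identifier, structure_hash in records:
        index.setdefault(structure_hash, []).append(identifier)
    return index


def match_by_hash(file_one, file_two):
    """Return {hash: (ids from file one, ids from file two)} for shared hashes."""
    one = index_by_hash(file_one)
    two = index_by_hash(file_two)
    return {h: (one[h], two[h]) for h in one.keys() & two.keys()}


# Hash value taken from the interactive session above.
ASPIRIN_HASH = "8e1a0233-a328-045d-e61d-32db15c50d00"

# (identifier, precomputed hash) pairs; placeholder rows are made up.
file_one = [("CHEMBL:25", ASPIRIN_HASH), ("PLACEHOLDER-A", "hash-a")]
file_two = [
    ("DRUGBANK:DB00945", ASPIRIN_HASH),
    ("CID:2244", ASPIRIN_HASH),
    ("PLACEHOLDER-B", "hash-b"),
]

matches = match_by_hash(file_one, file_two)
for structure_hash, (ids_one, ids_two) in matches.items():
    print(structure_hash, ids_one, ids_two)
```

Because the comparison key is the hash, it does not matter which identifier type each file uses.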

score 0 · Answer 3 · 2020-06-09

3.9 years ago

hsiaoyi0504 ▴ 70

Alternatively, use the ID mapping provided by UniChem (https://www.ebi.ac.uk/unichem/). More than 50 databases are processed to provide a full source mapping.
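
If you download one of the bulk source-to-source mapping files rather than querying the site, loading it could look roughly like this. This is only a sketch: it assumes a two-column, tab-separated file with a single header line, and the sample header shown is an assumption, so check the layout of the file you actually download. The function names are hypothetical; the aspirin pair (CHEMBL25, DB00945) is the real one from this thread.

```python
import gzip
import io


def open_mapping(path):
    """Open a mapping file as text, handling .gz compression if present."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def load_unichem_mapping(handle):
    """Parse 'source ID<TAB>target ID' lines, skipping one header line."""
    mapping = {}
    next(handle, None)  # assumed: the first line is a header
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        source_id, target_id = parts
        # A source ID may map to several target IDs, so collect them in a set.
        mapping.setdefault(source_id, set()).add(target_id)
    return mapping


# In-memory stand-in for a downloaded file (assumed header and layout).
sample = io.StringIO("From src:'1'\tTo src:'2'\nCHEMBL25\tDB00945\n")
chembl_to_drugbank = load_unichem_mapping(sample)
print(chembl_to_drugbank.get("CHEMBL25"))
```

For a real file, pass `open_mapping(path)` to `load_unichem_mapping` instead of the in-memory sample.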
