Mapping NDC drug codes to their corresponding PubChem ids
8.9 years ago
aarond7511 ▴ 10

Hello,

Would anyone possibly know of an algorithm that can map NDC drug codes to their corresponding PubChem ids or an intermediate identifier? Trying to avoid the use of text mining algorithms as much as possible.

pubchem NDC • 3.0k views

I am curious as to what you perceive as the utility for this particular mapping?

8.9 years ago
wdiwdi ▴ 380

This is quite straightforward with the Cactvs Cheminformatics toolkit (free academic downloads are available at www.xemistry.com/academic), although it relies on Internet-based compound name resolution, which is not entirely foolproof:

a) Download the FDA NDC database zip file and expand it. The archive contains a file 'product.txt' with the relevant data.

b) Use one of the toolkit interpreters to run either a simple Tcl script:

# Loop over the rows of product.txt, with each row exposed as a dict
table dictloop [table read product.txt] row {
    set ndc [dict get $row PRODUCTNDC]
    puts -nonewline "$ndc\t"
    set d [dict create]
    # SUBSTANCENAME may list several substances separated by ';'
    foreach s [split [dict get $row SUBSTANCENAME] \;] {
        set s [string trim $s]
        if {[info exists resolved($s)]} {
            # Reuse the cached CID for a name seen earlier
            dict set d $s $resolved($s)
        } elseif {[info exists unresolved($s)] || [catch {ens create $s} eh]} {
            puts stderr "failed to resolve substance name $s"
            set unresolved($s) 1
        } else {
            if {[catch {ens get $eh E_CID} cid]} {
                puts stderr "no PubChem CID for substance $s"
                set unresolved($s) 1
            } else {
                dict set d $s $cid
                set resolved($s) $cid
            }
            ens delete $eh
        }
    }
    puts $d
}

c) or a Python 3 script:

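# Table and Ens come from the Cactvs toolkit; run this in the toolkit's Python interpreter
import sys

t = Table.Read('product.txt')
t.iteratorstyle = 'dict'
resolved = {}        # substance name -> PubChem CID
unresolved = set()   # names that could not be resolved or have no CID
for row in t:
    ndc = row['PRODUCTNDC']
    print(ndc, end='\t')
    d = {}
    # SUBSTANCENAME may list several substances separated by ';'
    for s in [w.strip() for w in row['SUBSTANCENAME'].split(';')]:
        if s in resolved:
            d[s] = resolved[s]
        elif s in unresolved:
            print('failed to resolve substance name', s, file=sys.stderr)
        else:
            try:
                e = Ens(s)   # name resolution; raises if the name is unknown
                try:
                    d[s] = resolved[s] = e.E_CID
                except Exception:
                    print('no PubChem CID for', s, file=sys.stderr)
                    unresolved.add(s)
                finally:
                    e.delete()
            except Exception:
                print('failed to resolve substance name', s, file=sys.stderr)
                unresolved.add(s)
    print(d)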
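
For readers without the toolkit, the control flow of both scripts (split SUBSTANCENAME on ';', resolve each name once, cache the result) can be sketched in plain Python. This is only an illustration under assumptions: it assumes product.txt is tab-delimited with a header row, the `resolve` callback is hypothetical and stands in for whatever name-to-CID lookup you use, and the NDC codes and the name UNKNOWNIUM in the sample are made-up placeholders.

```python
import csv
import io


def split_substances(field):
    """Split a SUBSTANCENAME field on ';' and drop empty entries."""
    return [name.strip() for name in field.split(';') if name.strip()]


def map_ndc_to_cids(lines, resolve):
    """Yield (ndc, {substance: cid}) for each row of tab-delimited input.

    `resolve` is a caller-supplied (hypothetical) function that returns a
    CID for a substance name, or None when the name cannot be resolved.
    """
    cache = {}  # substance name -> CID or None; each name is looked up once
    for row in csv.DictReader(lines, delimiter='\t'):
        cids = {}
        for name in split_substances(row['SUBSTANCENAME'] or ''):
            if name not in cache:
                cache[name] = resolve(name)
            if cache[name] is not None:
                cids[name] = cache[name]
        yield row['PRODUCTNDC'], cids


# In-memory stand-in for product.txt; the NDC codes are placeholders.
sample = (
    "PRODUCTNDC\tSUBSTANCENAME\n"
    "0000-0001\tASPIRIN; CAFFEINE\n"
    "0000-0002\tUNKNOWNIUM\n"
)
# Stub resolver backed by a small lookup table instead of a network call.
stub = {'ASPIRIN': 2244, 'CAFFEINE': 2519}
result = list(map_ndc_to_cids(io.StringIO(sample), stub.get))
print(result)
```

With a real file you would pass `open('product.txt', newline='')` instead of the `io.StringIO` sample. Caching failed names as None keeps a name that cannot be resolved from being looked up again on every row.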

Hi

Thanks for the information. However, it is not mentioned what Ens is in the Python script above.
