FILE NAME: CathDomainList.v3.5.0

Question

Parsing Pdbcodes To Their Cath Numbers

12.3 years ago

Reyhaneh ▴ 530

Hi;

I am looking for a simple file from which, for every PDB code plus chain (e.g. 1e6jP), I can get the representative CATH numbers for all of its domains. For example:

PDBcode   CATH  
1e6jP01   1.10.375.10  
1e6jP02   1.10.1200.30

I have looked at the download section of the CATH website but was not able to find such a file. Do you have any suggestions?

Thank you;

Reyhaneh

pdb • 3.2k views

ADD COMMENT • link updated 12.3 years ago by Neilfws 49k • written 12.3 years ago by Reyhaneh ▴ 530

score 7 · Answer 1 · 2012-01-20

12.3 years ago

Simon Cockell 7.4k

EDIT: Actually, I realised my first attempt didn't, strictly speaking, answer your question. This should be a bit better.

You want this file: <http://release.cathdb.info/v3.4.0/CathDomainList>

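Column 0 gives you the CATH domain name (PDB code, chain and domain number, e.g. 1e6jP01); columns 1-4 give you the CATH classification down to the homology level (class, architecture, topology, homologous superfamily). So to parse: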

from collections import defaultdict
from urllib.request import urlopen  # Python 3

CATH_URL = 'http://release.cathdb.info/v3.4.0/CathDomainList'

def get_pdb_dict(url=CATH_URL):
    """Takes CATH domain list (from URL) and returns a dictionary mapping each
    PDB chain (e.g. '1e6jP') to a list of (domain number, CATH code) tuples"""
    pdbs = defaultdict(list)
    with urlopen(url) as fh:
        lines = fh.read().decode('utf-8').splitlines()
    for line in lines:
        # ignore comments and blank lines
        if not line.strip() or line.startswith('#'):
            continue
        # columns are separated by runs of whitespace
        tokens = line.split()
        domain = tokens[0]
        # split the domain name into PDB code + chain and domain number
        pdb_chain = domain[0:5]
        domain_num = domain[5:]
        # could be more/less precise by using more/fewer columns
        cath = '.'.join(tokens[1:5])
        pdbs[pdb_chain].append((domain_num, cath))
    return pdbs

if __name__ == '__main__':
    p = get_pdb_dict()
    print(p['1e6jP'])

[biostar-code/python]$ python parse_cath_domain.py 
[('01', '1.10.375.10'), ('02', '1.10.1200.30')]
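
If you already have a local copy of the file, the same column logic can print the two-column table asked for in the question. This is only a minimal sketch, assuming the whitespace-delimited layout described above; `cath_table` is an illustrative name, and the sample records show just the first five columns, using the values from the question:

```python
def cath_table(lines, pdb_chain):
    """Return (domain name, C.A.T.H code) pairs for one PDB chain, e.g. '1e6jP'."""
    rows = []
    for line in lines:
        # skip comments and blank lines
        if not line.strip() or line.startswith('#'):
            continue
        tokens = line.split()
        # column 0 is the domain name; columns 1-4 are C, A, T and H
        if tokens[0].startswith(pdb_chain):
            rows.append((tokens[0], '.'.join(tokens[1:5])))
    return rows

# Illustrative input: only the first five columns of each record are shown.
sample = [
    '# FILE NAME: CathDomainList.v3.5.0',
    '1e6jP01     1    10   375    10',
    '1e6jP02     1    10  1200    30',
]

for domain, cath in cath_table(sample, '1e6jP'):
    print(domain, cath, sep='\t')
```

With a real download, something like `with open('CathDomainList') as fh: rows = cath_table(fh, '1e6jP')` should work the same way, since the function only needs an iterable of lines.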

ADD COMMENT • link 12.3 years ago by Simon Cockell 7.4k

Beaten by seconds! I'll just note that the file is about 11.3 MB and has not been updated for some time (dated 2010-11-21), and that "grep 1e6j CathDomainList" gives you a quick view of the entries.

ADD REPLY • link 12.3 years ago by Neilfws 49k

+1 @neilfws grep would usually be my preferred solution, admittedly. I just fancied writing some code ;)

ADD REPLY • link 12.3 years ago by Simon Cockell 7.4k

@Simon Cockell Thank you very much. I saw this file but didn't understand the format before. Thanks for the clear explanation.

ADD REPLY • link 12.3 years ago by Reyhaneh ▴ 530

Here is a more up-to-date version of the file:

http://release.cathdb.info/v3.5.0/CathDomainList

FILE NAME: CathDomainList.v3.5.0

FILE DATE: 21.09.2011

CATH VERSION: v3.5.0

VERSION DATE: 21.09.2011

ADD REPLY • link 12.3 years ago by Reyhaneh ▴ 530

score 1 · Answer 2 · 2012-01-20

If you prefer not to download and parse files, CATH provides a web service which returns XML for a given PDB code. For example, for 1E6J: http://www.cathdb.info/pdb/1e6j?view=xml

You could then extract the CATH code using the XML parsing library of your choice. Quick and dirty Ruby example:

#!/usr/bin/ruby
require 'rubygems' # ruby 1.8
require 'mechanize'
require 'crack'

agent = Mechanize.new
page  = agent.get("http://www.cathdb.info/pdb/1e6j?view=xml")
doc   = Crack::XML.parse(page.body)

doms  = doc['document']['cath_pdb_query']['cath_domain'].map {|d|
  [d['domain_id'], d['cath_code']]
}

doms.each {|d|
  puts d.join("\t")
}

# result
1e6jH01 2.60.40.10.5.1.11.1.2
1e6jH02 2.60.40.10.10.1.1.1.139
1e6jL01 2.60.40.10.6.5.2.2.2
1e6jL02 2.60.40.10.3.1.1.1.411
1e6jP01 1.10.375.10.1.1.2.7.1
1e6jP02 1.10.1200.30.2.1.2.3.1
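
The codes in this result carry nine dot-separated fields, whereas the question asks for the four-field number (1.10.375.10 for 1e6jP01). A small sketch in Python, to match the first answer, of trimming them down; `cath_number` is a hypothetical helper, and it assumes the first four fields are always the class, architecture, topology and homology levels, as they are in the output above:

```python
def cath_number(full_code, levels=4):
    """Keep the first `levels` dot-separated fields of a CATH code."""
    return '.'.join(full_code.split('.')[:levels])

# Output lines copied from the result above (domain name, full code).
result = """\
1e6jH01 2.60.40.10.5.1.11.1.2
1e6jH02 2.60.40.10.10.1.1.1.139
1e6jL01 2.60.40.10.6.5.2.2.2
1e6jL02 2.60.40.10.3.1.1.1.411
1e6jP01 1.10.375.10.1.1.2.7.1
1e6jP02 1.10.1200.30.2.1.2.3.1
"""

for line in result.splitlines():
    domain, code = line.split()
    print(domain, cath_number(code), sep='\t')
```

Changing `levels` keeps more or fewer fields, in the same spirit as using more or fewer columns of CathDomainList.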