Question

Renumber Pdb Files To Match Actual Sequence

1

11.8 years ago

Whetting ★ 1.6k

Hi,
I am working on a project aimed at compiling papillomavirus sequence information. I will gladly share the link if people are interested, but I do not want to spam. As part of the effort, we want to show alignments between PDB structure files and HPV sequences.
We noticed that several PDB files are not numbered according to the full-length protein encoded in the genome. For example, if the C-terminal domain of protein X was crystallized, the residues should be numbered 250 to 500, but the crystallographer numbered the PDB file according to the crystallized peptide instead. Does anyone have any suggestions for a program that can do the renumbering? Thanks!

EDIT: I think I may have found a solution.
I think I can write a tool using pdbsws and a Perl script I found here: http://www.canoz.com/sdh/renumberpdbchain.pl

pdb sequence • 6.6k views

ADD COMMENT • link updated 11.8 years ago by Vladimir Chupakhin ▴ 520 • written 11.8 years ago by Whetting ★ 1.6k

score 2 · Answer 1 · 2012-07-10

11.8 years ago

Vladimir Chupakhin ▴ 520

Sometimes PDB numbering is quite a mess. I used protein alignment, but that is not practical at the scale of the full PDB database. Take a look at the pdbsws service.

ADD COMMENT • link 11.8 years ago by Vladimir Chupakhin ▴ 520

0

That's pretty cool, wish I had known about that one earlier!

ADD REPLY • link 11.8 years ago by Will 4.5k

score 1 · Answer 2 · 2012-07-10

11.8 years ago

Will 4.5k

I've come across the same problem. My method has been to align (using a local alignment) the PDB sequences with the relevant protein sequences and determine the proper numbering from there. I wrote a simple Matlab script to do the re-numbering but any language should work just as well.

Also, don't forget to account for gaps in the PDB sequences. I've found many instances where the crystal structure is missing parts in the middle.
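The Matlab script is not shown in this thread, so the following is only a minimal Python sketch of the same idea: a plain Smith-Waterman local alignment between the sequence observed in the PDB chain and the full protein sequence, returning a position map. The function name `local_align_map`, the scoring values (match 2, mismatch -1, linear gap -2), and the lack of a substitution matrix are all assumptions made for illustration.

```python
def local_align_map(pdb_seq, ref_seq, match=2, mismatch=-1, gap=-2):
    """Map 0-based positions in pdb_seq to 0-based positions in ref_seq.

    Hypothetical helper: simple Smith-Waterman with a linear gap penalty.
    Only aligned residue pairs appear in the returned dict.
    """
    n, m = len(pdb_seq), len(ref_seq)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix and remember the highest-scoring cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if pdb_seq[i - 1] == ref_seq[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,   # align the two residues
                    score[i - 1][j] + gap,       # residue only in PDB chain
                    score[i][j - 1] + gap)       # residue missing from PDB chain
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell until the local alignment ends (score 0).
    mapping = {}
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if pdb_seq[i - 1] == ref_seq[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return mapping
```

The full-protein residue number for PDB position `i` would then be `mapping[i] + 1`. Because gaps are allowed on the PDB side, residues missing from the middle of the crystal structure show up as a jump in the mapped numbers rather than shifting everything after them.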

ADD COMMENT • link 11.8 years ago by Will 4.5k

0

Hi Will, the problem I ran into was that it seemed impossible to completely renumber the entire PDB file, i.e. the helix and sheet records (and so on) have to be renumbered as well. Did you write a script that updated all those lines, or is that not necessary in order to parse the PDB file?
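For reference, here is a rough Python sketch of what updating those records could look like. It assumes the standard fixed-column PDB layout for `ATOM`, `HETATM`, `ANISOU`, `TER`, `HELIX`, and `SHEET` records and a constant offset for one chain; the names `RESNUM_FIELDS` and `renumber_line` are made up for this example. Other records that carry residue numbers (such as `SSBOND`, `LINK`, `SITE`, or `DBREF`) are deliberately left untouched here and would need their own entries.

```python
# For each record type: (chain column, first column, last column) of every
# residue-number field, 1-based and inclusive, per the fixed-column layout.
RESNUM_FIELDS = {
    "ATOM  ": [(22, 23, 26)],
    "HETATM": [(22, 23, 26)],
    "ANISOU": [(22, 23, 26)],
    "TER   ": [(22, 23, 26)],
    "HELIX ": [(20, 22, 25), (32, 34, 37)],
    "SHEET ": [(22, 23, 26), (33, 34, 37), (50, 51, 54), (65, 66, 69)],
}


def renumber_line(line, chain, offset):
    """Return line with residue numbers of the given chain shifted by offset.

    Hypothetical helper; lines of other record types are returned unchanged.
    """
    body = line.rstrip("\n")
    fields = RESNUM_FIELDS.get(body[:6].ljust(6))
    if fields is None:
        return line
    width = len(body)
    chars = list(body.ljust(80))  # pad so every column can be indexed
    for chain_col, first, last in fields:
        text = "".join(chars[first - 1:last]).strip()
        # Skip blank fields (e.g. first strand of a sheet) and other chains.
        if chars[chain_col - 1] != chain or not text:
            continue
        new_text = f"{int(text) + offset:>4d}"
        if len(new_text) != 4:
            raise ValueError(f"residue number does not fit 4 columns: {new_text}")
        chars[first - 1:last] = new_text
        width = max(width, last)
    # Restore the original line length and line ending.
    return "".join(chars)[:width] + line[len(body):]
```

Applying it to a whole file is then a loop such as `out.writelines(renumber_line(l, "A", 249) for l in inp)`. A constant offset only works if the existing numbering already skips unobserved residues; otherwise the `offset` argument would have to be replaced by a per-residue lookup.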

ADD REPLY • link 11.8 years ago by Whetting ★ 1.6k

0

Essentially I just used the script to write out the position (X,Y,Z), chain, original-index, and full-protein-index of each AA to a separate file. Then I used those for my downstream analysis ... I didn't try to write anything back into the PDB file.
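A side table like that could be produced along these lines in Python. This is only a sketch under a few assumptions: the C-alpha atom is taken as the position of each residue, the full-protein index is the original index plus a constant `offset` (in practice it would come from the alignment), and the function and column names are invented for the example.

```python
import csv

COLUMNS = ["chain", "original_index", "full_index", "x", "y", "z"]


def residue_table(pdb_lines, offset):
    """Collect one row per C-alpha atom from fixed-column ATOM records.

    Hypothetical helper; full_index = original_index + offset is an
    assumption standing in for an alignment-derived mapping.
    """
    rows = []
    for line in pdb_lines:
        if not line.startswith("ATOM  ") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        original = int(line[22:26])
        rows.append({
            "chain": line[21],
            "original_index": original,
            "full_index": original + offset,
            "x": float(line[30:38]),
            "y": float(line[38:46]),
            "z": float(line[46:54]),
        })
    return rows


def write_table(rows, handle):
    """Write the rows as tab-separated text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Keeping the mapping in a separate file like this avoids having to touch the HELIX and SHEET records at all, at the cost of every downstream tool needing to read the table instead of the PDB numbering.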

ADD REPLY • link 11.8 years ago by Will 4.5k