Question

How Do You Pick a PDB NMR Conformation for Multivariate Analysis?

6


13.0 years ago

Deena ▴ 280

Hello,
I was wondering about performing multivariate analysis on protein structures (e.g., based on C-alpha coordinates) using an approach such as principal component analysis (PCA). For protein models that are derived from solution NMR, and for which multiple conformers are deposited under the same PDB entry:

- How do you pick the best conformer on which to base your analysis, and how do you justify picking one conformer and not the others?
- Or do you use all possible conformers for that protein?

The underlying question is: how much does inter-conformer variability contribute to a multivariate analysis of protein structures (e.g., PCA)?

Any advice and suggested reading material on the subject is highly appreciated! Thank you, Deena

pca pdb protein structure • 5.3k views

ADD COMMENT • link updated 13.0 years ago by Tiago • 0 • written 13.0 years ago by Deena ▴ 280

0


The advantage of PCA plots is that you immediately see how large the differences between conformations are compared with the differences between proteins. Why not just give it a try with everything you have?

ADD REPLY • link 13.0 years ago by Michael Schubert ★ 7.1k

0


But how are you planning to apply PCA to protein structures?

ADD REPLY • link 13.0 years ago by Michael Schubert ★ 7.1k

0


Interesting idea... So you would take the coordinates of every atom (or of the center of each amino acid?) relative to the center of the protein, for every conformation estimate, and see whether specific variations (your principal components) occur between those conformations? Is that indeed what you plan to do? That should immediately show you, for instance, groups of conformations that share the same major conformational change (different curve) separated from the rest. I have never seen anything like that, but I think the idea is nice. I would love to see the results.

ADD REPLY • link 13.0 years ago by Chris Evelo 10k

score 2 · Answer 1 · 2011-04-08

2


13.0 years ago

Chris Evelo 10k

Did you see this paper?

Principal components analysis of protein structure ensembles calculated using NMR data. Howe PW. J Biomol NMR. 2001 May;20(1):61-70.

ADD COMMENT • link 13.0 years ago by Chris Evelo 10k

0


I'm not entirely sure what Deena is trying to do. The article applies PCA to multiple conformations of one protein, which seems reasonable. However, comparing different proteins comes with many issues (e.g., unequal sequence length and order), so I don't think it would work in that case.

ADD REPLY • link 13.0 years ago by Michael Schubert ★ 7.1k

0


Just in case this comes in handy (Bio3D R package): http://mccammon.ucsd.edu/~bgrant/bio3d/html/pca.xyz.html

ADD REPLY • link 13.0 years ago by Woa ★ 2.9k

score 2 · Answer 2 · 2011-04-08

I am a former NMR spectroscopist. You should check the comments in the PDB file. If the authors don't call out a specific model as being the best representative of the family, then you can usually assume that it is the first one. The authors have access to the underlying constraint data that you do not have. In the lab I was trained in, we picked the model that best satisfied all of the constraints (i.e., lowest energy in the final minimization step) unless something really odd happened, and if something really odd happened, you'd definitely research it and put something about it in the comments when you submitted to the PDB. But other labs did different things. I don't know if the field has standardized on how to pick a model since I left it.
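
A minimal Python sketch of that check, assuming the entry follows the usual REMARK 210 template, in which depositors can name a representative conformer. The function name and the header fragment are made up for illustration. Real entries may leave the field as NULL or omit it, in which case the function returns None and you are back to reading the paper.

```python
import re

def best_representative_model(pdb_text):
    """Return the model number named in REMARK 210, or None if it is absent."""
    pattern = re.compile(
        r"BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE\s*:\s*(\d+)"
    )
    for line in pdb_text.splitlines():
        # Only the NMR experimental-details remark is inspected.
        if line.startswith("REMARK 210"):
            match = pattern.search(line)
            if match:
                return int(match.group(1))
    return None

# Hypothetical header fragment, for illustration only.
example = """\
REMARK 210 CONFORMERS, NUMBER CALCULATED   : 100
REMARK 210 CONFORMERS, NUMBER SUBMITTED    : 20
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 1
"""
print(best_representative_model(example))
```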

Regardless, the differences between models are usually pretty small. If you're down to the level of detail at which these differences matter, you should consider using all of the models, since they all satisfy the experimental constraints.

EDITED TO ADD: I missed the additional question in the follow-up. Back in my day, we used custom software to find out how many structures were needed to adequately represent the structural space that is consistent with the experimental constraints. I don't know if there are publicly available tools to do this now, but XPLOR is really more of a crystallography tool, and crystallographers don't have this exact concern, so I'd be surprised if it does what you want.

Anyway, I think you need to decide whether the differences among the structures in the family matter or not for the question you are trying to answer. In most cases, just picking the first structure in the family will be sufficient. In cases where it isn't, then you probably need to find a way to include the entire family.

I dug up the relevant paragraph from one of my (ancient) papers to help you think about this (and to remind myself of what we did): "A final ensemble of 24 structures was selected by first ordering the structures on the basis of increasing restraint violation energies. Structures that had a total AMBER energy or a specific term of the force field greater than two standard deviations above the mean were carefully scrutinized for potential exclusion from the final ensemble. The minimum number of structures required to adequately represent the conformational space allowed by the data was 22, as determined using the FINDFAM program (Smith 1999). The number of structures in the final ensemble was selected to be similar to that used for previous calbindin D9k structures to facilitate comparison." (This is the paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2373453/?tool=pubmed)

If you want to be really thorough, the paper describing the structure you want to use should have a similar paragraph in the materials and methods, and that will help you figure out what to do.
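
If the whole family has to be included, one possible approach is a PCA over the superposed C-alpha coordinates of all models. The sketch below is only an illustration under stated assumptions: it uses NumPy, superposes every model on the first one with the Kabsch algorithm (a simplification; fitting to an iteratively refined mean is another option), and runs on synthetic coordinates rather than a real PDB entry. The function names are hypothetical.

```python
import numpy as np

def kabsch_superpose(mobile, reference):
    """Rigid-body fit of mobile (N x 3) onto reference (N x 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_mean = reference.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(mob_c.T @ (reference - ref_mean))
    d = np.sign(np.linalg.det(u @ vt))  # guard against an improper rotation
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rotation + ref_mean

def ensemble_pca(models):
    """models: (n_models, n_atoms, 3). Returns (scores, explained variance ratio)."""
    fitted = np.array([kabsch_superpose(m, models[0]) for m in models])
    flat = fitted.reshape(len(models), -1)   # one row per conformer
    centered = flat - flat.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2 / (len(models) - 1)
    return u * s, variance / variance.sum()

# Synthetic ensemble: 10 noisy copies of a 30-atom trace (assumed data).
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 3))
models = base + 0.1 * rng.normal(size=(10, 30, 3))

scores, ratio = ensemble_pca(models)
print("variance explained by PC1 and PC2:", np.round(ratio[:2], 3))
```

Plotting the first two columns of `scores` against each other shows how the conformers spread out, which gives a feel for how much inter-conformer variability there is before mixing the ensemble with other structures.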

score 0 · Answer 3 · 2011-04-08

Thanks Michael and Chris for your responses and advice! Originally I wanted to perform a PCA on a homologous protein family, involving structures from both X-ray and NMR, which is why I first need to pick out which NMR model is the best representative for each protein that has an NMR ensemble.

I think Chris has the idea: mainly applying PCA to the NMR ensemble of each protein, in the hope of detecting the variation within that ensemble. I have seen papers that use the NMR conformer closest to the average of the ensemble as the final model. Any advice on techniques/tools to do this? (I will be checking XPLOR and OLDERADO to see what they do!)
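
One simple way to find the conformer closest to the ensemble average is sketched below. It is an illustration under stated assumptions, not a description of what OLDERADO or XPLOR do: it uses NumPy, superposes all models on the first one with the Kabsch algorithm, averages the fitted coordinates, and reports the model with the lowest RMSD to that average. The function names and the three-model test ensemble are hypothetical.

```python
import numpy as np

def superpose(mobile, reference):
    """Kabsch fit of mobile (N x 3) onto reference (N x 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_mean = reference.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ (reference - ref_mean))
    d = np.sign(np.linalg.det(u @ vt))  # guard against an improper rotation
    return mob_c @ (u @ np.diag([1.0, 1.0, d]) @ vt) + ref_mean

def closest_to_mean(models):
    """models: (n_models, n_atoms, 3). Returns (index, per-model RMSD to the mean)."""
    fitted = np.array([superpose(m, models[0]) for m in models])
    mean = fitted.mean(axis=0)
    rmsd = np.sqrt(((fitted - mean) ** 2).sum(axis=2).mean(axis=1))
    return int(np.argmin(rmsd)), rmsd

# Synthetic three-model ensemble (assumed data, not from a PDB entry):
# an unperturbed trace, a perturbed and rotated copy, and the opposite perturbation.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 3))
delta = 0.3 * rng.normal(size=(20, 3))
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
models = np.array([base, (base + delta) @ rot_z, base - delta])

index, rmsd = closest_to_mean(models)
print("representative model index:", index)
print("RMSD to mean per model:", np.round(rmsd, 3))
```

Note that the averaged coordinates are not a physically meaningful structure themselves, which is why the real conformer nearest to them is reported instead.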

score 0 · Answer 4 · 2011-04-08

I always use the first model for structure analysis using NMR models. AFAIK, the first model seems to be the best model.

Here are a couple of ideas that you could try:

- Perform a quality check on individual NMR chains using a Ramachandran plot, WHATIF or HARMONY* and select the best chain based on the quality assessment.
- Check the coverage of the structure with respect to the SEQRES / original sequence record; this will help you identify the best model.
- I have used CRANKITE for similar analysis before; take a look at the package here

Disclaimer: I am an author of HARMONY

score 0 · Answer 5 · 2011-04-19

Thank you Khader and Melanie for your responses. I have taken your advice and I am looking at the publications of the protein structures that I am analyzing. Some publications offer an explanation of why multiple conformers exist, and some also provide a representative NMR model. Many of the selected conformers are highlighted by the authors as best based on lowest energy or fewest violations. However, because a large number of the structures lack a proper 'explanation' as to why a specific conformer is better than the rest, I will probably use only a "representative" conformer based on OLDERADO and the suggestions of the other programs that Khader has mentioned. Thank you both for your help. Best wishes, Deena

score 0 · Answer 6 · 2011-06-15

0


12.9 years ago

Tiago • 0

Have you tried theseus (http://www.theseus3d.org/)? It gives an average structure, I think; I don't know if it will help.

Tiago

ADD COMMENT • link 12.9 years ago by Tiago • 0