Question

Why Is Statistical Coupling Analysis Not Confounded By Shared Evolutionary History?

5

13.5 years ago

Rohan ▴ 50

Statistical Coupling Analysis (SCA) is a method for finding pairs of amino acids that covary in a protein family, based on a multiple sequence alignment. See http://www.hhmi.swmed.edu/Labs/rr/sca.html and http://en.wikipedia.org/wiki/Statistical_coupling_analysis.

What I don't understand is how this method avoids simply recovering the phylogenetic tree. Why doesn't shared evolutionary history bias this procedure?

statistics protein evolution alignment • 5.0k views

Updated 7.8 years ago by Biostar 20

score 2 · Answer 1 · 2010-10-29

It seems to me that covariance is originally determined by the thermodynamics governing protein/peptide folding, and that those covariances are then propagated through evolutionary descent. The questions being asked of SCA and of evolutionary history therefore overlap to some extent.

SCA may pick up on features missed by evolutionary history. For example, imagine that a pair of mutations in a hemoglobin protein confers resistance to a mammalian parasite. That pair of mutations may have occurred in a common ancestor of all mammals, in which case SCA will produce the same result as the phylogenetic tree. But it may also have arisen later, in multiple species, and propagated from there. Phylogeny may not identify covariance in this sort of feature, while SCA might.

Shared evolutionary history will probably bias the answers of SCA, but that bias doesn't necessarily invalidate the results. (Sometimes bias makes it easier to get the right answer. Bias isn't guaranteed to be bad.) If a multiple-feature covariance has propagated through many generations of many species, is it no longer interesting?
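To make the concern about bias concrete, here is a minimal sketch on a hypothetical toy alignment. Two columns covary perfectly, but every sequence is a copy from one of two clades, so the apparent support of eight sequences reduces to two distinct observations. Collapsing identical sequences is a crude stand-in for sequence weighting, chosen here for illustration only; it is not claimed to be what SCA does, and the names `toy_alignment` and `pair_counts` are made up for this example.

```python
from collections import Counter

# Hypothetical toy alignment: four copies from one clade, four from another.
toy_alignment = ["AKL"] * 4 + ["GEL"] * 4


def pair_counts(alignment, i, j):
    """Count how often each residue pair occurs at columns i and j."""
    return Counter((seq[i], seq[j]) for seq in alignment)


# Raw counts treat every sequence as an independent observation.
raw = pair_counts(toy_alignment, 0, 1)

# Crude redundancy filter (an assumption for illustration):
# keep each distinct sequence only once before counting.
collapsed = pair_counts(sorted(set(toy_alignment)), 0, 1)

print("raw pair counts:      ", dict(raw))
print("collapsed pair counts:", dict(collapsed))
```

The covariation pattern is the same in both cases; what changes is how many independent observations back it, which is where shared ancestry can make a signal look stronger than it is.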

score 1 · Answer 2 · 2010-11-23

In the following article, the authors try to derive residue contacts from correlated columns in multiple alignments. They use Mutual Information (plus a direct coupling measure), not statistical coupling, but it may help you find an answer.

Martin Weigt, Robert A. White, Hendrik Szurmant, James A. Hoch, and Terence Hwa. Identification of direct residue contacts in protein–protein interaction by message passing.
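For readers unfamiliar with mutual information between alignment columns, a minimal sketch follows. It uses a hypothetical four-sequence toy alignment and plain frequency counts, with no sequence weighting, pseudocounts, or the direct coupling step, so it illustrates the basic quantity only and not the method of the paper. The function names are made up for this example.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary, column 2 does not.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]

mi_01 = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
mi_02 = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
print(mi_01, mi_02)
```

In this toy case columns 0 and 1 are perfectly coupled while columns 0 and 2 vary independently. On real alignments, a high value may reflect functional coupling, shared ancestry, or both, which is exactly the question raised above.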

I also found some interesting articles about the use of correlations in multiple alignment columns:

Andreas Kowarsch, Angelika Fuchs, Dmitrij Frishman, Philipp Pagel. Correlated Mutations: A Hallmark of Phenotypic Amino Acid Substitutions.

Fodor AA, Aldrich RW. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments.