Question

Working Principle of Ancestral Sequence Reconstruction (ASR)

5.5 years ago

johnnytam100 ▴ 110

Hi, I am working on ancestral sequence reconstruction for a protein of interest. I am trying to understand the working principle and have looked into related topics such as parsimony and maximum likelihood to understand how people infer the state at an ancestral node (i.e., the likely ancestral amino acid residue). Suppose one aligned column across six sequences looks like this:

. . . A . . .
. . . A . . .
. . . A . . .
. . . A . . .
. . . A . . .
. . . G . . .

I guess the best model (maximum likelihood) to describe the alignment, with the assumption that evolutionary events are rare (parsimony), would be that the ancestral node is "A".

Then I have a dumb question: what is the difference between simply comparing the percentage of each amino acid residue and picking the most frequent one, and the actual procedure of inferring the ancestral node with the computationally intensive maximum likelihood method?

Thanks a lot!

ancestral sequence reconstruction • 1.1k views

updated 5.5 years ago by Brice Sarver ★ 3.8k • written 5.5 years ago by johnnytam100 ▴ 110

score 1 · Answer 1 · 2018-10-20

You're working with an alignment here, i.e., an inference of homology. As a result, there will be some underlying phylogenetic structure that you can infer which describes the relationships among the different lineages. You can also specify a model of sequence evolution that will inform your analysis (e.g., the G > A change could be very unlikely in your data for a variety of reasons).
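To make "a model of sequence evolution" concrete, here is a minimal sketch of the simplest one I can think of: an equal-rates model in which every amino acid is equally likely to change into any other. This is an assumption for illustration only (real analyses normally use empirical matrices such as LG or JTT), and the function name `p_sub` is made up. Branch length `t` is in expected substitutions per site.

```python
import math

N_STATES = 20  # amino acids


def p_sub(same: bool, t: float, n: int = N_STATES) -> float:
    """Probability that a branch of length t ends in one particular state.

    same=True  -> the end state equals the start state
    same=False -> the end state is one specific different state
    Equal-rates model, scaled so t is expected substitutions per site.
    """
    decay = math.exp(-n * t / (n - 1))
    if same:
        return 1.0 / n + (n - 1) / n * decay
    return (1.0 - decay) / n


for t in (0.01, 0.1, 1.0):
    print(f"t={t}: stay={p_sub(True, t):.4f}, change to one given state={p_sub(False, t):.5f}")
```

The point is that the probability of a change depends on how much time (branch length) separates two nodes, which a simple residue count cannot take into account.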

Briefly, when calculating the single-site likelihood, you are mathematically incorporating not just the character states at the tips but also the (possible) character states at the nodes. This is the heart of the question you're asking. So, under parsimony, you're correct that the simplest estimate of the character state for a node that gives rise to the observed character states is A, with a shift to G somewhere down the line. ML-based approaches take this a step further; you can think of it roughly as 'weighting' the possible states at the node, effectively capturing something about the evolutionary process.

Imagine a slightly more complex scenario in which an alignment column has 6 As and 4 Gs. The maximum likelihood estimate of the ancestral state at this node will be a function of your input data and of parameters related to the evolutionary process. You may end up much more confident that the ancestral state is A than a parsimony calculation would suggest; in its simplest form, parsimony will return either A or {A, G}, depending on how you're considering it. Likelihood-based ancestral state estimates capture this difference in less obvious situations.
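As a rough illustration of why this differs from counting residues, here is a toy sketch that computes the posterior of the root state for the AAAAAG column on two made-up trees. Everything here is an assumption for the example: the tree shapes, the branch lengths, the equal-rates substitution model, the uniform prior at the root, and the helper names (`p_sub`, `partial`, `root_posterior`). It uses Felsenstein's pruning idea: each node carries the likelihood of the tips below it for every possible state at that node.

```python
import math
from collections import Counter

STATES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids


def p_sub(i, j, t, n=len(STATES)):
    """P(state j at the end of a branch | state i at the start), equal-rates model."""
    decay = math.exp(-n * t / (n - 1))
    return 1 / n + (n - 1) / n * decay if i == j else (1 - decay) / n


def partial(node):
    """Likelihood of the tips below `node` for each possible state at `node`.

    A leaf is a one-letter string; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: float(s == node) for s in STATES}
    like = {s: 1.0 for s in STATES}
    for child, t in node:
        below = partial(child)
        for s in STATES:
            like[s] *= sum(p_sub(s, c, t) * below[c] for c in STATES)
    return like


def root_posterior(tree):
    """Posterior over root states, assuming a uniform prior at the root."""
    like = partial(tree)
    total = sum(like.values())
    return {s: v / total for s, v in like.items()}


column = "AAAAAG"
majority = Counter(column).most_common(1)[0][0]

# Tree 1: a star tree, all six tips equally distant from the root.
star = [(aa, 0.1) for aa in column]

# Tree 2: the five A sequences are near-identical copies on one long branch,
# while the G sequence sits very close to the root.
tight_a_clade = [("A", 0.01)] * 5
clustered = [(tight_a_clade, 0.5), ("G", 0.05)]

for name, tree in (("star", star), ("clustered", clustered)):
    post = root_posterior(tree)
    best = max(post, key=post.get)
    print(f"{name}: majority vote={majority}, most probable root state={best}")
```

On the star tree the most probable root state agrees with the majority vote (A). On the clustered tree it is G, because five nearly identical sequences behind one long branch carry roughly the information of a single observation, whereas the G tip is close to the root. The percentage of each residue is identical in both cases; only the tree and branch lengths differ.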