Question: Extracting divergence times from 4DTv

Hello everyone,

I have just finished generating a 4DTv plot (example) for paralogous and orthologous genes in a project I'm working on. While reading papers that also include such a plot (listed in the introduction here), I see that they often estimate divergence times between species using 4DTv sites. However, I haven't yet found a paper that says clearly how divergence times are computed from 4DTv.

4DTv plots show the transversion rate at fourfold degenerate sites across a set of pairwise alignments. Because the value is a proportion (the number of transversions relative to the number of fourfold degenerate sites compared), it is used as a relative time measure to date genome hybridization / duplication events.

Is there a way I could convert this to an absolute time measure? In particular, I am interested in converting the ratio of transversions into millions of years. What would I need to do that? What other variables would I need to make such a calculation?

I am currently reading literature and books about time estimation models, but since that could take forever, I thought I'd ask here as well :)

Answer:

The 4DTv ratio on its own cannot be converted into millions of years: it is a relative measure of time, and turning it into an absolute one requires an external calibration.

However, one can use the rate of substitution at neutrally evolving sites to determine age. Basically:

• parse a pairwise alignment into codons
• select only codons belonging to the fourfold-degenerate group
• extract the third position of each and count them (tot. positions)
• count the positions that differ between the two aligned sequences (substitutions)
• compute substitutions / tot. positions to get the substitutions per site
• divide that by a known substitution rate per position per year (or per generation, multiplied by the generation time in years), and halve the result, since substitutions accumulate along both lineages after the split
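
The counting steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not a published pipeline: it assumes the standard genetic code, two in-frame sequences of equal length that are already codon-aligned, and it only counts a site when both codons share the same fourfold-degenerate first two bases. The function name `fourfold_site_counts` is made up for this example.

```python
# First two bases of the eight fourfold-degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def fourfold_site_counts(seq_a, seq_b):
    """Return (total_sites, substitutions, transversions) at fourfold-degenerate
    third positions of two in-frame, codon-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    total = substitutions = transversions = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        # Keep only codons where both sequences share the same fourfold prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        if base_a not in "ACGT" or base_b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        total += 1
        if base_a != base_b:
            substitutions += 1
            # A transversion swaps a purine for a pyrimidine or vice versa.
            if (base_a in PURINES) != (base_b in PURINES):
                transversions += 1
    return total, substitutions, transversions


# Toy alignment: codons CTA/CTG, GGC/GGA, ATG/ATG, ACT/ACT
total, subs, tv = fourfold_site_counts("CTAGGCATGACT", "CTGGGAATGACT")
print(total, subs, tv)
```

Here `subs / total` is the raw proportion of differing fourfold sites, and `tv / total` is the uncorrected 4DTv-style value for the pair. Neither is corrected for multiple substitutions at the same site.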

The resulting number should be an approximation of how many years have gone by. Be careful: this assumes a constant mutation rate, so it is only valid if mutation rates have not differed among the species tested (i.e. almost never).
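
As a rough illustration of that last conversion, here is a small Python sketch. It assumes a single constant substitution rate shared by both lineages, given per site per year, and it divides by two because both lineages accumulate substitutions after they split. The function name and the numbers in the example are hypothetical, not taken from any paper, and no correction for multiple hits is applied.

```python
def divergence_time_years(substitutions, total_sites, rate_per_site_per_year):
    """Rough divergence time (years) under a constant, shared substitution rate."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate_per_site_per_year must be positive")
    # Observed proportion of differing neutral sites between the two sequences.
    divergence = substitutions / total_sites
    # Factor 2: substitutions accumulate independently along both lineages.
    return divergence / (2 * rate_per_site_per_year)


# Made-up numbers: 120 differences over 1000 fourfold sites,
# and an assumed rate of 6e-9 substitutions per site per year.
years = divergence_time_years(120, 1000, 6e-9)
print(f"{years / 1e6:.1f} million years")
```

If the rate you have is per generation rather than per year, dividing it by the generation time in years gives the per-year rate this sketch expects.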

If you can't assume a constant mutation rate per generation, then you can still get a very rough picture of the divergence time in millions of years, knowing that it is imprecise.