Question

Extracting divergence times from 4DTv

5.4 years ago

Matteo Schiavinato ★ 3.6k

Hello everyone,

I have just finished generating a 4DTv plot (example) for paralogous and orthologous genes in a project I'm working on. While reading papers that include such a plot (listed in the introduction here), I noticed that they often estimate divergence times between species from 4DTv values. However, I haven't yet found one that clearly states how divergence times are computed from 4DTv.

4DTv plots show the transversion rate at fourfold degenerate sites across a set of pairwise alignments. Because 4DTv is the fraction of fourfold degenerate sites that differ by a transversion, it is used as a relative measure of time to date genome hybridization / duplication events.
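To make the quantity concrete, here is a minimal sketch of a raw 4DTv calculation for one pair of aligned coding sequences. It assumes the common definition (transversions at fourfold degenerate third positions divided by the number of such positions compared) and the standard genetic code. The function names are hypothetical, and published 4DTv values are often additionally corrected for multiple substitutions, which this sketch does not do.

```python
# Codon prefixes (first two bases) whose third position is fourfold
# degenerate in the standard genetic code.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a, b):
    """True if the two bases form a purine/pyrimidine pair."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def raw_4dtv(seq1, seq2):
    """Uncorrected 4DTv for two in-frame, aligned coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = 0
    transversions = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Keep only codons that share a fourfold degenerate prefix.
        if c1[:2] != c2[:2] or c1[:2] not in FOURFOLD_PREFIXES:
            continue
        # Skip gaps and ambiguous bases at the third position.
        if c1[2] not in "ACGT" or c2[2] not in "ACGT":
            continue
        sites += 1
        if is_transversion(c1[2], c2[2]):
            transversions += 1
    if sites == 0:
        raise ValueError("no fourfold degenerate sites found")
    return transversions / sites


# Toy alignment: three comparable 4D sites, one of which is a transversion.
print(raw_4dtv("CTAGTAGCAGGA", "CTTGTAGCGATG"))
```

In a real analysis this would be run over every paralog or ortholog pair, and the distribution of values is what gets plotted.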

Is there a way I could convert this to an absolute time measure? In particular, I am interested in converting the ratio of transversions into millions of years. What would I need to do that? What other variables would I need for such a calculation?

I am currently reading literature and books about time estimation models, but since that could take forever, I thought I'd ask here as well :)

variants divergence time 4DTv substitution • 2.2k views

updated 4.6 years ago by Biostar 20 • written 5.4 years ago by Matteo Schiavinato ★ 3.6k

score 1 · Accepted Answer · 2019-01-21

I'm answering myself, for future readers:

The 4DTv ratio is a relative measure of time, so it cannot be converted directly into millions of years.

However, one can use the rate of substitution at neutrally evolving sites to determine age. Basically:

1. Parse a pairwise alignment codon by codon.
2. Select only the codons belonging to the fourfold-degenerate group.
3. Extract the third position of each and count them (total positions).
4. Count the positions that differ between the two aligned sequences (substitutions).
5. Compute substitutions / total positions to get the number of substitutions per site (the divergence).
6. Divide it by twice a known substitution rate per site per year (a per-generation rate divided by the generation time in years); the factor of two accounts for both lineages accumulating substitutions since the split.
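The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the sequences are already aligned in frame, the standard genetic code applies, and the divergence is divided by twice the yearly rate because both lineages accumulate substitutions after the split. The function names and the rate used in the example are hypothetical placeholders, not values for any particular species.

```python
# Codon prefixes whose third position is fourfold degenerate (standard code).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def count_fourfold_sites(seq1, seq2):
    """Return (substitutions, total_positions) at fourfold degenerate sites."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    substitutions = 0
    for i in range(0, len(seq1) - 2, 3):
        codon1, codon2 = seq1[i:i + 3], seq2[i:i + 3]
        # Steps 1-2: keep codons that share a fourfold degenerate prefix.
        if codon1[:2] != codon2[:2] or codon1[:2] not in FOURFOLD_PREFIXES:
            continue
        # Skip gaps and ambiguous bases at the third position.
        if codon1[2] not in "ACGT" or codon2[2] not in "ACGT":
            continue
        total += 1  # step 3: count third positions
        if codon1[2] != codon2[2]:
            substitutions += 1  # step 4: count differences
    return substitutions, total


def divergence_time_years(substitutions, total, rate_per_site_per_year):
    """Steps 5-6: substitutions per site divided by twice the yearly rate."""
    if total == 0:
        raise ValueError("no fourfold degenerate sites to compare")
    divergence = substitutions / total
    return divergence / (2 * rate_per_site_per_year)


# Toy alignment; real input would be many concatenated gene pairs.
subs, total = count_fourfold_sites("CTAGTAGCAGGA", "CTTGTAGCGATG")

# Hypothetical numbers: per-generation rate divided by generation time in years.
rate_per_year = 7e-9 / 1.0
print(subs, total, divergence_time_years(subs, total, rate_per_year))
```

The raw proportion of differing sites underestimates divergence once sites start to be hit more than once, so for older splits a substitution model correction would be needed on top of this.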

The resulting number should be an approximation of how many years have gone by. Be careful: this assumes a constant mutation rate, so it is only valid if mutation rates have not differed among the species tested (i.e. almost never).

If you can't assume a constant mutation rate per generation, you can still get a very rough picture of the divergence time in millions of years, as long as you keep in mind that it is imprecise.
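One way to keep that imprecision visible is to report a range rather than a single number. The sketch below brackets the estimate between a slow and a fast rate; the function name and both rates are hypothetical placeholders, and the factor of two again assumes that both lineages accumulate substitutions.

```python
def divergence_time_range(divergence, slow_rate, fast_rate):
    """Return (youngest, oldest) divergence time in years.

    divergence: substitutions per site between the two sequences
    slow_rate, fast_rate: assumed bounds on the rate per site per year
    """
    if not 0 < slow_rate <= fast_rate:
        raise ValueError("rates must satisfy 0 < slow_rate <= fast_rate")
    youngest = divergence / (2 * fast_rate)  # fast clock gives a recent split
    oldest = divergence / (2 * slow_rate)    # slow clock gives an old split
    return youngest, oldest


# Hypothetical divergence and rate bounds, for illustration only.
low, high = divergence_time_range(0.1, 5e-9, 1e-8)
print(f"{low / 1e6:.1f} to {high / 1e6:.1f} million years")
```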