Robinson-Foulds Distance : Number Of Trees At Distance K From A Fixed Tree
2
0
Entering edit mode
10.2 years ago

Hi,

This is for members who might be knowledgeable about the Robinson-Foulds distance. I have a fixed tree T and a bunch of other trees B.

I use the Robinson-Foulds (RF) distance to compare the trees in B with T, and I get a distribution of distances. I'd like to show that this distribution is significantly different from random. In other words, my null hypothesis is that there is no relationship between the trees of B and the RF-distance. In order to contradict this hypothesis, it would help to know how the trees distances from T are distributed if I include all the possible trees.

Hence the question : is it possible to know the number of trees at distance k from a fixed tree T ?

phylogeny tree • 4.1k views
ADD COMMENT
0
Entering edit mode
10.2 years ago
Michael 54k

This problem should be accessible to full enumeration as the number of different tree topologies is finite. You have to impose two further constraints though:

  • The number of nodes (e.g. |B| := |T| for all pairs (B, T), or a range) in the tree
  • The maximum number of child-nodes per node (or unlimited)

Then, you can enumerate all possible trees with these properties (excluding symmetric trees) and compute the RF-distance to T for each. With this you can compute the discrete probability (and therefore a P-value) to get an observation RF(B,T) = k or anything more extreme, provided all trees have the same prior probability of occurring.

This gets more complicated if there should be an arbitrary number of nodes per tree, one might ask how much sense it makes to compare a tree with 1000 nodes with one having 10 nodes. So, I do not know how to solve this but doubt that taking into account different node counts makes sense.

Anyway, the number of trees that can contribute to achieve a distance that is less or equal to k is finite as well. Assume for example, that adding or deleting a node is defined for the metric and has constant cost c := 1 adding to the distance, then you only need to take into account those trees which have N nodes such that |T|-k <= N <= |T| + k (with |T| : number of nodes of tree T).

ADD COMMENT
0
Entering edit mode

By the way, while this solves the original problem, I don't think it has any practical application to phylogentics, because:

  • ignores branch depth
  • "provided all trees have the same prior probability of occurring" which is obviously not the case
ADD REPLY
0
Entering edit mode

Yes thank you. I didn't go into the details but ignoring branch lengths and possible trees should be sufficient for the analysis at hand. I also forgot to mention that all the trees have the same number of nodes. The real question, which I should have made more precise, was "Can we find the number of trees at distance k analytically, purely by mathematical methods and without resorting to generating all non-isomorphic trees ?". But after searching and asking around, I'm getting pretty convinced that this hasn't been solved and is likely to be a tough problem. So I'll probably do as you suggest. Thank you for your answer.

ADD REPLY
0
Entering edit mode

While the number of tree topologies is finite, it becomes intractable for even modest taxon sets. Felsenstein (1978; http://tinyurl.com/zdnsn48) gives the equation for the number of labelled bifurcating rooted trees for n taxa:

(2n-3)! / [2^(n-2) * (n-2)!]

The number of unrooted trees for x taxa can be obtained from the equation above by setting n = x - 1.

For example, for just 10 taxa the number of unrooted trees is 2,027,025 and the number of rooted tree is 34,459,425.

ADD REPLY
0
Entering edit mode
4.9 years ago
leonardini • 0

This paper might be of interest: https://arxiv.org/abs/0810.0868. And we have an much faster, nearly cubic-time, implementation here: https://github.com/WGS-TB/RFDistribution (please contact me if you decide to use it and want to cite it).

ADD COMMENT
0
Entering edit mode

The paper is now published and available here: https://www.sciencedirect.com/science/article/pii/S147692712030133X

ADD REPLY

Login before adding your answer.

Traffic: 3824 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6