Question

Statistical significance in phylogenies

0

7.2 years ago

ceruleanivy ▴ 50

I have constructed a distance matrix to build a phylogenetic tree for 10 species with the R package 'phangorn', and I would like to know how to calculate p-values for significantly different species based purely on phylogenetic data. I would appreciate some insight, especially from anyone who has tried something similar with 'phytools'.

R genome next-gen • 2.3k views

ADD COMMENT • link updated 7.2 years ago by Manvendra Singh ★ 2.2k • written 7.2 years ago by ceruleanivy ▴ 50

1

Can you clarify exactly what you mean by "calculate p-values for significantly different species"? Are you looking to calculate a P-value associated with the distance between two species, or are you looking for statistical support for species being clustered together in the tree? If the latter, then Manvendra's answer below is the correct one. Bootstrap values aren't p-values, but they are support values for a given internal node in the phylogenetic tree. If, however, you are looking for statistical support for two species being separated from one another, that's a different matter entirely: it would involve constructing one or more phylogenetic trees under an alternative hypothesis and running tests on those trees, for instance the Approximately Unbiased test.

ADD REPLY • link 7.2 years ago by DG 7.3k

0

I think you got it right. I would like some sort of metric that quantifies the relationship between two species in terms of statistical significance.

ADD REPLY • link 7.2 years ago by ceruleanivy ▴ 50

0

Well, I offered two different things you might be trying to do, and they are very different from each other. Similarly, from your response to Manvendra's answer, I'm still not clear exactly what you want to do. It seems like you really want to calculate p-values for all species pairs in your tree. Keep in mind that a phylogenetic tree gives you a lot of interdependent information. In the simplest case, you have two pieces of information: the topology and the branch lengths. The measure of relatedness between any two species in a tree that is typically used is simply the sum of the branch lengths on the path between them, which gives you an evolutionary distance metric.
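To make the "sum of branch lengths" idea concrete, here is a minimal sketch in plain Python. It is not phangorn or phytools code: the tree, the species names and the branch lengths are all invented for illustration, and the helper names (`path_to_root`, `patristic_distance`, `distance_matrix`) are hypothetical. Note that it only produces distances, not p-values.

```python
from itertools import combinations

# Hypothetical rooted tree, stored as child -> (parent, branch length to parent).
# Species names and branch lengths are invented for illustration only.
TREE = {
    "A": ("n1", 0.10),
    "B": ("n1", 0.20),
    "n1": ("root", 0.05),
    "C": ("root", 0.30),
}


def path_to_root(tree, node):
    """Return {ancestor: distance from node} for the node and all its ancestors."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:  # the root has no entry, so the walk stops there
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path joining tips a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # The path turns around at the most recent common ancestor, which is the
    # shared ancestor that minimises the summed distance from both tips.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def distance_matrix(tree, tips):
    """Pairwise tip-to-tip distances for every unordered pair of tips."""
    return {(a, b): patristic_distance(tree, a, b) for a, b in combinations(tips, 2)}


if __name__ == "__main__":
    for pair, d in distance_matrix(TREE, ["A", "B", "C"]).items():
        print(pair, round(d, 3))
```

For 10 species this gives 45 pairwise distances, but they all come from the same tree and share branches, which is why they cannot be treated as independent observations.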

I think we still need more detail about the question you are asking and trying to answer with these P-values. P-values fall out of specific tests done to answer specific questions. You need a good idea of what your null hypothesis actually is that you are testing against.

ADD REPLY • link 7.2 years ago by DG 7.3k

score 1 · Answer 1 · 2017-02-17

1

7.2 years ago

Manvendra Singh ★ 2.2k

You can try bootstrapping. You choose the number of bootstrap replicates, and you can then see how many of the resampled data sets give the tree you expect. From that you can calculate p-values too.

There is a nice R package that does this.

It's called pvclust; it's here
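As a rough illustration of what bootstrapping means here, the sketch below resamples alignment columns with replacement and counts how often two species come out as each other's closest relative. It is plain Python, not pvclust's or phangorn's method: the toy alignment, the use of a simple p-distance, and the "mutual nearest neighbour" criterion in place of real tree building are all assumptions made to keep the example short. The resulting proportion is a support value, not a p-value.

```python
import random

# Hypothetical toy alignment (equal-length sequences), invented for illustration.
ALIGNMENT = {
    "sp1": "ACGTACGTACGTACGTACGT",
    "sp2": "ACGTACGTACGAACGTACGT",
    "sp3": "TCGAACCTACGTTCGTAGGT",
    "sp4": "TCGAACCTAGGTTCGTAGGA",
}


def p_distance(s, t):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(s, t)) / len(s)


def nearest_neighbour(aln, name):
    """Name of the sequence with the smallest p-distance to `name`."""
    others = [n for n in aln if n != name]
    return min(others, key=lambda n: p_distance(aln[name], aln[n]))


def are_mutual_nearest(aln, a, b):
    """True if a and b are each other's closest sequence (a stand-in for 'sister taxa')."""
    return nearest_neighbour(aln, a) == b and nearest_neighbour(aln, b) == a


def bootstrap_support(aln, a, b, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which a and b group together."""
    rng = random.Random(seed)
    n_sites = len(next(iter(aln.values())))
    hits = 0
    for _ in range(n_boot):
        # Resample alignment columns with replacement; every species gets the same columns.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}
        hits += are_mutual_nearest(resampled, a, b)
    return hits / n_boot


if __name__ == "__main__":
    print("sp1+sp2 support:", bootstrap_support(ALIGNMENT, "sp1", "sp2"))
    print("sp1+sp3 support:", bootstrap_support(ALIGNMENT, "sp1", "sp3"))
```

A real analysis would rebuild the full tree for every replicate and read off the support for each internal node, which is what the dedicated packages do.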

ADD COMMENT • link 7.2 years ago by Manvendra Singh ★ 2.2k

0

Thanks. Do you know which function will give me the lowest possible p-value between two species? For example, a population of 10 would eventually lead to a 10x10 matrix of p-values.

ADD REPLY • link 7.2 years ago by ceruleanivy ▴ 50