How to avoid phylogenetic bias when testing a hypothesis on a large set of bacterial genomes
8.1 years ago
khorms ▴ 230

I want to test the hypothesis that two features of bacteria correlate with each other. I have a large set of bacteria for which both features are known. However, taxonomic groups are unequally represented in my set, which means that any simple statistical test would be biased. Is there an effective way to avoid this phylogenetic bias?

bacteria phylogenetics unifrac genome • 2.5k views
8.1 years ago
abascalfederico ★ 1.2k

You should not test for correlation without considering the underlying phylogeny: the observed characters are not independent but historically related. For this kind of analysis you need something like the program BayesTraits (either Discrete or Continuous, depending on the type of character being tested).

If you analyse the correlation within a phylogenetic context, taxon sampling biases should not affect the result much. However, you can try manually removing some taxa to make lineages more equally represented, and then test whether the results change. Alternatively, you can remove taxa based on sequence identity (using cd-hit or Jalview, for example).
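The suggestion to thin out over-represented lineages and check whether the result changes can be sketched as a repeated subsampling test. This is only a minimal illustration, not a substitute for a tree-based method: the function names, the `(group, feature_a, feature_b)` record layout, and the toy values are all assumptions made for the example, and the grouping level (order, phylum, ...) is whatever you put in the first field.

```python
import random
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def subsampled_correlations(records, n_rounds=1000, seed=0):
    """Draw one random genome per group in each round and correlate the draws.

    records: iterable of (group, feature_a, feature_b) tuples (hypothetical layout).
    Returns the list of correlation coefficients, one per usable round.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for group, a, b in records:
        by_group[group].append((a, b))

    results = []
    for _ in range(n_rounds):
        picks = [rng.choice(members) for members in by_group.values()]
        xs, ys = zip(*picks)
        r = pearson(xs, ys)
        if r is not None:  # skip rounds where a feature happened to be constant
            results.append(r)
    return results


# Toy records with made-up values: one phylum is heavily over-represented.
toy = [
    ("Proteobacteria", 1.0, 2.1), ("Proteobacteria", 1.2, 2.4),
    ("Proteobacteria", 0.9, 1.8), ("Proteobacteria", 1.1, 2.3),
    ("Firmicutes", 3.0, 1.5), ("Firmicutes", 3.3, 1.2),
    ("Actinobacteria", 2.0, 4.0),
    ("Bacteroidetes", 4.0, 3.1),
]
full_r = pearson([a for _, a, _ in toy], [b for _, _, b in toy])
rs = sorted(subsampled_correlations(toy, n_rounds=200))
print(f"full set r = {full_r:.2f}, median subsampled r = {rs[len(rs) // 2]:.2f}")
```

If the subsampled coefficients scatter widely or sit far from the full-set value, that is a hint that the full-set correlation is driven by a few densely sampled lineages. Equal sampling does not make the remaining genomes independent, so this check complements the tree-based test and does not replace it.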

HTH


Dear Abascal, thank you very much for your answer! BayesTraits looks like a very good idea, but it analyses ancestral states, so it seems not to be applicable to the evolution of bacteria because of the large evolutionary distances between them. What do you mean by "analyse the correlation within a phylogenetic context"?

The idea of manually removing some genomes is good, but there is one problem: it is unclear which taxonomic level I should consider. For example, if I choose only 1 genome per order, my set could still contain 7 orders belonging to the phylum Proteobacteria and 13 orders of Firmicutes, so at the phylum level my set would still be biased. In other words, the set will be biased in different ways depending on the taxonomic level I consider.


BayesTraits should work fine with your data; I don't think large evolutionary distances are an issue here. "Analyse the correlation within a phylogenetic context" means this: your data are not independent samples, they are already "correlated" by common descent, which is why you have to test for correlated evolution using a phylogenetic tree. You can read this: https://en.wikipedia.org/wiki/Phylogenetic_comparative_methods
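To make "testing within a phylogenetic context" concrete, here is a small sketch of Felsenstein's independent contrasts, one of the classic phylogenetic comparative methods for continuous traits. It is a different and much simpler technique than BayesTraits, and it assumes a rooted, fully bifurcating tree with positive tip branch lengths and Brownian-motion trait evolution. The nested-tuple tree encoding, the function names, and the toy trait values are assumptions made for this example only.

```python
from math import sqrt


def contrasts(node, trait):
    """Return (node_value, adjusted_branch_length, list_of_contrasts).

    A leaf is (name, branch_length); an internal node is
    (left_child, right_child, branch_length). Hypothetical encoding.
    """
    if len(node) == 2:  # leaf: look up the observed trait value
        name, length = node
        return trait[name], length, []
    left, right, length = node
    xl, vl, cl = contrasts(left, trait)
    xr, vr, cr = contrasts(right, trait)
    # Standardised difference between the two daughter lineages.
    c = (xl - xr) / sqrt(vl + vr)
    # Ancestral value: average weighted by inverse branch length.
    x = (xl / vl + xr / vr) / (1 / vl + 1 / vr)
    # Lengthen the branch below this node to account for estimation error.
    v = length + vl * vr / (vl + vr)
    return x, v, cl + cr + [c]


def contrast_correlation(tree, trait_a, trait_b):
    """Correlation of the two sets of contrasts, computed through the origin."""
    _, _, ca = contrasts(tree, trait_a)
    _, _, cb = contrasts(tree, trait_b)
    num = sum(a * b for a, b in zip(ca, cb))
    den = sqrt(sum(a * a for a in ca) * sum(b * b for b in cb))
    return num / den


# Toy tree ((A,B),(C,D)) with made-up branch lengths and trait values.
tree = ((("A", 1.0), ("B", 1.0), 5.0), (("C", 1.0), ("D", 1.0), 5.0), 0.0)
feature_a = {"A": 1.0, "B": 2.0, "C": 10.0, "D": 11.0}
feature_b = {"A": 2.0, "B": 1.0, "C": 11.0, "D": 10.0}
print(contrast_correlation(tree, feature_a, feature_b))
```

Each contrast is a comparison between two sister lineages, so a tree with n tips gives n - 1 data points that are independent under the model, however unevenly the clades are sampled. For discrete characters this approach does not apply, and a model-based test of correlated evolution such as the one mentioned above is needed.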
