Extract species tree from the Ensembl Compara database
5.2 years ago
BlastedBadger ▴ 160

Hi, I have a problem specific to the EnsEMBL Compara database: how to retrieve the species tree that was used to run TreeBest in the Compara pipeline. I am basing this question on release 93.

In short:

Does anyone know how to proceed to extract the EnsEMBL species tree used by TreeBest?

Detailed question:

My problem is that in the description of the protein trees pipeline, it says that:

The species tree is based on the NCBI taxonomy tree (subject to some modifications depending on new datasets).

So I am unsure whether the tree that can be downloaded manually from the species tree page is the one that includes these modifications. Is that the case?

In any case, I am interested in fetching it automatically. I am more at ease with SQL than object-oriented Perl, but I tried the API:

#!/usr/bin/env perl


use warnings;
use strict;

use Bio::EnsEMBL::Registry;

# Auto-configuration
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
    -port => 5306);

my $species_tree_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'Multi', 'compara', 'SpeciesTree');

# $species_tree_adaptor is a Bio::EnsEMBL::Compara::DBSQL::SpeciesTreeAdaptor

my $species_trees = $species_tree_adaptor->fetch_all();

foreach my $tree (@{$species_trees}) {
    print $tree->toString(), "\n";
}

This code prints something, but I think it is only the ID and label from the species_tree_root table, whereas I would like Newick output.
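For the SQL route, here is a rough sketch of how parent/child rows could be assembled into a Newick string in plain Perl. It assumes each row carries node_id, parent_id, node_name and distance_to_parent, which is a guess at the relevant species_tree_node columns and should be checked against the release 93 schema. The rows_to_newick and _subtree helpers and the toy rows are made up for illustration, not taken from the Compara API.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Newick string from parent/child rows.
# Each row is a hash ref with node_id, parent_id, node_name and
# distance_to_parent (assumed column names; check the actual schema).
# The rows are expected to describe a single tree.
sub rows_to_newick {
    my ($rows) = @_;
    my (%node, %children, $root_id);
    $node{ $_->{node_id} } = $_ for @$rows;
    for my $row (@$rows) {
        my $parent = $row->{parent_id};
        if (defined $parent && exists $node{$parent} && $parent != $row->{node_id}) {
            push @{ $children{$parent} }, $row->{node_id};
        } else {
            $root_id = $row->{node_id};    # no known parent: treat as root
        }
    }
    die "no root found\n" unless defined $root_id;
    return _subtree(\%node, \%children, $root_id, 1) . ';';
}

# Recursive helper: Newick text for one node and everything below it.
sub _subtree {
    my ($node, $children, $id, $is_root) = @_;
    my $text = '';
    if (my $kids = $children->{$id}) {
        my @parts = map { _subtree($node, $children, $_, 0) }
                    sort { $a <=> $b } @$kids;
        $text = '(' . join(',', @parts) . ')';
    }
    my $name = $node->{$id}{node_name} // '';
    $name =~ s/[\s():;,]+/_/g;             # keep the label Newick-safe
    $text .= $name;
    my $dist = $node->{$id}{distance_to_parent};
    $text .= ":$dist" if !$is_root && defined $dist;
    return $text;
}

# Toy rows (made-up ids and distances, not real Compara data)
my @rows = (
    { node_id => 1, parent_id => undef, node_name => 'Euarchontoglires', distance_to_parent => 0 },
    { node_id => 2, parent_id => 1, node_name => 'Glires',   distance_to_parent => 1 },
    { node_id => 3, parent_id => 1, node_name => 'Primates', distance_to_parent => 1 },
    { node_id => 4, parent_id => 1, node_name => 'Tupaia',   distance_to_parent => 1 },
);
print rows_to_newick(\@rows), "\n";
# prints (Glires:1,Primates:1,Tupaia:1)Euarchontoglires;
```

A node with more than two children is written out as a polytomy, so an unresolved tree stored this way stays unresolved in the output.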

Below, I tried using the Utils::SpeciesTree class directly, but create_species_tree does not work without arguments, and in any case I don't see a method in this class that writes a Newick tree. Maybe I should use SpeciesTreeNode->newick_format(), but I don't know how to get such an instance.

use Bio::EnsEMBL::Compara::Utils::SpeciesTree;
## include all available species from genome_db by default
my $species_tree = Bio::EnsEMBL::Compara::Utils::SpeciesTree->create_species_tree();
#print $species_tree->newick_format();

Additionally, I am interested in the $species_tree->ultrametrize_from_timetree() method.

Many thanks.


It seems clear to me that the tree provided on GitHub is the one used by Ensembl. As far as I know, the modifications of the NCBI tree concern the inclusion of new species.

5.2 years ago
Emily 23k

You can just download it.


I understand from this page that this is not the tree used in the protein trees pipeline. It is also a fully resolved tree, so compared to the NCBI topology it makes choices about uncertain branching orders. For example, it differs from the TimeTree topology in these places: ((Boreoeutheria,Xenarthra),Afrotheria) vs (Boreoeutheria,(Xenarthra,Afrotheria)), or ((Glires,Tupaia),Primates) vs (Glires,(Tupaia,Primates)), among others. I suppose EnsEMBL Compara uses the unresolved, polytomic tree from NCBI to reconcile protein trees, so I wonder: how were the polytomies resolved in the tree you link to?
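To make this kind of comparison repeatable, here is a minimal sketch that lists the clades found in one rooted Newick tree but not in another. The clades and clades_only_in helpers are hypothetical, written only for this illustration; they handle plain names and parentheses, and strip branch lengths and internal labels before comparing.

```perl
use strict;
use warnings;

# Hypothetical helper: return the clades of a rooted Newick tree, each
# written as its sorted leaf names joined by "|".
sub clades {
    my ($newick) = @_;
    $newick =~ s/:[^,();]+//g;      # drop branch lengths
    $newick =~ s/\)[^,();]+/)/g;    # drop internal node labels
    $newick =~ s/\s+|;//g;          # drop whitespace and the final ";"
    my (@stack, %clades);
    for my $token ($newick =~ /([(),]|[^(),]+)/g) {
        if ($token eq '(') {
            push @stack, [];                            # open a new clade
        } elsif ($token eq ')') {
            my $leaves = pop @stack;
            $clades{ join '|', sort @$leaves } = 1;
            push @{ $stack[-1] }, @$leaves if @stack;   # pass leaves up
        } elsif ($token ne ',') {
            push @{ $stack[-1] }, $token if @stack;     # leaf name
        }
    }
    return \%clades;
}

# Clades of the first tree that are absent from the second.
sub clades_only_in {
    my ($first, $second) = @_;
    my ($c1, $c2) = (clades($first), clades($second));
    my @only = grep { !exists $c2->{$_} } keys %$c1;
    return sort @only;
}

my $tree_a = '((Boreoeutheria,Xenarthra),Afrotheria);';
my $tree_b = '(Boreoeutheria,(Xenarthra,Afrotheria));';
print "only in A: $_\n" for clades_only_in($tree_a, $tree_b);
print "only in B: $_\n" for clades_only_in($tree_b, $tree_a);
# prints:
# only in A: Boreoeutheria|Xenarthra
# only in B: Afrotheria|Xenarthra
```

An unresolved tree such as (Boreoeutheria,Xenarthra,Afrotheria); contributes no internal clade in this representation, so it reports nothing that conflicts with either resolution.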
