Question: Phylogenetic Analysis Of Whole Genomes
8

Hi, can anyone tell me the name of software for aligning whole genomes and constructing a phylogenetic tree from them? Thanks in advance.

ADD COMMENTlink 9.6 years ago Aparna • 120 • updated 9 months ago rm.umayal24 • 0
2

I think you need to elaborate on what exactly you are trying to accomplish. Are you trying to make a species tree or gene trees? How many genomes are you starting from? Are they prokaryotic or eukaryotic genomes?

ADD REPLYlink 9.6 years ago
Lars Juhl Jensen
11k
0

I need a species tree containing 24 species, all of them prokaryotes.

ADD REPLYlink 9.6 years ago
Aparna
• 120
15

I am not aware of an easy way to construct reliable species trees based on complete genomes. The general approach is to pick one or more genes on which to base your phylogeny. This could be 16S rRNA, all ribosomal-protein-coding genes, or other highly conserved genes that are universally present and rarely subject to gene duplication or lateral gene transfer.

Once you have picked the genes, you need to make a multiple sequence alignment for each gene that you want to use for your phylogeny. For this I would tend to use either MUSCLE or MAFFT. After that I would use Gblocks to extract the conserved blocks of each alignment, so that potentially misaligned regions are not used as the basis for tree building.
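
To illustrate the idea of trimming an alignment before tree building, here is a minimal Python sketch that drops columns consisting mostly of gaps. It is much cruder than Gblocks, which also looks at conservation and block length. The function name `drop_gappy_columns` and the 50% threshold are assumptions made for this example only.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds the threshold.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    This is a crude stand-in for the idea behind Gblocks, not Gblocks itself.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column only if the share of '-' characters is small enough.
    keep = [
        col for col in range(length)
        if sum(s[col] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


# Toy alignment: the third column is a gap in two of the three sequences.
example = {"a": "AC-GT", "b": "A--GT", "c": "ACTG-"}
trimmed = drop_gappy_columns(example)
```

In practice you would read the aligner's FASTA output into such a dictionary first, and for real analyses a dedicated trimming tool is the safer choice.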

If you decided to use multiple genes as the basis for your phylogeny, you now have to make a big decision, namely whether to go for a concatenated alignment approach or a supertree approach. In the first case, you would concatenate all of the multiple alignments and use the resulting big alignment as input for a phylogenetic tree reconstruction program, for example PhyML. In the second case, you would use such a program to make a separate tree for each of the genes of interest, and subsequently use one of several supertree programs to derive a consensus tree based on these. If you went for just using a single gene as the basis for your tree, you obviously just build a tree for that one gene and you are done.
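
For the concatenated-alignment route, the merging step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each gene alignment is a dictionary from taxon name to aligned sequence, and a taxon missing from a gene is padded with gaps. The name `concatenate_alignments` is made up for this example.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: list of dicts, each mapping taxon -> aligned sequence.
    Taxa absent from a gene are filled with '-' for that gene's width.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    merged = {taxon: [] for taxon in taxa}

    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            merged[taxon].append(aln.get(taxon, "-" * width))

    return {taxon: "".join(parts) for taxon, parts in merged.items()}


# Two toy genes; taxon B lacks gene 2 and taxon C lacks gene 1.
genes = [{"A": "ACG", "B": "AC-"}, {"A": "TT", "C": "TA"}]
supermatrix = concatenate_alignments(genes)
```

The supertree route would instead build one tree per dictionary in `genes` and combine the trees afterwards, which needs a dedicated supertree program.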

I hope this helps, although it is certainly very far from a "push of a button" solution.

ADD COMMENTlink 9.6 years ago Lars Juhl Jensen 11k • updated 17 months ago RamRS 21k
1

Depends a bit on what you want to do, but as long as the 24 genomes are not too far apart, I agree that 16S rRNA is a good choice. If one wants to attempt to resolve very deep-branching parts of the tree, I believe you need a multi-locus approach to get enough information to be able to do much. But in that case using just 24 genomes would be unlikely to work anyway.

ADD REPLYlink 9.6 years ago
Lars Juhl Jensen
11k
1

Just don't use any of the alignment software suggested; try something with a "profile"-based alignment or something geared to rRNA.

ADD REPLYlink 9.6 years ago
Paulo Nuin
♦ 3.7k
0

I would recommend ssu-align for 16S multiple sequence alignment. It uses a 16S HMM.

ADD REPLYlink 2.0 years ago
Eli Korvigo
• 150
0

+1 for 16S rRNA instead of whole genome

ADD REPLYlink 9.6 years ago
Michael Schubert
♦ 6.9k
0

+1 and agree on 16S, all other genes will lead to a sort of 'non-standard' approach.

ADD REPLYlink 9.6 years ago
Michael Dondrup
46k
0

@Paulo, good point. I completely agree that if you want to do rRNA alignment you should use dedicated, profile-based tools. The alignment tools were meant as suggestions for how to make multiple alignments of protein-coding genes.

ADD REPLYlink 9.6 years ago
Lars Juhl Jensen
11k
0

I would try the FastTree program. It gives results comparable to PhyML but is much faster, which is beneficial at genome-wide scale. In any case, if you use multiple loci from whole genomes for phylogeny reconstruction, there will be only a very small difference between programs. And if you have whole genome sequences available, do not rely on 16S rRNA alone but take as much sequence data as possible into account.

ADD REPLYlink 9.1 years ago
Peter
• 90
0

Could you give a recommendation for a "supertree" program? I have trees built from genotypes from individual chromosomes and I want to generate a consensus tree.

ADD REPLYlink 8.7 years ago
User 3875
• 50
5

RAxML: I'm not sure, but I think this program works for entire genomes and is supposed to be very fast:

Results: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees which allows for computation of 1.000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values.

ADD COMMENTlink 9.6 years ago Science_Robot ♦ 1.1k • updated 17 months ago RamRS 21k
1

Like PhyML and MrBayes, RAxML takes a multiple sequence alignment as input and uses maximum-likelihood to infer an evolutionary tree. It is thus not a tool to which you can just give a bunch of genomes and get a tree back; you'd have to first make, for example, a 16S rRNA alignment or a concatenated ribosomal protein alignment.
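
Since these programs start from an alignment file, a small sketch of writing one may help. The Python below formats an alignment as relaxed (sequential) PHYLIP text, a format that RAxML and PhyML are generally able to read; check the manual of your version before relying on it. The function name `to_relaxed_phylip` is an assumption made for this example.

```python
def to_relaxed_phylip(alignment):
    """Format an alignment as relaxed sequential PHYLIP text.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Names must not contain whitespace, because whitespace separates
    the name from the sequence in this format.
    """
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    if any(any(ch.isspace() for ch in name) for name in names):
        raise ValueError("taxon names must not contain whitespace")

    width = max(len(name) for name in names)
    # Header line: number of taxa, then number of alignment columns.
    lines = [f"{len(names)} {lengths.pop()}"]
    lines += [f"{name.ljust(width)}  {alignment[name]}" for name in names]
    return "\n".join(lines) + "\n"


phylip_text = to_relaxed_phylip({"E_coli": "ACGT", "B_sub": "AC-T"})
```

Writing `phylip_text` to a file gives something you can hand to the tree-building program as its alignment input.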

ADD REPLYlink 9.6 years ago
Lars Juhl Jensen
11k
0

MrBayes is not an ML method. It's based on Bayesian inference.

ADD REPLYlink 2.0 years ago
Eli Korvigo
• 150
4

Genome-scale multiple sequence alignments are not well suited to phylogenies: they take a lot of time to compute and are never accurate. Moreover, it's hard to imagine a general-purpose sequence evolution model that would be equally adequate for protein-coding, rRNA and tRNA genes, repeats and other regions. Picking a subset of genes manually is not a nice option either, because you will lose a lot of phylogenetic resolution. I would thus recommend building a tree based on all orthologous genes, which is the most common thing to do as far as I can tell. Here is a general pipeline:

  1. Annotate your genomes using Prokka (for prokaryotes) or another tool;
  2. Find one-to-one protein-coding orthologs using OrthoFinder or OrthoMCL;
  3. Run multiple sequence alignments (MSAs) for each group (any MSA tool will do, but I prefer MAFFT);
  4. Filter each MSA using Gblocks;
  5. Merge filtered alignments (I use Python for that, but I'm pretty sure there are some tools that don't require programming skills);
  6. Use RAxML (maximum likelihood) or BEAST (Bayesian inference) to infer the phylogeny.
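
Step 2 of the pipeline above, keeping only one-to-one orthologs, can be sketched in Python as follows. The input layout is an assumption: a tab-separated table with one row per orthogroup, one column per genome, and comma-separated gene IDs in each cell. OrthoFinder writes a table of roughly this shape, but verify the layout of your own output. The function name `single_copy_orthogroups` is hypothetical.

```python
import csv
import io


def single_copy_orthogroups(table_text, delimiter="\t"):
    """Return orthogroups that have exactly one gene in every genome.

    table_text: text of a table whose first column is the orthogroup ID and
    whose remaining columns are genomes (layout assumed, see the note above).
    Returns a dict: orthogroup ID -> {genome: gene ID}.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter=delimiter)
    genomes = next(reader)[1:]
    result = {}
    for row in reader:
        if not row:
            continue
        group = row[0]
        # Pad or cut the row so that it has one cell per genome.
        cells = (row[1:] + [""] * len(genomes))[:len(genomes)]
        genes = [[g.strip() for g in cell.split(",") if g.strip()] for cell in cells]
        if all(len(g) == 1 for g in genes):
            result[group] = {genome: g[0] for genome, g in zip(genomes, genes)}
    return result


table = (
    "Orthogroup\tgenome1\tgenome2\n"
    "OG1\ta1\tb1\n"       # one gene per genome: kept
    "OG2\ta2, a3\tb2\n"   # duplicated in genome1: dropped
    "OG3\ta4\t\n"         # missing from genome2: dropped
)
one_to_one = single_copy_orthogroups(table)
```

Each retained group then goes through steps 3 to 5: align its sequences, filter the alignment, and merge the filtered alignments into a single input for the tree program.
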
ADD COMMENTlink 2.0 years ago Eli Korvigo • 150
2

Hi, the approach at MicrobesOnline looks interesting. If the genomes of the 24 species are public and of high quality, their phylogenetic positions may already be there for you (click on "Species Tree"). If they are unpublished genomes, the site also allows you to host data privately, although I am only assuming that you would then be able to add them to the existing data sets; I don't know for sure.

The trees are made from 78 protein coding loci, so not "whole genomes" but the difference is probably trivial for most species.

ADD COMMENTlink 9.6 years ago Dave Lunt ♦ 2.0k • updated 17 months ago RamRS 21k
2

Aligning whole genomes is quite a delicate task, and it is a pain to parse many different output formats before a measure of distance or similarity emerges. Good aligners are MUMmer and Mauve. I really like Mauve and have used it to play with a lot of genomes from different strains of E. coli. That's the advantage of whole-genome comparison! You can find a "species" tree even when 16S says that the distance is zero.

For the phylogeny part of the work, you can use RAxML, as said by some folks here. For a high number of taxa this guy is the fastest one on the road. In your case a more precise approach is feasible, so you can use ERATE, which is Sean Eddy's version of DNAML from Phylip. It can deal with indels and I recommend it even in the 16S case.

But, if you really don't wanna suffer, just check the Genome-To-Genome Distance Calculator service and choose your own setup. After getting the distances, just use Clearcut to generate a NJ tree. Fast and cheap! Not very accurate if you work with very divergent species.
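
To show what happens between a distance table and an NJ tree, here is a minimal Python sketch of the classic neighbor-joining algorithm. It is not what Clearcut runs (Clearcut implements a faster, relaxed variant), and the function name `neighbor_joining` and the toy distances are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) by classic neighbor joining.

    names: list of unique taxon labels (at least three).
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    if len(nodes) < 3:
        raise ValueError("neighbor joining needs at least three taxa")

    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        new = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [new]

    # Join the last three nodes at a central trifurcation.
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    la, lb, lc = (dab + dac - dbc) / 2, (dab + dbc - dac) / 2, (dac + dbc - dab) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Toy additive distances in which A and B are each other's closest relatives.
pairs = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
         ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
toy = {frozenset(p): v for p, v in pairs.items()}
tree = neighbor_joining(["A", "B", "C", "D"], toy)
```

For real data you would fill `toy` with the pairwise values returned by the distance service and open the resulting Newick string in any tree viewer.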

ADD COMMENTlink 9.1 years ago Jarretinha 3.3k • updated 17 months ago RamRS 21k
1

Which program would be a more modern and better alternative to Phylip PARS for clustering 0/1 data representing presence/absence of genes amongst multiple strains of bacteria?

ADD COMMENTlink 9.1 years ago Adam Witney • 10
1

Adam: don't open new questions inside another discussion. Open a new thread instead, otherwise nobody will be able to answer you.

ADD REPLYlink 9.1 years ago
Giovanni M Dall'Olio
26k
0

I was actually following on from Dave Lunt's comment that said there are better alternatives to Phylip now, but maybe I put the question in the wrong place (should have been a comment on his comment). Thanks

ADD REPLYlink 9.1 years ago
Adam Witney
• 10
1

It's a very old post but I thought I could add to it to help others who might want to do a similar analysis i.e. create phylogenies from whole genomes for prokaryotic species. I have created a basic analysis pipeline that tries to simplify the process of creating phylogenetic trees at species level using only the conserved (otherwise known as the core) genomic content of all the 'bacterial' species. The steps used are described and the script is available at http://mcgp.sourceforge.net/

ADD COMMENTlink 5.2 years ago Chrispin Chaguza • 230
0

Hi. Why don't you put it on GitHub?

ADD REPLYlink 5.2 years ago
geek_y
9.7k
0

At first glance, it appears that your pipeline is what community microbiologists/metagenomics people do as a day-to-day part of a standard analysis. How does yours differ from established pipelines/workflows in the currently published literature?

(Also, to respond to the other comment: It's on SourceForge as an SVN repository.)

ADD REPLYlink 5.2 years ago
Brice Sarver
♦ 2.6k
0

Sounds good, such a nice tool. If I align 100 genomes using Mauve and build a tree from the whole-genome alignment, and on the other hand use your tool, how much do you think the results will differ?

ADD REPLYlink 5.1 years ago
HG
♦ 1.1k
1

This tool may work.

https://realphy.unibas.ch/fcgi/realphy

ADD COMMENTlink 2.0 years ago ofanoyi • 110
0

The VCF2PopTree software would be helpful if you are constructing a phylogenetic tree from a VCF or SNP file. It can even read human genome data. It is so cool, and it does not need any dependencies.

The software link is as follows: http://sankarsubramanian.net/dat/index.html

ADD COMMENTlink 9 months ago rm.umayal24 • 0
