Question

Best method for orthology prediction of more than two protein datasets

5.8 years ago

ariannapbartlett • 0

Hello all,

I'm looking for suggestions on the currently accepted methodology for identifying orthologous proteins across multiple datasets. We are working with eukaryotes that are non-model organisms. Our datasets are protein sequences predicted with TransDecoder, and we have done our best to eliminate redundant sequences. I am somewhat familiar with HaMStR, OrthoFinder, OrthoDB, etc., but am not confident about which method would be best. Our goal is to rule out paralogous genes and construct a phylogenetic tree. We then want to explore certain genes of interest that are shared between the different species. Any links to good reviews would also be appreciated.

Best,

A.B.

genome rna-seq • 1.1k views

updated 5.8 years ago by Jean-Karim Heriche 27k • written 5.8 years ago by ariannapbartlett • 0

Hi,

Can you describe what you did to eliminate redundant sequences? How did you obtain your proteins: from genomes or from transcriptomes? If you have both genome-derived and transcriptome-derived proteins, I suggest first identifying orthologs in the genome-derived protein sets, and then using those orthologs to search the transcriptome-derived proteins. If you run the orthology analysis on genome-derived and transcriptome-derived proteins together, you may not recover enough orthologous proteins (>50) if you have data from more than 10 species.

In addition to the tools you mentioned, you can use OMA, but it requires a lot of storage space and takes longer to run than the other tools.
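
Whichever tool you use, one quick way to check whether enough orthologs survive is to count the groups that contain exactly one protein from every species. Here is a minimal sketch, assuming the tool's output has already been parsed into a dictionary. The `orthogroups` structure, the species names and the protein IDs are all made up for illustration and are not any tool's real output format.

```python
def single_copy_orthogroups(orthogroups, species):
    """Return IDs of orthogroups with exactly one protein from every species."""
    wanted = set(species)
    selected = []
    for og_id, members in orthogroups.items():
        # members maps species name -> list of protein IDs in this orthogroup
        present = {sp for sp, prots in members.items() if prots}
        if present == wanted and all(len(members[sp]) == 1 for sp in wanted):
            selected.append(og_id)
    return sorted(selected)


# Hypothetical toy data: three species, three candidate orthogroups
species = ["spA", "spB", "spC"]
orthogroups = {
    "OG1": {"spA": ["a1"], "spB": ["b1"], "spC": ["c1"]},        # single copy in all
    "OG2": {"spA": ["a2", "a3"], "spB": ["b2"], "spC": ["c2"]},  # two copies in spA
    "OG3": {"spA": ["a4"], "spB": ["b3"], "spC": []},            # missing in spC
}

kept = single_copy_orthogroups(orthogroups, species)
print(kept, len(kept))
```

Running this separately on the genome-only set and on the pooled set would show how many groups each strategy leaves you with.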

5.8 years ago by Mehmet ▴ 820

score 0 · Answer 1 · 2018-06-20

Check out the TreeFam papers. The project isn't active anymore, but the pipeline is now part of Ensembl Compara and the code is still available. You could either run the pipeline again with your sequences or use the Ensembl Compara HMMs to identify families for your proteins and add them to the corresponding trees.
By definition, you can only identify paralogues if you build a phylogenetic tree.
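
To illustrate why the tree is needed, here is a minimal sketch of a simplified species-overlap heuristic. It is not the TreeFam or Ensembl Compara pipeline, just a toy version of the idea. It assumes a rooted, strictly binary gene tree written as nested tuples, with leaves labelled `species|gene`; the tree, the label format and the function names are all hypothetical. A node whose two subtrees share a species is treated as a duplication, so gene pairs that split at that node are called paralogs; otherwise the node is treated as a speciation and the pairs are called orthologs.

```python
from itertools import product


def leaves(node):
    """Return the list of leaf labels under a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species_of(leaf):
    """Extract the species part of a 'species|gene' label."""
    return leaf.split("|")[0]


def classify_pairs(tree):
    """Label each gene pair by the kind of node where the two genes split."""
    relations = {}

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node  # assumes a strictly binary tree
        left_leaves, right_leaves = leaves(left), leaves(right)
        shared = {species_of(x) for x in left_leaves} & {
            species_of(x) for x in right_leaves
        }
        # Species present on both sides suggest a duplication at this node
        kind = "paralog" if shared else "ortholog"
        for a, b in product(left_leaves, right_leaves):
            relations[frozenset((a, b))] = kind
        visit(left)
        visit(right)

    visit(tree)
    return relations


# Hypothetical gene tree: a duplication before the human/mouse split, fly as outgroup
tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fly|A")
relations = classify_pairs(tree)
for pair, kind in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), kind)
```

In this toy tree, `human|A1` and `mouse|A1` come out as orthologs, while `human|A1` and `mouse|A2` come out as paralogs even though they are from different species, which is something pairwise similarity alone would not tell you.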