5.5 years ago
kamel
Dear colleagues of Biostar,
Do you think that protein sequences sharing 40% identity and 60% coverage, with an E-value of 0.00001, are orthologous? Do you have any recommendations? I want to study the pan-proteome.
That's not enough to decide. You need to know if your putative orthologues share any likely ancestry. It could be convergent evolution.
Could you explain in more detail?
Similarity cutoffs are insufficient when dealing with protein sequences. There are known orthologues with as little as 10 or 15% similarity, so defining these cutoffs only gets you so far. It might be OK for starters, but if you want to decide whether 2 genes are orthologues, you need to see how likely it is that your 2 sequences diverged from a common ancestor.
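As a rough illustration of what "OK for starters" could look like, here is a minimal sketch of a reciprocal-best-hit screen using the cutoffs from the question (40% identity, 60% coverage, E-value 1e-5). Reciprocal best hits are a common first-pass heuristic, not proof of common ancestry, and this is not Proteinortho's own algorithm. The `Hit` record, the function names and the toy hits are all made up for the example; real input would be parsed from the tabular output of an all-vs-all search.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    # Hypothetical record for one alignment hit (fields assumed, not a real tool's format)
    query: str
    subject: str
    identity: float   # percent identity
    coverage: float   # percent of the query covered by the alignment
    evalue: float
    bitscore: float

def passes(hit, min_identity=40.0, min_coverage=60.0, max_evalue=1e-5):
    """True if the hit satisfies all three cutoffs from the question."""
    return (hit.identity >= min_identity
            and hit.coverage >= min_coverage
            and hit.evalue <= max_evalue)

def best_hits(hits, **cutoffs):
    """Keep, for each query, the passing hit with the highest bit score."""
    best = {}
    for h in hits:
        if not passes(h, **cutoffs):
            continue
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best

def reciprocal_best_hits(a_vs_b, b_vs_a, **cutoffs):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, **cutoffs)
    reverse = best_hits(b_vs_a, **cutoffs)
    pairs = set()
    for query, hit in forward.items():
        back = reverse.get(hit.subject)
        if back is not None and back.subject == query:
            pairs.add((query, hit.subject))
    return pairs

# Toy hits, invented for illustration only
a_vs_b = [
    Hit("A1", "B1", 45.0, 80.0, 1e-30, 200.0),
    Hit("A1", "B2", 42.0, 70.0, 1e-20, 150.0),
    Hit("A2", "B2", 38.0, 90.0, 1e-25, 180.0),  # fails the 40% identity cutoff
]
b_vs_a = [
    Hit("B1", "A1", 45.0, 82.0, 1e-30, 200.0),
    Hit("B2", "A1", 42.0, 75.0, 1e-20, 150.0),
]

candidates = reciprocal_best_hits(a_vs_b, b_vs_a)
print(candidates)
```

Pairs that come out of a screen like this are only candidates; they would still need the ancestry check described above (for example a gene tree) before being called orthologues.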
Other options include looking at the local genomic neighbourhood to see whether the genes share a locale, or whether one has simply jumped in at some point.
Regarding your sentence "you need to see how likely it is that your 2 sequences diverged from a common ancestor": I am working on 25 proteomes, each containing at least 9000 proteins. From a methodological point of view, what method (tool) do I need to obtain significant results? I use Proteinortho with the values I already mentioned (identity, coverage and E-value).
What 25 proteomes? From what species?
This is a problematic sentence. You don't go looking for significance. You do the experiment, interpret the results, and then the data either are or are not significant.
You need to decide how stringent to be based on the data you have - there is no magical cutoff. Would you rather introduce more false positives or false negatives? What is the actual question?
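One way to see that trade-off on your own data is to count how many hits survive under a few different cutoff sets before committing to one. The sketch below assumes each hit has been reduced to a (percent identity, percent coverage, E-value) tuple; the hit values and the three cutoff sets are invented for illustration and are not recommendations.

```python
# Toy hits as (percent identity, percent coverage, E-value); values are invented
hits = [
    (85.0, 95.0, 1e-80),
    (42.0, 65.0, 1e-12),
    (31.0, 70.0, 1e-8),
    (22.0, 55.0, 1e-4),
    (15.0, 40.0, 1e-2),
]

def count_passing(hits, min_identity, min_coverage, max_evalue):
    """Number of hits that satisfy all three cutoffs."""
    return sum(
        1
        for identity, coverage, evalue in hits
        if identity >= min_identity
        and coverage >= min_coverage
        and evalue <= max_evalue
    )

# Hypothetical cutoff sets: (min identity, min coverage, max E-value)
cutoff_sets = {
    "strict": (40.0, 60.0, 1e-5),   # the values from the question
    "relaxed": (25.0, 50.0, 1e-5),
    "loose": (10.0, 30.0, 1e-1),
}

counts = {name: count_passing(hits, *cutoffs) for name, cutoffs in cutoff_sets.items()}
for name, n in counts.items():
    print(f"{name}: {n} of {len(hits)} hits retained")
```

Stricter cutoffs drop divergent true orthologues (false negatives), while looser ones let in more unrelated or paralogous hits (false positives), so the choice depends on which error hurts the actual question more.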
Which organism are you working on?
There are some tools you can use for ortholog prediction, such as PanOCT, OrthoMCL, BPGA etc.
I am working on haploid eukaryotes (fungi), and I use the tool "Proteinortho". But I need to define the parameter values to use.
Side note: The title should be a brief description of the question, not the question itself. In your case, a good title would be "Criteria for protein sequences to be orthologous". On the plus side, you've avoided what most people that have the question in their title do - you have not excluded the question from the body of the post, and I am glad about that. But for the future, remember the suggestion about framing a good post title.
Further reading: Rule 5 in this paper
I have corrected the title. Thank you.
While there is a good chance that they are, to be absolutely sure you would need to carefully examine the alignments. Verify that critical domains are present, along with binding/active sites, etc. Final proof would only be possible by doing some experimental work (knock-out/knock-in) to show that they are functionally equivalent.