One-to-one ortholog detection by reciprocal best hit (RBH) BLAST
5.7 years ago
madapada • 0

Hello! I'm trying to define one-to-one ortholog gene sets from the CDS data of various mammals in NCBI RefSeq. My goal is an ortholog matrix containing only one-to-one orthologs, which I will then align and analyze with evolutionary analysis programs to find interesting clade-specific genetic variants. I don't want to include one-to-many or many-to-many relationships because they can cause problems in later steps such as dN/dS estimation.

Unfortunately, as far as I have searched, several of my species are not included in the databases of Inparanoid, OrthoMCL or other ortholog-finding programs. Most orthology-finding programs appear to cover only precomputed data for the species in their databases. So I think I have no option but to run a reciprocal best hit (RBH) BLAST. I'm going to select only the hits with a reciprocally exclusive relationship, which means:


Result of 1st (forward) blast

Species A (query)    Species B (database)
gene A1              gene B1
gene A2              gene B2, gene B3
gene A3              gene B3
gene A4              gene B4, gene B5


Result of 2nd (reverse) blast

Species B (query)    Species A (database)
gene B1              gene A1
gene B2              gene A2
gene B3              gene A2, gene A3
gene B4              gene A4, gene A5


In this case, only "A1-B1" would be in my result matrix because:
- Gene A2 matches both gene B2 and gene B3 (one to many).
- Gene A3 matches only gene B3, but gene B3 also matches gene A2 in the reverse blast (many to one).
- Gene A4 matches genes B4 and B5, and gene B4 matches genes A4 and A5 (many to many).
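The exclusive filter described above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the hits of each search have already been collected into dictionaries of sets, and the function name `exclusive_one_to_one` and the toy data (copied from the two tables above) are hypothetical, not part of any existing tool.

```python
from collections import defaultdict


def exclusive_one_to_one(forward, reverse):
    """Return sorted (a, b) pairs that hit only each other in both searches.

    forward maps each species A gene to the set of species B genes it hit;
    reverse maps each species B gene to the set of species A genes it hit.
    """
    # Collect every partner seen for each gene, in either search direction.
    a_partners = defaultdict(set)
    b_partners = defaultdict(set)
    for a, b_hits in forward.items():
        for b in b_hits:
            a_partners[a].add(b)
            b_partners[b].add(a)
    for b, a_hits in reverse.items():
        for a in a_hits:
            a_partners[a].add(b)
            b_partners[b].add(a)

    pairs = []
    for a, b_hits in forward.items():
        if len(b_hits) != 1:
            continue  # one to many (or no hit) from the species A side
        (b,) = b_hits
        # Keep the pair only if it is reciprocal and neither gene has
        # any other partner in either direction.
        if reverse.get(b) == {a} and a_partners[a] == {b} and b_partners[b] == {a}:
            pairs.append((a, b))
    return sorted(pairs)


# Toy data taken from the forward and reverse tables in this post.
forward = {
    "A1": {"B1"},
    "A2": {"B2", "B3"},
    "A3": {"B3"},
    "A4": {"B4", "B5"},
}
reverse = {
    "B1": {"A1"},
    "B2": {"A2"},
    "B3": {"A2", "A3"},
    "B4": {"A4", "A5"},
}

pairs = exclusive_one_to_one(forward, reverse)
print(pairs)  # [('A1', 'B1')]
```

With the example tables, only the A1-B1 pair survives, matching the reasoning in the list above.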

But after reading some posts here, I learned that RBH is not considered a proper method for defining one-to-one orthologs. I think this is because a blast search can return multiple hits per query. Of course, I could keep only cases like "A1 and B1". However, I think that criterion may be too strict for building an ortholog matrix, because it excludes every other result, including those with acceptable statistics such as a sufficiently low e-value and a high bit score.
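For comparison, a less strict variant keeps only the top-scoring hit of each query before checking reciprocity, which is what "best hit" usually means in RBH. The sketch below assumes the searches were saved in the default 12-column BLAST+ tabular format (`-outfmt 6`, with e-value in column 11 and bit score in column 12). The function names, the `row` helper, the `1e-5` e-value cutoff and the toy scores are all hypothetical choices for illustration, not recommended values.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Map each query to the set of subjects sharing its top bit score."""
    best = {}  # query -> (top bit score, subjects with that score)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # assumed cutoff; adjust to the data at hand
        top, subjects = best.get(query, (float("-inf"), set()))
        if bitscore > top:
            best[query] = (bitscore, {subject})
        elif bitscore == top:
            subjects.add(subject)  # tie for the top score
    return {query: subjects for query, (_, subjects) in best.items()}


def reciprocal_best_hits(forward_lines, reverse_lines, max_evalue=1e-5):
    """Return sorted (a, b) pairs that are each other's unique best hit."""
    fwd = best_hits(forward_lines, max_evalue)
    rev = best_hits(reverse_lines, max_evalue)
    pairs = []
    for a, subjects in fwd.items():
        if len(subjects) != 1:
            continue  # tied best hits are treated as ambiguous here
        (b,) = subjects
        if rev.get(b) == {a}:
            pairs.append((a, b))
    return sorted(pairs)


def row(query, subject, evalue, bitscore):
    """Build one tabular line; the alignment columns are placeholder values."""
    return f"{query}\t{subject}\t90.0\t300\t30\t0\t1\t300\t1\t300\t{evalue}\t{bitscore}"


# Hypothetical scores: A2 hits both B2 and B3, but B2 is clearly better.
forward_lines = [
    row("A1", "B1", "1e-50", 200),
    row("A2", "B2", "1e-40", 180),
    row("A2", "B3", "1e-20", 90),
    row("A3", "B3", "1e-45", 190),
]
reverse_lines = [
    row("B1", "A1", "1e-50", 200),
    row("B2", "A2", "1e-40", 180),
    row("B3", "A3", "1e-45", 190),
    row("B3", "A2", "1e-20", 90),
]

rbh_pairs = reciprocal_best_hits(forward_lines, reverse_lines)
print(rbh_pairs)  # [('A1', 'B1'), ('A2', 'B2'), ('A3', 'B3')]
```

In this toy case the best-hit version keeps A2-B2 and A3-B3, which the exclusive filter would drop because of the weaker secondary hits. Whether such pairs are true one-to-one orthologs is a separate question that this sketch does not answer.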

However, I cannot find any other suitable method or program for this problem, so I would like to ask about better approaches to defining one-to-one orthologs, using RBH or any other method. Or is running RBH as I did enough to define orthologs "roughly"? If so, can I call the result a one-to-one ortholog matrix?

I'm quite new to bioinformatics, but I find it amazingly interesting. I look forward to any replies; please let me know if any sentence above is confusing because of my limited English. Thank you for reading such a long post!

genome RBH blast ortholog CDS • 3.4k views
5.7 years ago
madapada • 0

Also, I have read some very good posts on this site about defining orthologous genes. I recommend them to anyone with a problem similar to mine!

What Is The Best Method To Find Orthologous Genes Of A Species?

Inference orthology relationship

If there are two or more B genes with same highest bit score in RBH method

(I wrote this post, and a very kind answer helped me a lot to understand RBH better!)

A protocol for RBH: https://www.protocols.io/view/reciprocal-best-hit-blast-q3rdym6?step=4

Below are some useful tools for finding orthologs via precomputed databases:

A good paper comparing several ortholog-finding tools: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5674930/

OrthoMCL: http://orthomcl.org/orthomcl/

Inparanoid: http://inparanoid.sbc.su.se/cgi-bin/index.cgi

Ensembl Compara: https://www.ensembl.org/info/genome/compara/index.html

I hope these help!
