BLASTP criteria for identification of paralogous and orthologous genes
9.5 years ago
biolab ★ 1.4k

Dear all,

As the title says, could you please give me some suggestions about BLASTP criteria for identifying paralogous and orthologous genes among a few species? The species I am analyzing do not have much sequence data in NCBI, but our lab recently generated HT-seq data for them.

I found that an E-value of 1e-5 is not strict enough for paralog or ortholog identification, so I think the criteria need to be tightened. I found a paper (Bioinformation. 2011; 6(1): 31) that used >60% sequence identity and >80% alignment length, but I am not certain whether that is a general rule.

Any of your answers will be highly appreciated! THANKS!

orthologous paralogous • 7.0k views
9.5 years ago
pld 5.1k

If you want to search for orthologs, I would use the best reciprocal BLAST hit approach. Searching in only one direction (your species against the reference species) will only find you homologs; the additional "reverse" step lets you home in on orthologs. I would use a tougher expect value (1e-10) and a percent identity cutoff. Bit scores can be used to break ties.
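To make the reciprocal step concrete, here is a minimal Python sketch. It assumes both searches were saved in BLAST's default tabular format (`-outfmt 6`, where column 3 is percent identity, column 11 the E-value and column 12 the bit score). The function names and the `min_pident` parameter are made up for illustration, and the 1e-10 default is just the cutoff suggested above, not a validated setting.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-10,
              min_pident: float = 0.0) -> Dict[str, Tuple[str, float]]:
    """Map each query to its highest-bit-score subject among hits passing the filters.

    Expects default BLAST tabular lines (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        pident = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue or pident < min_pident:
            continue  # hit fails the E-value or identity cutoff
        # Keep the hit with the highest bit score; on an exact tie the first one seen wins.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b: Iterable[str],
                         b_vs_a: Iterable[str],
                         max_evalue: float = 1e-10,
                         min_pident: float = 0.0) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, max_evalue, min_pident)
    reverse = best_hits(b_vs_a, max_evalue, min_pident)
    pairs = []
    for a, (b, _score) in forward.items():
        if b in reverse and reverse[b][0] == a:
            pairs.append((a, b))
    return sorted(pairs)
```

You would call it with two open file handles, one per search direction. Pairs that appear in only one direction are still homologs, but this sketch does not try to classify them further.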

I'm not sure that query coverage is the best metric to filter on: two hits can have the same coverage while one is clearly the better match, especially for more distantly related species.

Here is a good reference:

http://www.ncbi.nlm.nih.gov/pubmed/18042555

The paper you reference doesn't do the best job of calling paralogs (IMO). They also don't do the best job of describing their process or the settings used, e.g. "After using rigid selection criteria for BLASTP search (very low E-value,>60% sequence identity and >80% alignment length)".

Thanks a lot joe.cornish826.

The key difference between what that paper did and the best reciprocal BLAST hit (BRBH) approach is that only BRBH can distinguish between paralogs and orthologs. Blasting in just one direction only allows you to identify homologs.

Also, there are many other tools out there that perform ortholog detection/mining using a variety of approaches. Some are still BRBH at the core but use additional metrics or improve the ease of use/analysis of results.

http://www.biomedcentral.com/1471-2105/12/11

The paper for this tool describes the BRBH process in more detail and does a good job of explaining the different relationships that proteins/genes/etc. can have.

In general, there's no solid rule on what settings to use. The best approach would be to develop a benchmark that characterizes sensitivity/specificity using a set of known ortholog/paralog relationships, maybe from two well-known species in genera/families close to your organisms of interest. You can also manually spot-check individual results (you should do this anyway) and compile some stats to get a feel for what is going on.
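As a rough illustration of the benchmark idea, the sketch below scores a set of predicted ortholog pairs against a trusted set of known pairs. The `benchmark` function and its inputs are hypothetical. Note that it reports sensitivity and precision rather than specificity, because specificity needs a defined set of true negatives, which this sketch does not assume you have.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def benchmark(predicted: Iterable[Pair], known: Iterable[Pair]) -> Dict[str, float]:
    """Compare predicted ortholog pairs with a trusted reference set.

    Pairs are compared as ordered tuples, so both sets must list the
    species in the same order, e.g. (gene_in_A, gene_in_B).
    """
    pred: Set[Pair] = set(predicted)
    truth: Set[Pair] = set(known)
    tp = len(pred & truth)   # predicted and known
    fp = len(pred - truth)   # predicted but not in the reference set
    fn = len(truth - pred)   # known but missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        # fraction of known pairs that were recovered
        "sensitivity": tp / (tp + fn) if truth else 0.0,
        # fraction of predictions that are in the reference set
        "precision": tp / (tp + fp) if pred else 0.0,
    }
```

Running this for a few E-value and identity cutoffs on two well-characterized species would show how each setting trades missed orthologs against false calls.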

A simpler approach would be to find some publications that have done this analysis on species close to yours and get a consensus for the settings. However, be sure that they're actually doing a true ortholog analysis!
