5.8 years ago
kritika.tauras
I used blastp to search for homologs of a serine/threonine kinase protein from Mycobacterium tuberculosis. I used the nr database and left all other settings at their defaults. My BLAST results contain multiple hits for the same species. For example, there are many hits for Mycobacterium abscessus. Each hit has a different accession number, but the species is the same.
As a result, all my visible hits come from only a few species, and I cannot see more distantly related species in the list.
Why does the nr database have multiple entries of the same protein in the same species? How can I solve my issue?
You have kind of answered your own question.
Those must be multiple submissions from the same species (possibly for different strains or from different submitters). Instead of doing a simple BLAST, you may want to look at PSI-BLAST or DELTA-BLAST to get more distant homologs.
The same problem occurs with both PSI-BLAST and DELTA-BLAST. The results still have duplicate entries.
While it is going to be extremely painful, you can un-select most of those mycobacterial entries going into round 2 of PSI-BLAST or DELTA-BLAST.
The only other way to do this may be to use a local copy of nr suitably cleansed of these multiple accessions.

You may want to try the BLAST at Ensembl Bacteria. It has an option to specifically select "Distant homologies" (under search sensitivity) as an explicit option.

Edit: Moved this to a comment since there is a limit of a maximum of 25 genomes to search against. Perhaps that may work for you.
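As an alternative to un-selecting entries by hand, the hit list can be collapsed to one hit per species after the search has run. Below is a minimal sketch of that idea, not a tested pipeline. It assumes the hits were exported as a tab-separated table whose columns are, in order, query ID, subject ID, E-value, bit score and species name. That column layout, the function name `best_hit_per_species` and the subject IDs in the example are all made up for illustration, so adjust `COLUMNS` to match your own export.

```python
import csv
from typing import Dict, Iterable, List

# Assumed column order of the tab-separated hit table (hypothetical layout;
# change this to match whatever your BLAST export actually contains).
COLUMNS = ["qseqid", "sseqid", "evalue", "bitscore", "species"]


def best_hit_per_species(lines: Iterable[str]) -> List[Dict]:
    """Keep only the highest-scoring hit for each species."""
    # Skip blank lines and comment lines starting with '#'.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, fieldnames=COLUMNS, delimiter="\t")

    best: Dict[str, Dict] = {}
    for row in reader:
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["species"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["species"]] = row

    # Strongest hits first, one row per species.
    return sorted(best.values(), key=lambda r: r["bitscore"], reverse=True)


if __name__ == "__main__":
    # Made-up rows purely for illustration; the subject IDs are placeholders.
    example = [
        "q1\thit_A1\t1e-100\t300\tMycobacterium abscessus",
        "q1\thit_A2\t1e-120\t350\tMycobacterium abscessus",
        "q1\thit_B1\t1e-50\t180\tStreptomyces coelicolor",
    ]
    for hit in best_hit_per_species(example):
        print(hit["species"], hit["sseqid"], hit["bitscore"])
```

This only tidies up the hits you already retrieved. If the redundant entries fill the whole hit list, you would still need to raise the maximum number of target sequences (or restrict the search taxonomically) so that more distant species make it into the table in the first place.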