Question

UniProtKB And Genes

1


10.9 years ago

willyjensen ▴ 10

I'm hoping to get some advice on the best way to build a gene set from the UniProtKB dataset. My set needs to include both genes and proteins, which UniProtKB provides nicely, but the only real identifier it has for a gene is a plain-text primary name, and sometimes just an ORF name or ordered locus name. I could use these identifiers to create my set of genes, but I suspect that approach will cause problems.

For example:

What if, within one organism, two distinct genes share the same primary name, ordered locus name, ORF name, etc.?
What if names change drastically from one release to another?

I know that many of the genes link to other databases, such as Entrez Gene, which has a stable set of "gene identifiers", but definitely not all of them do. Also, I'd prefer to stick with UniProtKB if possible instead of having to mix and match multiple resources, but if anyone has experience with this, I'm wide open to suggestions. What have you guys done in similar situations?

gene uniprot annotation protein database • 2.4k views

updated 4.9 years ago by Biostar 20 • written 10.9 years ago by willyjensen ▴ 10

score 3 · Answer 1 · 2013-05-27

UniProt has cross-references to model organism databases (MODs) where applicable, e.g. HGNC for human, MGI for mouse, FlyBase for Drosophila, etc.

This URL includes all cross-referenced MODs (along with a few other organism-specific databases): http://www.uniprot.org/database/?query=category:%22Organism-specific+databases%22

Would it help to use the numerical HGNC (etc.) identifier instead of the gene symbol? If you have questions about any particular organism or gene name, please don't hesitate to contact the UniProt helpdesk: help@uniprot.org
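As a rough illustration of that idea, here is a minimal Python sketch that reads UniProtKB flat-file text, collects the DR (cross-reference) lines of each entry, and picks a stable gene key from them. The names `parse_entries`, `stable_gene_key` and `PREFERRED_DBS` are made up for this example, the preference order is an assumption, and falling back to the UniProtKB accession for entries without a cross-reference is just one possible choice, not something UniProt prescribes. The second sample entry is a placeholder, not a real record.

```python
import re

# Databases to try, in order. This ordering is an assumption for the sketch.
PREFERRED_DBS = ("HGNC", "MGI", "FlyBase", "GeneID")


def _new_entry():
    return {"accessions": [], "gene_names": [], "xrefs": {}}


def parse_entries(text):
    """Yield one dict per flat-file entry; entries are terminated by '//'."""
    entry = _new_entry()
    for line in text.splitlines():
        if line.startswith("//"):
            if entry["accessions"]:
                yield entry
            entry = _new_entry()
            continue
        # Flat-file lines are a two-letter code, three spaces, then the value.
        code, value = line[:2], line[5:].strip()
        if code == "AC":
            entry["accessions"] += [a.strip() for a in value.split(";") if a.strip()]
        elif code == "GN":
            # Primary gene name, ignoring any trailing evidence tag in braces.
            entry["gene_names"] += [n.strip() for n in re.findall(r"Name=([^;{]+)", value)]
        elif code == "DR":
            fields = [f.strip() for f in value.split(";")]
            if len(fields) >= 2:
                db, identifier = fields[0], fields[1].rstrip(".")
                entry["xrefs"].setdefault(db, []).append(identifier)


def stable_gene_key(entry):
    """Return (database, identifier), preferring a MOD cross-reference."""
    for db in PREFERRED_DBS:
        if entry["xrefs"].get(db):
            return (db, entry["xrefs"][db][0])
    # No cross-reference: fall back to the primary UniProtKB accession.
    return ("UniProtKB", entry["accessions"][0])


SAMPLE = """\
ID   P53_HUMAN               Reviewed;         393 AA.
AC   P04637;
GN   Name=TP53; Synonyms=P53;
DR   GeneID; 7157; -.
DR   HGNC; HGNC:11998; TP53.
//
ID   PLACEHOLDER_ENTRY       Unreviewed;       100 AA.
AC   Z9ZZZ9;
GN   ORFNames=placeholder_orf;
//
"""

for e in parse_entries(SAMPLE):
    print(e["accessions"][0], e["gene_names"], stable_gene_key(e))
```

Keying on a (database, identifier) pair rather than the gene symbol means a renamed gene keeps the same key across releases, and two genes that happen to share a symbol stay distinct as long as they have different cross-references.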