UniProtKB and Genes
10.9 years ago
willyjensen

I'm hoping to get some advice on the best way to build a gene set from UniProtKB. My set needs to include both genes and proteins, which UniProtKB provides nicely, but the only gene identifier it offers is a plain-text primary name, or sometimes just an ORF name or ordered locus name. I could use these names to build my gene set, but I suspect that approach will run into problems.

For example:

  • What if, for one organism, two distinct genes share the same primary name, ordered locus name, ORF name, etc.?
  • What if names change drastically from one release to the next?

I know that many genes link to other databases, such as Entrez Gene, which has a stable set of gene identifiers, but not all of them do. I'd also prefer to stick with UniProtKB rather than mix and match multiple resources, but if anyone has experience with this, I'm wide open to suggestions. What have you done in similar situations?
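Both concerns in the list can at least be detected. The sketch below assumes you have already exported hypothetical `(taxon_id, gene_name, accession)` rows for each release, and it uses the UniProtKB accession as the tracking key across releases. Note that a name shared by several accessions is only a *candidate* collision, because one gene can have more than one UniProtKB entry.

```python
from collections import defaultdict

# Hypothetical row layout: (taxon_id, gene_name, uniprot_accession).
Record = tuple[int, str, str]


def find_name_collisions(records: list[Record]) -> dict[tuple[int, str], list[str]]:
    """Map (taxon_id, gene_name) to its accessions when more than one entry uses it."""
    by_name: dict[tuple[int, str], set[str]] = defaultdict(set)
    for taxon_id, name, accession in records:
        by_name[(taxon_id, name)].add(accession)
    return {key: sorted(accs) for key, accs in by_name.items() if len(accs) > 1}


def renamed_between_releases(old: list[Record], new: list[Record]) -> dict[str, tuple[str, str]]:
    """Map accession to (old_name, new_name) for entries whose gene name changed.

    Assumes one name per accession per release; if an accession appears
    several times, the last row wins.
    """
    old_names = {acc: name for _, name, acc in old}
    new_names = {acc: name for _, name, acc in new}
    return {
        acc: (old_names[acc], new_names[acc])
        for acc in sorted(old_names.keys() & new_names.keys())
        if old_names[acc] != new_names[acc]
    }


# Toy, made-up releases for illustration only.
release1 = [(9606, "geneA", "ACC1"), (9606, "geneA", "ACC2"), (9606, "geneB", "ACC3")]
release2 = [(9606, "geneA", "ACC1"), (9606, "geneC", "ACC2"), (9606, "geneB", "ACC3")]
collisions = find_name_collisions(release1)   # {(9606, 'geneA'): ['ACC1', 'ACC2']}
renamed = renamed_between_releases(release1, release2)   # {'ACC2': ('geneA', 'geneC')}
```

Accessions can themselves be merged or split between releases, so this catches renames but not every identity change.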

gene uniprot annotation protein database
10.9 years ago

UniProt has cross-references to model organism databases (MODs) where applicable, e.g. HGNC for human, MGI for mouse, FlyBase for Drosophila, etc.

This URL includes all cross-referenced MODs (along with a few other organism-specific databases): http://www.uniprot.org/database/?query=category:%22Organism-specific+databases%22
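In the UniProtKB flat-file format, gene names sit on `GN` lines and cross-references on `DR` lines (e.g. `DR   HGNC; HGNC:<number>; <symbol>.`). The sketch below pulls both out of a single entry. The set of databases in `ORGANISM_DBS` is an assumption to extend as needed, and the sample entry is made up for illustration.

```python
import re

# Assumption: the organism-specific databases of interest; extend as needed.
ORGANISM_DBS = {"HGNC", "MGI", "FlyBase"}


def parse_entry(text: str) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Extract GN name fields and organism-specific DR cross-references
    from one UniProtKB flat-file entry.

    Simplifications: {ECO:...} evidence tags are dropped, and entries that
    describe several genes (GN blocks separated by "and") are merged.
    """
    gene_fields: dict[str, list[str]] = {}
    xrefs: list[tuple[str, str]] = []
    for line in text.splitlines():
        code, body = line[:2], line[5:].strip()  # two-letter code, then 3 spaces
        if code == "GN":
            body = re.sub(r"\s*\{[^}]*\}", "", body)  # strip evidence tags first
            for part in body.split(";"):
                key, sep, value = part.strip().partition("=")
                if sep:
                    gene_fields.setdefault(key, []).extend(
                        v.strip() for v in value.split(",") if v.strip()
                    )
        elif code == "DR":
            fields = [f.strip() for f in body.rstrip(".").split(";")]
            if fields[0] in ORGANISM_DBS and len(fields) > 1:
                xrefs.append((fields[0], fields[1]))
    return gene_fields, xrefs


# Hypothetical entry; identifiers are placeholders, not real records.
entry = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
GN   Name=geneA {ECO:0000312|HGNC:HGNC:99999}; Synonyms=aliasA1, aliasA2;
GN   ORFNames=orfA;
DR   HGNC; HGNC:99999; geneA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.
//"""
names, mod_xrefs = parse_entry(entry)
```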

Would it help to use the numerical HGNC identifier (or its equivalent in other MODs) instead of the gene symbol? If you have questions about any particular organism or gene name, please don't hesitate to contact the UniProt helpdesk: help@uniprot.org
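Following that suggestion, a gene set can key each gene on its MOD identifier and keep the symbol only as a display label. This is a minimal sketch, assuming hypothetical `(accession, symbol, mod_id)` tuples; where no MOD cross-reference exists it falls back to the UniProtKB accession, which may split one gene across several entries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gene:
    key: str      # identity: MOD ID if available, else "UniProtKB:<accession>"
    symbol: str   # display label only; never used to decide identity
    accessions: set[str] = field(default_factory=set)


def build_gene_set(entries: list[tuple[str, str, Optional[str]]]) -> dict[str, Gene]:
    """Group UniProtKB entries into genes keyed on a stable identifier."""
    genes: dict[str, Gene] = {}
    for accession, symbol, mod_id in entries:
        key = mod_id if mod_id else f"UniProtKB:{accession}"
        gene = genes.setdefault(key, Gene(key=key, symbol=symbol))
        gene.accessions.add(accession)
    return genes


# Made-up entries for illustration only.
entries = [
    ("ACC1", "geneA", "HGNC:99999"),
    ("ACC2", "geneA", "HGNC:99999"),  # second entry for the same gene
    ("ACC3", "geneA", "HGNC:88888"),  # same symbol, different gene
    ("ACC4", "orfX", None),           # no MOD cross-reference
]
genes = build_gene_set(entries)
# sorted(genes) -> ['HGNC:88888', 'HGNC:99999', 'UniProtKB:ACC4']
```

Because identity comes from the MOD ID, two distinct genes sharing a symbol stay separate, and a symbol change between releases does not create a new gene.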
