Does The First Letter Of A Uniprot Accession Number Have A Meaning?
2
9.6 years ago
Luispedro • 30
@Luispedro1446

Accession numbers are strings of the form Q3TET3 or P47753. I was wondering whether the first letter has any meaning.

uniprot • 4.5k views
4
9.6 years ago
Lyco ♦ 2.3k
@Lyco1881

Larry is correct: the very old numbers were P..., Q..., O..., followed by numbers only. Later on, they allowed letters instead of numbers. The next wave of accession numbers started with A..., B..., C... and so on. From the first letter, you can roughly estimate how old the accession number is. Unfortunately, the UniProt people have begun to assign 'new' accession numbers to old entries (keeping the original number as a secondary accession number).
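
For anyone who wants to sort accession numbers into these rough "waves", here is a minimal Python sketch. The regular expression is the accession pattern as I recall it from the UniProt accession-number help page, so treat it as an assumption and check it against the current documentation. The function name `describe_accession` and the three labels are made up for illustration; they follow the chronology described in this thread and are not an official UniProt classification.

```python
import re

# Accession pattern as recalled from the UniProt help page (assumption:
# verify against the current documentation before relying on it).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def describe_accession(acc):
    """Return a rough, unofficial label based on the first letter.

    The labels mirror the chronology sketched in this thread only.
    """
    if not ACCESSION_RE.fullmatch(acc):
        return "invalid"
    if acc[0] in "OPQ":
        # Oldest style: one of O, P, Q followed by five digits.
        if acc[1:].isdigit():
            return "OPQ, digits only"
        # Same first letters, but letters allowed in later positions.
        return "OPQ, with letters"
    # Anything starting with another letter belongs to the later series.
    return "later series"


for acc in ["P47753", "Q3TET3", "D6VTK4"]:
    print(acc, "->", describe_accession(acc))
```

As noted above, the label is only a rough age estimate, since old entries can receive new primary accession numbers.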

4
9.6 years ago
@Chris Evelo1350

To add to @Lyco's answer: they are not simply assigning 'new' accession numbers to old entries. That only happens when entries are merged or split, and the original numbers are indeed kept as secondary accession numbers.

Entries can have more than one accession number. This can be due to two distinct mechanisms:

a) When two or more entries are merged, the accession numbers from all entries are kept. The first accession number is referred to as the ‘Primary (citable) accession number’, while the others are referred to as ‘Secondary accession numbers’. These are listed in alphanumerical order.

b) If an existing entry is split into two or more entries (‘demerged’), new ‘primary’ accession numbers are attributed to all the split entries while all original accession numbers are retained as ‘secondary’ accession numbers.

Also be aware that you: "should always use the primary accession number of an entry in any citation and link since it is the only unique stable identifier for an entry."
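
To make the primary/secondary distinction concrete, here is a small Python sketch that reads the `AC` line(s) of a UniProtKB flat-file (`.txt`) record, where accessions are separated by semicolons and the first one listed is the primary. The helper name `parse_ac_lines` and the sample text are hypothetical: the sample is hand-written from the accessions mentioned in this thread, not copied from a real record.

```python
def parse_ac_lines(flat_text):
    """Split the AC line(s) of a flat-file record into (primary, secondaries).

    Assumes the usual flat-file layout: a two-letter line code, three
    spaces, then semicolon-separated accession numbers.
    """
    accessions = []
    for line in flat_text.splitlines():
        if line.startswith("AC   "):
            # Drop the line code and padding, then split on ';'.
            parts = line[5:].split(";")
            accessions.extend(p.strip() for p in parts if p.strip())
    if not accessions:
        raise ValueError("no AC line found")
    # The first accession is the primary (citable) one.
    return accessions[0], accessions[1:]


# Hand-written sample (hypothetical), using accessions from this thread.
sample = "AC   D6VTK4; P06842;\n"

primary, secondary = parse_ac_lines(sample)
print("primary:", primary)
print("secondary:", secondary)
```

If you look up an old accession and get back a record whose primary accession differs, the one to cite is the primary.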

(Oh, and the P actually meant "protein", but then they ran out of P's.)

0

@chris, new accession numbers only for splits and mergers is the theory. In real life, these things happen quite a lot. Just have a look at how many yeast proteins have accession numbers starting with C, D, E... although they have been present in the database since completion of the genome. Take e.g. STE2_YEAST, which is now http://www.uniprot.org/uniprot/D6VTK4 but used to be http://www.uniprot.org/uniprot/P06842.txt?version=8 or even http://www.uniprot.org/uniprot/P06842.txt?version=1 when they were still using dollar signs for separating the species name.

0

Well, if you trace it back, it says on http://www.uniprot.org/uniprot/P06842?version=* that P06842 was "Demerged into D6VTK4 and P0CI39.", so that really seems to be a split. These things might just occur more often than you think.

0

I would not call this one a split, as the sequences and species ID for P06842 and D6VTK4 are identical. What happened here is the following: There used to be one Swiss-Prot entry for budding yeast STE2 with the accession number P06842. Then somebody sequenced another strain of budding yeast, and the STE2 sequence happened to be identical. Rather than doing the logical thing (giving the new sequence a new accession number), the UniProt philosophy argues that up to this point the old sequence entry represented both strains, and now they have to 'de-merge' them to create separate entries.

3
9.6 years ago
@Larry_Parnell559

The first entries in the database had accession numbers of the form P#####, with P standing for protein. Then, I believe, Q was added as a prefix, followed by O. Other designations, such as Q3TET3, came later.

1
9.6 years ago
Rm 7.8k
@Rm654

You can find the info here: http://www.uniprot.org/manual/accession_numbers

1

I had seen that, but either I'm being thick or it doesn't answer my question.