Question

How accurate is UniProt?

0

5.6 years ago

alexille5640 ▴ 10

How reliable are UniProt amino acid sequences?

Say I want to do the following:

1. Reverse translate a reviewed UniProt amino acid sequence into a nucleotide sequence (using Sequence Manipulation Suite: Reverse Translate).
2. Have that nucleotide sequence synthesized in vitro and cloned into an expression vector plasmid.
3. Transfect cells with that plasmid for expression.

Should I expect the cells to express a protein with exactly the same amino acid sequence?
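At the sequence level, the round trip in the steps above can be checked directly. Below is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and an arbitrary one-codon-per-residue choice; the function names and the example peptide are hypothetical and not taken from any UniProt entry or from Sequence Manipulation Suite.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: keep the first codon listed for each amino acid.
# A real tool would pick codons from a host-specific usage table instead.
AA_TO_CODON = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODON.setdefault(aa, codon)


def back_translate(protein: str) -> str:
    """Return one of the many DNA sequences that encode `protein`."""
    return "".join(AA_TO_CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


peptide = "MKTAYIAKQR"  # hypothetical peptide for illustration only
dna = back_translate(peptide)
assert translate(dna) == peptide  # the encoded protein is unchanged
```

Whichever synonymous codons are chosen, translating the result under the same genetic code gives back the original amino acid sequence. The check only covers the encoding; it says nothing about how well the host cells express the construct.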

gene sequencing sequence genome • 2.3k views

ADD COMMENT • link updated 5.6 years ago by WouterDeCoster 47k • written 5.6 years ago by alexille5640 ▴ 10

2

UniProtKB encompasses several individual protein sequence resources that are depicted on this page. If you are talking about a sequence from Swiss-Prot (manually reviewed and curated sequences) or from UniRef100 clusters, then that sequence is very likely accurate. Every Swiss-Prot record should have a corresponding nucleotide entry, so you should not need to do any sequence manipulation (but there may be an exon/intron model to consider).

That said, the aim of this question is unclear. Do you wish to express a protein that is not normally present in that host?

ADD REPLY • link 5.6 years ago by GenoMax 142k

0

Sorry for my lack of knowledge, but how do you access the corresponding nucleotide entry for, say, the Mus musculus Actin Beta entry in UniProt (https://www.uniprot.org/uniprot/P60710)?

ADD REPLY • link 5.6 years ago by alexille5640 ▴ 10

0

If you scroll down to the Sequence Databases section on the page you linked, you will find this information.

ADD REPLY • link 5.6 years ago by GenoMax 142k

1

Yes (for Swiss-Prot), assuming ceteris paribus. The question is, can we ensure ceteris paribus?

ADD REPLY • link 5.6 years ago by Ram 43k

1

How do you propose to reverse translate the amino acid sequence? The number of resulting possibilities is astronomical. Are you just using the most frequent codon or something?
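To put a number on "astronomical": the count of DNA sequences encoding a protein is the product of the number of synonymous codons at each position. A minimal sketch, assuming the standard genetic code and ignoring the stop codon; `count_back_translations` is a hypothetical helper name.

```python
from collections import Counter
from math import prod

# Amino acids of the standard genetic code (NCBI table 1), one letter per codon,
# with codons ordered over T, C, A, G. Counting letters gives codons per residue.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DEGENERACY = {aa: n for aa, n in Counter(AMINO_ACIDS).items() if aa != "*"}


def count_back_translations(protein: str) -> int:
    """Number of distinct coding sequences for `protein` (stop codon excluded)."""
    return prod(DEGENERACY[aa] for aa in protein.upper())


print(count_back_translations("MW"))   # Met and Trp each have a single codon
print(count_back_translations("LSR"))  # Leu, Ser and Arg each have six codons
```

Because the count is a product over residues, it grows roughly exponentially with protein length, which is why tools pick one codon per residue rather than enumerating.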

ADD REPLY • link 5.6 years ago by Joe 21k

0

Yes, relying on the degeneracy of codons. I'm assuming "Sequence Manipulation Suite: Reverse Translate" is sufficient.

ADD REPLY • link 5.6 years ago by alexille5640 ▴ 10

0

relying on the degeneracy of codons?

What do you mean by that?

ADD REPLY • link 5.6 years ago by Ram 43k

0

By degeneracy I mean that one amino acid can be encoded by multiple codons.

Codon Table
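For reference, the degeneracy can be read straight off the standard genetic code by grouping codons by the amino acid they encode. This is a small sketch, assuming NCBI translation table 1; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the residue they encode ("*" marks the stop codons).
SYNONYMOUS_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMOUS_CODONS[aa].append(codon)

# Print residues from most to least degenerate.
for aa, codons in sorted(SYNONYMOUS_CODONS.items(), key=lambda item: -len(item[1])):
    print(aa, len(codons), " ".join(codons))
```

Leucine, serine and arginine have six codons each, while methionine and tryptophan have only one.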

ADD REPLY • link 5.6 years ago by alexille5640 ▴ 10

0

I am aware of the concept of codon degeneracy. I was asking what you meant by "relying on" codon degeneracy for picking the most frequent codon. Without codon degeneracy one wouldn't encounter any form of frequency, so I was trying to understand the point you were making.

ADD REPLY • link 5.6 years ago by Ram 43k

0

I have a somewhat related question: are there web-based tools that can generate hundreds to thousands of DNA sequences from a single protein sequence? There are plenty of codon optimization tools, but I want to generate many nucleotide sequences from the same protein sequence to analyze potential DNA secondary structure variants.
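Not a web tool, but this kind of sampling takes only a few lines to script. Below is a rough sketch, assuming the standard genetic code and uniform random choice among synonymous codons (so it ignores codon usage bias); `sample_back_translations` is a hypothetical name, not an existing library function.

```python
import random
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codons per amino acid, stop codons excluded.
SYNONYMOUS_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    if aa != "*":
        SYNONYMOUS_CODONS.setdefault(aa, []).append(codon)


def sample_back_translations(protein, n, seed=None):
    """Return `n` distinct DNA sequences that all encode `protein`."""
    protein = protein.upper()
    total = prod(len(SYNONYMOUS_CODONS[aa]) for aa in protein)
    if n > total:
        raise ValueError(f"only {total} distinct coding sequences exist")
    rng = random.Random(seed)  # seeded generator for reproducible samples
    seen = set()
    while len(seen) < n:
        # Assumption: each synonymous codon is equally likely.
        seen.add("".join(rng.choice(SYNONYMOUS_CODONS[aa]) for aa in protein))
    return sorted(seen)


variants = sample_back_translations("MKLSR", 50, seed=1)  # hypothetical peptide
print(len(variants), variants[0])
```

For real proteins the space of coding sequences is so large that duplicates are rare, so the rejection loop finishes quickly. Weighting `rng.choice` by a host codon usage table would be a natural extension.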

ADD REPLY • link 4.0 years ago by slazetic • 0

0

I don't know about web-based back translation, but I have a tool here that might be of interest:

https://github.com/jrjhealey/bioinfo-tools/blob/master/backtranslate.py

Note that this doesn't generate all possible sequences (since that would be an astronomical number); it's just a useful way of showing which codons are the most redundant in a given sequence.

ADD REPLY • link 4.0 years ago by Joe 21k