How accurate is UniProt?
5.6 years ago
alexille5640 ▴ 10

How reliable are UniProt amino acid sequences?

Say I want to do the following:

  1. Reverse translate a reviewed UniProt amino acid sequence into a nucleotide sequence (using Sequence Manipulation Suite Reverse Translate)
  2. Synthesize that nucleotide sequence in vitro and clone it into an expression vector plasmid
  3. Transfect cells with that plasmid for expression

Is it expected that a protein with the exact same amino acid sequence will be expressed by the cells?
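At the level of the genetic code alone, the round trip can be checked in a few lines. This is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and an arbitrary rule of taking the first synonymous codon for each amino acid; it is not how Sequence Manipulation Suite picks codons, and the peptide is a made-up example. It says nothing about expression level, folding, or post-translational processing in the host.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: keep the first codon seen for each amino acid.
# A real design would use a codon usage table for the expression host.
PICK = {}
for codon, aa in CODON_TABLE.items():
    PICK.setdefault(aa, codon)


def back_translate(protein):
    """Return one DNA sequence that encodes the given protein."""
    return "".join(PICK[aa] for aa in protein)


def translate(dna):
    """Translate an in-frame DNA sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MKTAYIAK"  # hypothetical peptide, for illustration only
dna = back_translate(peptide)
assert translate(dna) == peptide  # the code itself round-trips exactly
```

Any synonymous choice of codons passes the same check, which is why the codon-selection rule matters for expression but not for the encoded sequence.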

gene sequencing sequence genome • 2.3k views

UniProtKB encompasses several individual protein sequence resources, which are depicted on this page. If you are talking about a sequence from SwissProt (manually reviewed/curated sequences) or from UniRef100 clusters, then that sequence is very likely accurate. Every SwissProt record should have a corresponding nucleotide entry, so you should not need to do any sequence manipulation (but there may be an exon/intron model to consider).

That said, the aim of this question is unclear. Do you wish to express a protein that is not normally present in that host?


Sorry for my lack of knowledge, but how do you access the corresponding nucleotide entry for, say, the Mus musculus Actin Beta entry in UniProt (https://www.uniprot.org/uniprot/P60710)?


If you scroll down to the Sequence Databases section on the page you linked you will find this information.


Yes (for SwissProt), assuming ceteris paribus. Question is, can we ensure ceteris paribus?


How do you propose to reverse translate the amino acid sequence? The number of resulting possibilities is astronomical... just using the most frequent codon or something?
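To put a number on "astronomical": the count of distinct coding sequences is the product of the codon degeneracy at each residue. A minimal sketch, assuming the standard genetic code and ignoring the stop codon; the peptide is an arbitrary example, not a real protein.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (e.g. 6 for Leu, 1 for Met).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein):
    """Count the distinct DNA sequences that encode the protein."""
    return prod(DEGENERACY[aa] for aa in protein)


# Hypothetical 8-residue peptide: 1*2*4*4*2*3*4*2 encodings.
print(count_encodings("MKTAYIAK"))  # 1536
```

Since most residues have between two and six synonymous codons, the count grows exponentially with length, so enumerating every encoding of a full-length protein is not practical.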


Yes, relying on the degeneracy of codons, I'm assuming "Sequence Manipulation Suite: Reverse Translate" is sufficient.


relying on the degeneracy of codons?

What do you mean by that?


Degeneracy describes how one amino acid can be encoded by multiple codons.

Codon Table


I am aware of the concept of codon degeneracy. I was asking what you meant by "relying on" codon degeneracy for picking the most frequent codon. Without codon degeneracy one wouldn't encounter any form of frequency, so I was trying to understand the point you were making.


I have a somewhat related question: are there web-based tools that can generate hundreds to thousands of DNA sequences from a single protein sequence? There are plenty of codon optimization tools, but I want to generate many nucleotide sequences from the same protein sequence to analyze potential DNA secondary structure variants.
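If a local script is acceptable instead of a web tool, random sampling of synonymous codons is one way to get many variants. This is a rough sketch, assuming the standard genetic code and uniform sampling over synonymous codons (no host codon usage bias); `sample_encodings` and the peptide are hypothetical names and data for illustration.

```python
import random
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Map each amino acid to the list of codons that encode it.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def translate(dna):
    """Translate an in-frame DNA sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def sample_encodings(protein, n, seed=0):
    """Return up to n distinct DNA sequences encoding the protein."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    unique = set()
    # Bounded number of attempts, so a peptide with fewer than n
    # possible encodings cannot loop forever.
    for _ in range(n * 20):
        if len(unique) >= n:
            break
        unique.add("".join(rng.choice(SYNONYMS[aa]) for aa in protein))
    return sorted(unique)


variants = sample_encodings("MKTAYIAK", 100)  # hypothetical peptide
assert all(translate(v) == "MKTAYIAK" for v in variants)
```

The resulting sequences could then be passed to whatever secondary structure predictor you prefer; weighting the choice by host codon usage would be a straightforward change to the `rng.choice` call.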


I don't know about web-based back translation, but I have a tool here that might be of interest:

https://github.com/jrjhealey/bioinfo-tools/blob/master/backtranslate.py

Note that this doesn't generate all possible sequences (since it's an astronomical number of sequences); it's just a useful way of showing which codons are the most redundant in a given sequence.
