Getting full protein names from Uniprot-Swissprot identifiers or short names
7.9 years ago
lay_0

Hello,

I have a list of Uniprot/swissprot identifiers, such as:

P12281 Q05397 P50430 P54904 ...

I also have the short names for the corresponding identifiers :

MOEA_ECOLI FAK1_HUMAN ARSB_RAT P5CR1_ARATH ...

Does anyone know how I can get the full protein names for these proteins in batch? For example, the first one would be: Molybdopterin molybdenumtransferase

If I could also get more information for each entry, such as the associated GO terms, even better...

Thank you!

Tags: swissprot, Uniprot, protein names, GO terms

7.9 years ago
Denise CS

Try the Retrieve/ID mapping tool from UniProt. Paste MOEA_ECOLI FAK1_HUMAN ARSB_RAT P5CR1_ARATH as your 'UniProtKB AC/ID' input and choose UniProtKB as the target; you will get a table with the protein names. You can filter the results by Gene Ontology, taxonomy and other criteria.
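
For longer lists, the same mapping can be scripted. The sketch below is a minimal Python example that assumes the current UniProt REST API at rest.uniprot.org (which post-dates this thread). The endpoint paths, the database names UniProtKB_AC-ID and UniProtKB, and the field names accession, id, protein_name and go are assumptions to check against UniProt's API documentation; the helper function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the current UniProt REST API; the web Retrieve/ID mapping
# tool is a browser front end to the same kind of job.
API = "https://rest.uniprot.org/idmapping"


def build_run_payload(ids, from_db="UniProtKB_AC-ID", to_db="UniProtKB"):
    """Form-encode the body for POST /idmapping/run (IDs comma-separated)."""
    return urllib.parse.urlencode(
        {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    ).encode("ascii")


def results_url(job_id, fields=("accession", "id", "protein_name", "go")):
    """URL of the TSV result stream, restricted to the chosen columns."""
    query = urllib.parse.urlencode({"format": "tsv", "fields": ",".join(fields)})
    return f"{API}/uniprotkb/results/stream/{job_id}?{query}"


def map_ids(ids, poll_seconds=3):
    """Submit a mapping job, poll until it finishes, return the TSV text."""
    # Passing data= makes urlopen send a form-encoded POST.
    with urllib.request.urlopen(f"{API}/run", data=build_run_payload(ids)) as r:
        job_id = json.load(r)["jobId"]
    while True:
        with urllib.request.urlopen(f"{API}/status/{job_id}") as r:
            status = json.load(r)
        state = status.get("jobStatus")
        if state in ("NEW", "RUNNING"):
            time.sleep(poll_seconds)  # job still queued or running
            continue
        if state not in (None, "FINISHED"):
            raise RuntimeError(f"ID mapping job failed: {status}")
        break  # a finished job no longer reports NEW/RUNNING
    with urllib.request.urlopen(results_url(job_id)) as r:
        return r.read().decode("utf-8")
```

For example, map_ids(["MOEA_ECOLI", "FAK1_HUMAN"]) would return a tab-separated table with one row per mapped entry, including the protein names and GO terms the question asked about.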


... and you can configure the table output to keep only the information you need; e.g. you could remove all columns except the identifier and the protein name (see http://www.uniprot.org/help/customize and http://insideuniprot.blogspot.ch/2015_03_01_archive.html).
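
Once the table has been downloaded with only the columns you need, extracting the names is a small parsing job. This is a minimal sketch that assumes a tab-separated download whose header uses the labels "From" (the identifier you submitted) and "Protein names". Both labels are assumptions, so pass your own if your file differs. The protein name column may also list alternative names in parentheses after the recommended name.

```python
import csv
import io


def names_from_tsv(tsv_text, key_column="From", name_column="Protein names"):
    """Map each identifier to its protein name in a tab-separated UniProt table.

    The default column labels are assumptions; override them if your
    downloaded file uses different headers.
    """
    # QUOTE_NONE: protein names are not quoted in TSV output, and a stray
    # quote character inside a name must not be treated as a delimiter.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row[key_column]: row[name_column] for row in reader}
```

Keying on "From" keeps the identifiers exactly as you submitted them, which makes it easy to join the names back onto your original list.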

7.9 years ago
GenoMax

There is probably a cleverer way of doing this (or a single ID-mapping file somewhere), but here is one you can use now.

  1. Download the Swiss-Prot FASTA file (uniprot_sprot.fasta.gz) from UniProt.
  2. Unzip it: gunzip uniprot_sprot.fasta.gz
  3. Collect all FASTA headers in a new file: grep "^>" uniprot_sprot.fasta > uniprot_header
  4. Pull out the names you need, e.g. grep FAK1_HUMAN uniprot_header
  5. Parse the output as needed.
  6. Iterate over your IDs to get them all.
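
The steps above can also be done in a single pass in Python. This is a minimal sketch that assumes the standard Swiss-Prot header layout ">sp|ACCESSION|ENTRY_NAME Protein name OS=...", where the protein name is the text between the entry name and " OS=". The function names are hypothetical.

```python
import gzip
import re

# Assumed Swiss-Prot header layout:
# >sp|P12281|MOEA_ECOLI Molybdopterin molybdenumtransferase OS=... OX=... GN=... PE=... SV=...
HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=")


def parse_header(line):
    """Return (accession, entry_name, protein_name), or None if no match."""
    match = HEADER.match(line)
    return match.groups() if match else None


def lookup_names(lines, wanted):
    """Collect protein names for the accessions or entry names in `wanted`."""
    wanted = set(wanted)
    found = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines, as grep "^>" does
        parsed = parse_header(line)
        if parsed is None:
            continue
        accession, entry_name, protein_name = parsed
        # Accept either identifier style from the question.
        for key in (accession, entry_name):
            if key in wanted:
                found[key] = protein_name
    return found


def names_from_fasta(path, wanted):
    """Read the gzipped FASTA directly, so no separate gunzip step is needed."""
    with gzip.open(path, "rt") as handle:
        return lookup_names(handle, wanted)
```

For example, names_from_fasta("uniprot_sprot.fasta.gz", ["P12281", "FAK1_HUMAN"]) returns a dictionary keyed by whichever identifier matched. Reading the compressed file directly replaces steps 2 to 4, and checking each header against a set avoids running one grep per ID.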

Thanks, that should work too, but I didn't want to go back to the sequences in the database. I was a bit stubborn about trying the gene ID conversion tool because I didn't really want to "convert IDs"... but after natasha posted I decided to try it, and it works fine: it gives me the full protein names.


Which of the "conversion tools" did you use? There are several in that thread, and some are not currently maintained (e.g. DAVID).


On the DAVID web page I "converted" from Uniprot_ID to DAVID (default), and it gave a list of full protein names when I tried a small subset. However, it didn't work with my whole list (it returned an empty table). I just tried Denise's suggestion and that was much better, since the results come straight from the UniProt database. Thank you.

7.9 years ago
natasha.sernova

See this post:

Gene Id Conversion Tool

See also this post

How To Programmatically Retrieve A Batch Of Fasta Sequences From For A List Of Uniprot Accession Ids?

That thread includes Perl scripts as well as a UniProt batch-retrieval approach.
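
If you prefer to script the batch retrieval against UniProt directly, one common pattern is to split the accession list into chunks and send one query per chunk. This is a minimal sketch that assumes the current UniProtKB stream endpoint (https://rest.uniprot.org/uniprotkb/stream) and its accession: query syntax. The batch size of 100 is an arbitrary choice to keep URLs short, not a documented limit, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the UniProtKB stream endpoint of the current REST API.
STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def batch_queries(accessions, size=100):
    """Split accessions into OR-joined query strings of at most `size` IDs."""
    return [
        " OR ".join(f"accession:{acc}" for acc in accessions[i:i + size])
        for i in range(0, len(accessions), size)
    ]


def batch_urls(accessions, size=100, fmt="fasta"):
    """Build one stream URL per batch of accessions."""
    return [
        f"{STREAM}?{urllib.parse.urlencode({'query': q, 'format': fmt})}"
        for q in batch_queries(accessions, size)
    ]


def fetch_fasta(accessions, size=100):
    """Download every batch and concatenate the FASTA records."""
    chunks = []
    for url in batch_urls(accessions, size):
        with urllib.request.urlopen(url) as response:
            chunks.append(response.read().decode("utf-8"))
    return "".join(chunks)
```

Concatenating is safe for FASTA output; if you switch fmt to a tabular format, each batch would carry its own header line, so drop the repeats before joining.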


The conversion tool works for my purpose, thanks!

Correction: it doesn't work for my full list, only for a small subset. I don't know what is going on.


DAVID is not actively maintained; its last updates were probably from 2010-2011, so this is not surprising.


This site may help you later. See this post:

A: Ensembl ID to Gene Symbol

and find this url there:

https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

It converts between many different identifier formats.
