get protein sequence from ID
9.2 years ago
arronslacey ▴ 320

Hi, could someone point me to a database that I could query to find the protein sequence for a given protein ID (e.g. P51811)? I'm familiar with running some queries on the UCSC server, but there are so many tables!

Thanks very much.

ANSWER

Thank you to @themysticgeek and @Emily_Ensembl for pointing me to the UniProt REST API, which I had forgotten about. On this recommendation I cobbled together this little bash script to get the sequences for a list of IDs. Hope it helps someone (feel free to optimize the code if you want!)

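#!/bin/bash

# Download FASTA sequences given a file of UniProt IDs.
# Arguments: $1 = file of IDs, $2 = name of the output directory

file=$1
name=$2

# Read the IDs before changing directory, so a relative path still works
list=$(cat "${file}")

mkdir -p "${name}"
cp "${file}" "${name}"     # keep a copy of the ID list with the downloads
cd "${name}" || exit 1     # stop if the directory could not be entered

# Fetch one FASTA file per ID
for word in ${list}
do
    wget -nv "http://www.uniprot.org/uniprot/${word}.fasta"
done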
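
Taking up the invitation to optimize, here is one possible variant, offered as a sketch rather than a replacement. It writes all records into a single multi-FASTA file instead of one file per ID, skips blank lines in the ID list, and reports IDs that could not be fetched. The function names `fetch_one` and `collect_fasta` are made up for illustration, and it assumes the same URL pattern as the script above.

```shell
#!/bin/bash

# Hypothetical helper: fetch one FASTA record to standard output.
# Assumes the same URL pattern as the script above;
# wget -q -O - stays quiet and writes the download to stdout.
fetch_one() {
    wget -q -O - "http://www.uniprot.org/uniprot/$1.fasta"
}

# Hypothetical helper: read IDs (one per line) from file $1,
# append each record to file $2, and print failed IDs to stderr.
collect_fasta() {
    local ids=$1 out=$2 id
    : > "${out}"                       # start with an empty output file
    while read -r id || [ -n "${id}" ]; do
        [ -z "${id}" ] && continue     # skip blank lines
        if ! fetch_one "${id}" >> "${out}"; then
            echo "failed: ${id}" >&2
        fi
    done < "${ids}"
}

# Run only when both arguments are given: <id_file> <output_fasta>
if [ "$#" -eq 2 ]; then
    collect_fasta "$1" "$2"
fi
```

Because the IDs are read line by line, an ID list with one accession per line is assumed; the failed IDs on stderr can be redirected to a file and retried later.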
9.2 years ago
kautilya ▴ 430

P51811 is a UniProt ID. To get its sequence you can simply use the URL http://www.uniprot.org/uniprot/YOUR_PROTEIN_ID.fasta

e.g. http://www.uniprot.org/uniprot/P51811.fasta

Besides this, there are many other ways to access the sequence; the details can be found in the UniProt REST guide.
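
As a rough sketch of the URL pattern above, the shell functions below build the FASTA URL for an accession and fetch it with curl. The names `uniprot_fasta_url` and `fetch_fasta` are made up for illustration, and the sketch assumes curl is installed and that the URL pattern quoted above still resolves (hence `-L`, to follow any redirect).

```shell
#!/bin/bash

# Hypothetical helper: build the FASTA URL for one UniProt accession,
# using the URL pattern quoted in this answer.
uniprot_fasta_url() {
    local id=$1
    printf 'http://www.uniprot.org/uniprot/%s.fasta\n' "${id}"
}

# Hypothetical helper: download one sequence to standard output.
# -s silences progress, -f fails on HTTP errors, -L follows redirects.
fetch_fasta() {
    curl -sfL "$(uniprot_fasta_url "$1")"
}

# Print the URL that would be requested for the ID from the question.
uniprot_fasta_url P51811
```

To save the sequence to a file, one could then run `fetch_fasta P51811 > P51811.fasta`.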

9.2 years ago
Emily 23k

Have you tried putting your IDs into the search box in UniProt?

Alternatively, if you have a long list of them that you want sequences for, and they're all UniProt IDs like this one, you could try BioMart. There's a help video to get you started here. Use:

Database: Ensembl genes

Filters: ID list limit, pick UniProt from the dropdown and paste in your list

Attributes: Sequences, protein sequences
