Retrieving Uniprot Protein Isoform Sequences Programmatically?
13.2 years ago
Pablo Pareja ★ 1.6k

Hi everyone, I've been searching for a way to retrieve protein isoform sequences programmatically but unfortunately haven't succeeded so far. After parsing the whole UniProt TrEMBL and Swiss-Prot XML files, I've seen that the isoform sequences are not included, only the isoform definitions plus some extra information. Do you know if there's any kind of web service for it? Thanks in advance,

Pablo Pareja

uniprot xml isoform

13.2 years ago
Jerven ▴ 660
http://www.uniprot.org/uniprot/<ANY_UNIPROT_ISOFORM_ID_THAT_YOU_HAVE>.fasta

Replace the part in capitals with the isoform accession you want to retrieve.

e.g.

http://www.uniprot.org/uniprot/P05067-2.fasta

This won't work when the isoform ID refers to the canonical sequence and the record contains only one sequence.

These cases should also work from the week after next.
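A minimal Python sketch of the URL pattern above, using only the standard library. It assumes that the current UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta` is the equivalent of the legacy `www.uniprot.org/uniprot/` URL; the function names are hypothetical.

```python
import urllib.request
from typing import Tuple

# Assumption: current UniProt REST endpoint for per-entry FASTA downloads.
BASE_URL = "https://rest.uniprot.org/uniprotkb"


def isoform_fasta_url(accession: str, base_url: str = BASE_URL) -> str:
    """Build the FASTA URL for one UniProt (isoform) accession, e.g. P05067-2."""
    return f"{base_url}/{accession.strip()}.fasta"


def parse_single_fasta(text: str) -> Tuple[str, str]:
    """Split a one-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def fetch_isoform(accession: str) -> Tuple[str, str]:
    """Download and parse one isoform sequence (requires network access)."""
    with urllib.request.urlopen(isoform_fasta_url(accession), timeout=30) as resp:
        return parse_single_fasta(resp.read().decode("utf-8"))


# Usage (needs network access):
# header, seq = fetch_isoform("P05067-2")
```

Keeping URL building and parsing separate from the download makes both easy to check without hitting the server.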


Hi Jerven. Welcome to Biostars! Your first link is broken and needs some mending. Cheers!


Hi Jerven. Thanks for your answer. I assume, then, that this URL pattern should work from mid-February on. I have to ask anyway: isn't there a way to retrieve more than one isoform sequence at a time, e.g. by generating a multi-FASTA file containing several isoform sequences? Cheers
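With only a per-accession URL available, one straightforward way to get a multi-FASTA file is to fetch each accession in turn and concatenate the records. This is a sketch, not a batch API: it assumes the REST endpoint `https://rest.uniprot.org/uniprotkb`, and `build_multi_fasta` with its injectable `fetch` parameter is a hypothetical helper (the parameter lets you test it without network access).

```python
import urllib.request
from typing import Callable, Iterable

# Assumption: current UniProt REST endpoint for per-entry FASTA downloads.
BASE_URL = "https://rest.uniprot.org/uniprotkb"


def fetch_fasta_text(accession: str) -> str:
    """Download the raw FASTA text for one accession (requires network access)."""
    with urllib.request.urlopen(f"{BASE_URL}/{accession}.fasta", timeout=30) as resp:
        return resp.read().decode("utf-8")


def build_multi_fasta(
    accessions: Iterable[str],
    fetch: Callable[[str], str] = fetch_fasta_text,
) -> str:
    """Fetch each accession sequentially and concatenate the records into one multi-FASTA string."""
    records = []
    seen = set()
    for acc in accessions:
        acc = acc.strip()
        if not acc or acc in seen:
            continue  # skip blank entries and duplicate accessions
        seen.add(acc)
        records.append(fetch(acc).strip())
    return "\n".join(records) + "\n" if records else ""


# Usage (needs network access):
# with open("isoforms.fasta", "w") as out:
#     out.write(build_multi_fasta(["P05067-2", "P05067-3"]))
```

Requests are made one at a time, which is slow for long lists but keeps the load on the server modest.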


Yes, this works now, but if you get a 500 error back, try again without the -\d+ suffix of the isoform accession (e.g. P05067 instead of P05067-1) to get the canonical sequence.
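The fallback described in this reply can be sketched as follows: strip the `-\d+` suffix with a regular expression and retry when the server answers with HTTP 500. This assumes the REST endpoint `https://rest.uniprot.org/uniprotkb`; the current service may signal a missing isoform with a different status code, so the codes that trigger a retry are a parameter (`retry_codes`, hypothetical, like the other names here).

```python
import re
import urllib.error
import urllib.request
from typing import Collection

# Assumption: current UniProt REST endpoint for per-entry FASTA downloads.
BASE_URL = "https://rest.uniprot.org/uniprotkb"
# Trailing isoform suffix such as "-1" or "-12".
ISOFORM_SUFFIX = re.compile(r"-\d+$")


def canonical_accession(accession: str) -> str:
    """Strip a trailing isoform suffix: 'P05067-1' -> 'P05067'."""
    return ISOFORM_SUFFIX.sub("", accession.strip())


def _get(url: str) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def fetch_with_fallback(accession: str, retry_codes: Collection[int] = (500,)) -> str:
    """Fetch an isoform's FASTA; on a retry code, fall back to the canonical accession."""
    accession = accession.strip()
    try:
        return _get(f"{BASE_URL}/{accession}.fasta")
    except urllib.error.HTTPError as err:
        base = canonical_accession(accession)
        if err.code not in retry_codes or base == accession:
            raise  # not the case described in the reply: re-raise the original error
        return _get(f"{BASE_URL}/{base}.fasta")
```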

13.2 years ago

Check out this section of the FAQ. Note that although it says the isoform sequences can be downloaded here, the link actually goes to the documentation of Paul Kersey's varsplic.pl program, which is usually found here.
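Tools like varsplic.pl derive isoform sequences from the canonical sequence plus the record's splice-variant annotations. A minimal sketch of that idea follows; it is not varsplic.pl itself, and the `(start, end, replacement)` edit format (1-based inclusive positions, empty replacement for a deletion) is an assumption for illustration, not how the UniProt XML or flat files encode variants.

```python
from typing import Sequence, Tuple

# Hypothetical edit format: (start, end, replacement) with 1-based, inclusive
# positions on the canonical sequence; an empty replacement deletes the span.
Edit = Tuple[int, int, str]


def apply_variant_edits(canonical: str, edits: Sequence[Edit]) -> str:
    """Build an isoform sequence by applying non-overlapping edits to the canonical one."""
    result = canonical
    previous_start = len(canonical) + 1
    # Work from the C-terminal end backwards so positions of earlier edits stay valid.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if not 1 <= start <= end < previous_start:
            raise ValueError(f"edit {start}-{end} is out of range or overlaps another edit")
        result = result[: start - 1] + replacement + result[end:]
        previous_start = start
    return result
```

Applying edits from the end backwards avoids having to shift the coordinates of the remaining edits after each insertion or deletion.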

13.2 years ago
Pablo Pareja ★ 1.6k

I hadn't realized that there's a file containing every isoform sequence on the UniProt downloads site:

isoform fasta file

This may not be the best option for people who would rather use a web service, but it fits my requirements perfectly: with a really simple Java program I can extract all the information I need for any protein isoform.
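The Java program isn't shown; as a rough equivalent, here is a Python sketch that streams a plain or gzipped isoform FASTA file into a dictionary keyed by accession. It assumes UniProt-style headers of the form `>sp|P05067-2|A4_HUMAN ...`; the function names are hypothetical, and the file name in the usage comment (`uniprot_sprot_varsplic.fasta.gz`) is an assumption about the file linked above.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_from_header(header: str) -> str:
    """Extract the accession from a UniProt-style header ('sp|P05067-2|A4_HUMAN ...')."""
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    return (header.split() or [""])[0]  # fallback: first whitespace-separated token


def load_isoforms(path: str) -> Dict[str, str]:
    """Load a plain or gzipped isoform FASTA file into {accession: sequence}."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return {accession_from_header(h): s for h, s in read_fasta(handle)}


# Usage (local path assumed):
# isoforms = load_isoforms("uniprot_sprot_varsplic.fasta.gz")
# print(isoforms["P05067-2"][:60])
```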


Heads up: as of 2018, this file contains only the non-canonical splice variants, not all isoforms. Canonical isoforms are stored in ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz, which is an order of magnitude bigger.
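Given that split, collecting every isoform of an entry means reading both files and grouping records by accession without the `-\d+` suffix. A minimal sketch, assuming UniProt-style `db|accession|name` headers; the function names are hypothetical, the isoform file name in the usage comment (`uniprot_sprot_varsplic.fasta.gz`) is an assumption, and this approach keeps all of Swiss-Prot in memory.

```python
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, TextIO, Tuple

# Trailing isoform suffix such as "-2"; canonical entries in uniprot_sprot.fasta lack it.
ISOFORM_SUFFIX = re.compile(r"-\d+$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header: str) -> str:
    """'sp|P05067-2|A4_HUMAN ...' -> 'P05067-2'."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else (header.split() or [""])[0]


def group_isoforms(*sources: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group records from several FASTA sources as {base_accession: {accession: sequence}}."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for source in sources:
        for header, seq in read_fasta(source):
            acc = accession_of(header)
            groups[ISOFORM_SUFFIX.sub("", acc)][acc] = seq
    return dict(groups)


def open_text(path: str) -> TextIO:
    """Open a plain or gzipped FASTA file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Usage (local file paths assumed):
# with open_text("uniprot_sprot.fasta.gz") as canon, \
#         open_text("uniprot_sprot_varsplic.fasta.gz") as variants:
#     isoforms = group_isoforms(canon, variants)
#     print(sorted(isoforms["P05067"]))
```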
