How To Get A Full Proteome Of Helicobacter Pylori 26695.
13.2 years ago
Marcinmagnus ▴ 80

Do you know of any place other than UniProt to get this kind of data? I would like to get a file with all the sequences without much hassle.

I used the query http://www.uniprot.org/uniprot/?query=organism%3A%22Helicobacter+pylori+26695%22&sort=score and got only 2 proteins :|

protein uniprot • 4.0k views

helicobacter AND pylori AND strain:26695 gives a better result. However, I'm still not quite sure if the procedure is correct.

Maybe you could try taxonomy:"Helicobacter pylori" so that you get the proteins for all H. pylori strains. When was the H. pylori 26695 genome sequenced? If it is too recent, the proteins may not be in the UniProt database yet.

13.2 years ago

I would get such data from NCBI RefSeq. The directory for Helicobacter pylori 26695 is:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Helicobacter_pylori_26695_uid57787/

There you can find sequences in a variety of formats. What you need is probably NC_000915.faa, which is a FASTA file with all the translation products (proteins).
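As a quick sanity check after downloading, it can help to count the records in the FASTA file. Below is a minimal sketch using only the Python standard library. The helper name `read_fasta` is hypothetical, and the input is a made-up toy sample rather than real H. pylori data.

```python
import io
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Toy input standing in for the real file (made-up records).
sample = io.StringIO(">prot_a example one\nMKVL\nLAAG\n>prot_b example two\nMTTS\n")
records = list(read_fasta(sample))
print(len(records), "protein records")
```

For the real download, the toy input would be replaced with something like `open("NC_000915.faa")`, assuming that is the local filename.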

It's great! It is exactly what I wanted.

I got roughly the same number of proteins when I used http://www.uniprot.org/taxonomy/?query=strain%3A26695&sort=score

What do you think? Is the difference significant? I already had the proteins from UniProt in my local database. Should I stick with them, or should I download the data from NCBI RefSeq?
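One way to judge whether the two downloads really differ is to compare them by sequence rather than by identifier, since UniProt and RefSeq use different accession schemes. The sketch below assumes both sets have already been loaded as Python sets of sequence strings. The function name and the toy sequences are hypothetical.

```python
from typing import Dict, Set


def compare_sequence_sets(uniprot: Set[str], refseq: Set[str]) -> Dict[str, int]:
    """Count sequences shared by both sets and those unique to each."""
    return {
        "shared": len(uniprot & refseq),
        "only_uniprot": len(uniprot - refseq),
        "only_refseq": len(refseq - uniprot),
    }


# Toy sequences for illustration only (not real H. pylori proteins).
uniprot_seqs = {"MKVLLAAG", "MTTS", "MPQR"}
refseq_seqs = {"MKVLLAAG", "MTTS", "MGGH"}

summary = compare_sequence_sets(uniprot_seqs, refseq_seqs)
print(summary)
```

Exact string matching is deliberately strict: two entries for the same gene that differ only in the annotated start site would be counted as unique to each source, so the "only" counts are an upper bound on real differences.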

http://www.uniprot.org/uniprot/?query=organism:210+keyword:181 would give the Helicobacter pylori complete proteome as defined by UniProt. However, I am not sure which strain that would be. I will ask around.

Which database to use is largely a subjective choice. It is difficult to know up front if the proteome provided by UniProt is better or worse than that provided by RefSeq. The main advantage that I see of using RefSeq is that it is based on a specific fully sequenced genome, for which reason I can be sure that it is a complete proteome. UniProt - not being a genome database - might in some cases give you a very partial proteome. But I guess you will have to judge on a case-by-case basis.

13.2 years ago
Science_Robot ★ 1.1k

Do you have to use UniProt?

I searched for your strain in NCBI's taxonomy database and got this page.

On the right there are several links for Nucleotide, Protein, Genomes, etc. Click "Protein" and arrive here

If you want to download the sequences, click 'send to'.

13.2 years ago

What about Integr8 at the EBI? http://www.ebi.ac.uk/integr8/FtpSearch.do?orgProteomeId=23 The whole proteome set is available there.

13.2 years ago
Marina Manrique ★ 1.3k

I'd do this using the advanced search:

  1. Search by organism "Helicobacter pylori 26695"
  2. And then search by keyword = Complete proteome

Here you can find more info about what "Complete proteome" keyword means
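The two steps above can also be combined into a single query URL. This is only a sketch: the base URL mirrors the one quoted in the question, the `organism:` and `keyword:` field syntax is an assumption modelled on the queries in this thread, and the current UniProt site may expect a different endpoint or field names. The function name `build_query_url` is hypothetical.

```python
from urllib.parse import quote_plus

# URL shape as quoted earlier in the thread; may be outdated.
BASE = "http://www.uniprot.org/uniprot/"


def build_query_url(organism: str, keyword: str) -> str:
    """Combine an organism filter and a keyword filter into one encoded URL."""
    # Assumed field syntax, modelled on the queries shown in this thread.
    query = f'organism:"{organism}" AND keyword:"{keyword}"'
    return f"{BASE}?query={quote_plus(query)}"


url = build_query_url("Helicobacter pylori 26695", "Complete proteome")
print(url)
```

`quote_plus` handles the percent-encoding of the quotes and colons, which is the same encoding visible in the URL from the original question.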
