Ensembl API, command-line argument
3.3 years ago
oars • 150
@oars41179

I'm experimenting with the Ensembl REST API and trying to write a script to which I can pass a gene (as an Ensembl ID) via a command-line argument. Specifically, I'm trying to extract the CDS sequence of each transcript associated with that gene.

From the REST API website, I found the following script for retrieving a CDS sequence:

import requests, sys

server = "http://rest.ensembl.org"
ext = "/sequence/id/ENST00000288602?type=cds"

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print(r.text)

The above code works perfectly; however, I can't quite get the command-line-argument version to work. So far, this is what I've got:

import requests, sys

server = "http://rest.ensembl.org"
ext = "/sequence/id/gene?type=cds"

gene=sys.argv[1]

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()


print(r.text)

I think I'm close? Maybe not??

I run it with an Ensembl gene ID as the only argument, for example:

$ python file.py ENSG00000186642
3.3 years ago
@Alex Reynolds20

Perhaps try:

import requests, sys, errno

server = "http://rest.ensembl.org"

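if len(sys.argv) < 2:
  # No gene ID given on the command line: exit with "invalid argument" status
  sys.exit(errno.EINVAL)

gene = sys.argv[1]

# Substitute the ID from the command line into the endpoint path
ext = "/sequence/id/%s?type=cds" % (gene)

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()

print(r.text)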

About errno: https://docs.python.org/2/library/errno.html

About Python string formatting: https://pyformat.info/
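A slightly more defensive variant of the same idea, as a sketch rather than a drop-in replacement: argparse prints a usage message when the ID is missing, and requests builds the `?type=cds` query string from a `params` dict instead of string formatting. The helper names `build_request`, `parse_args` and `main` are made up for this example; the endpoint path and header are the ones used above.

```python
import argparse

SERVER = "http://rest.ensembl.org"


def build_request(stable_id, seq_type="cds"):
    """Return the URL and query parameters for a /sequence/id lookup."""
    url = "%s/sequence/id/%s" % (SERVER, stable_id)
    return url, {"type": seq_type}


def parse_args(argv=None):
    """Parse the command line; argparse exits with a usage message if the ID is missing."""
    parser = argparse.ArgumentParser(description="Fetch a sequence from the Ensembl REST API")
    parser.add_argument("stable_id", help="Ensembl stable ID, e.g. ENST00000288602")
    parser.add_argument("--type", default="cds", help="sequence type to request (default: cds)")
    return parser.parse_args(argv)


def main(argv=None):
    import requests  # third-party; imported here so the helpers above work without it

    args = parse_args(argv)
    url, params = build_request(args.stable_id, args.type)
    r = requests.get(url, params=params, headers={"Content-Type": "text/x-fasta"})
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    print(r.text)
```

Saved as, say, `fetch_seq.py` (a hypothetical name) with `if __name__ == "__main__": main()` at the bottom, it would be run as `python fetch_seq.py ENST00000288602`.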


Fantastic, many thanks for the reply and the references. This worked beautifully for ENSG00000169174; however, some other Ensembl IDs throw the following error:

Traceback (most recent call last):
  File "WEEK10.py", line 15, in <module>
    r.raise_for_status()
  File "/Users/oars/anaconda/lib/python2.7/site-packages/requests/models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://rest.ensembl.org/sequence/id/ENSG00000186642?type=cds

Perhaps this is coming from Ensembl's API and not with the structure of the script?
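One way to see why a request failed, sketched here as a general debugging habit rather than anything Ensembl-specific: print the response body along with the status code instead of only calling `raise_for_status()`, since servers usually put an explanation in the body. The exact wording of Ensembl's message is not assumed. `failure_message` and `fetch_fasta` are hypothetical helper names.

```python
import sys


def failure_message(status_code, url, body):
    """Combine status code, URL and the server's own explanation into one message."""
    return "HTTP %d for %s\n%s" % (status_code, url, body.strip())


def fetch_fasta(stable_id, seq_type="cds", server="http://rest.ensembl.org"):
    import requests  # third-party; only needed when actually querying the server

    r = requests.get(
        server + "/sequence/id/" + stable_id,
        params={"type": seq_type},
        headers={"Content-Type": "text/x-fasta"},
    )
    if not r.ok:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(failure_message(r.status_code, r.url, r.text))
    return r.text
```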


Not sure if you need to query a different way?

$ wget --header="Content-Type:text/x-fasta" http://rest.ensembl.org/sequence/id/ENSG00000186642
>ENSG00000186642 chromosome:GRCh38:11:72576141:72674591:-1
GTTTATCTCTCAGTCTCTCTGTCTGTGAGTCTTTTTTCCTCTCTCCCAGTCAGACTCTCT
CTCTACCCCTCCCTCTCTCCCTCTCTCCCTCTCTGTCTGGGCCTCTCTCTGTTCCTCCTC
...
GTGAAGGTGTCTCCAACAGGCTTGATGTGTAGGCATTATTGTAAGTTTGCAACTTCTTGG

I don't really grok Ensembl, but there's someone on here who can probably help you with debugging their REST API.
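Whichever sequence type is requested, the `text/x-fasta` response is plain FASTA like the output above, so it can be split into records with a few lines of standard-library Python. This is a minimal hand-rolled parser, assumed to be enough for REST output; Biopython's `SeqIO` is the usual choice for anything more involved.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```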


Thanks again, Alex. I think my issue was that gene IDs (often) won't map to a single CDS sequence: a CDS belongs to a transcript, not to the gene as a whole. So I switched all references in the code from gene to transcript_id, and everything seems to work!

import requests, sys, errno

server = "http://rest.ensembl.org"

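if len(sys.argv) < 2:
  # No transcript ID given on the command line: exit with "invalid argument" status
  sys.exit(errno.EINVAL)

transcript_id = sys.argv[1]

# A CDS belongs to a (protein-coding) transcript, so pass a transcript ID here
ext = "/sequence/id/%s?type=cds" % (transcript_id)

r = requests.get(server+ext, headers={ "Content-Type" : "text/x-fasta"})

if not r.ok:
  r.raise_for_status()
  sys.exit()

print(r.text)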
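To get back to the original goal (the CDS of every transcript of a gene), one option is to look the gene up first and then request each transcript's CDS as above. This sketch assumes the `lookup/id` endpoint with `expand=1` returns JSON listing the gene's transcripts under a `Transcript` key, each with an `id` and, for coding transcripts, a `Translation` entry; check the REST documentation before relying on that layout. The REST docs also describe a `multiple_sequences` option on `sequence/id` that may do the same in a single call. `coding_transcript_ids` and `cds_for_gene` are hypothetical names.

```python
def coding_transcript_ids(gene_record):
    """IDs of transcripts that have a Translation entry, i.e. a CDS (assumed JSON layout)."""
    return [t["id"] for t in gene_record.get("Transcript", []) if "Translation" in t]


def cds_for_gene(gene_id, server="http://rest.ensembl.org"):
    """Return FASTA text with one CDS record per coding transcript of gene_id."""
    import requests  # third-party; only needed when actually querying the server

    lookup = requests.get(
        server + "/lookup/id/" + gene_id,
        params={"expand": 1},  # ask for the gene's transcripts to be included
        headers={"Content-Type": "application/json"},
    )
    lookup.raise_for_status()

    records = []
    for transcript_id in coding_transcript_ids(lookup.json()):
        r = requests.get(
            server + "/sequence/id/" + transcript_id,
            params={"type": "cds"},
            headers={"Content-Type": "text/x-fasta"},
        )
        r.raise_for_status()
        records.append(r.text.strip())
    return "\n".join(records)
```

For genes with many transcripts it may be worth pausing briefly between requests, since the public server rate-limits clients.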