Question

Gene-Pathway Association File for KEGG

13.1 years ago

Sequencegeek ▴ 740

I am trying to manually calculate pathway enrichment for a set of genes. I have downloaded the PANTHER and Reactome data files, which contain the pathway annotations for genes, but I couldn't find similar files for the KEGG pathway database.

The file I downloaded from ftp://ftp.genome.jp/pub/kegg/pathway/pathway contains only the description of each pathway, without the gene list. Could someone point me to a gene-pathway association file for KEGG?

I have also tried the KEGG Python API, but I couldn't get a list returned with the following code:

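```python
# Python 2, SOAPpy client for the KEGG SOAP API
from SOAPpy import WSDL

wsdl = 'http://soap.genome.jp/KEGG.wsdl'
serv = WSDL.Proxy(wsdl)

# None of these calls returned a pathway list:
serv.get_pathways_by_genes(['ENSG00000120328'])  # Ensembl gene ID
serv.get_pathways_by_genes(['ENST00000361510'])  # Ensembl transcript ID
serv.get_pathways_by_genes(['OPA1'])             # official gene symbol
```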

It did return a pathway list when I used KEGG gene IDs like these:

```python
serv.get_pathways_by_genes(['eco:b0077', 'eco:b0078'])
```

But my genes are identified by either Ensembl gene IDs or official gene symbols. Is there a way to find the correspondence between KEGG gene IDs like these and Ensembl gene IDs?

Thanks

Note: it seems you have posted two questions: how to parse KGML files and how to convert KEGG IDs to Ensembl IDs. It would be easier to answer you if you split them.

Reply · Giovanni M Dall'Olio · 13.1 years ago

Note that access to KEGG's FTP site now requires a commercial subscription (http://www.genome.jp/kegg/docs/plea.html). Some of the resources mentioned in this thread may not be accessible without that subscription.

Reply · Giovanni M Dall'Olio · 12.8 years ago

Kevin Blighe · Answer 1 · 2011-03-08

You could try the KEGG files on the PathVisio download page. You can export them (using PathVisio itself) as gene lists, which should let you do the enrichment calculations in whatever way you prefer. In fact, you should also be able to use the PathVisio plugins to do the enrichment calculations directly. If you don't use PathVisio, it is probably easier to follow one of the other, more straightforward answers.

Fred's answer explains how we got them there in the first place. The KEGG XML was not trivial to translate, though, since it does not always seem to follow its own documentation. So the translation from KGML (KEGG) to GPML could still be improved, but for enrichment analysis it should be good enough.

[Edited May 29, 2020 by Kevin Blighe: update links]
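
Once you have a gene list per pathway, the enrichment calculation itself is small. The sketch below assumes a one-sided hypergeometric (over-representation) test against a background gene universe that you choose. This is a common choice but not necessarily what the PathVisio plugins compute, and it applies no multiple-testing correction. The function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M background genes, n in pathway, N selected)."""
    upper = min(n, N)
    # Sum the probability mass for overlaps k, k+1, ..., min(n, N).
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / comb(M, N)


def pathway_enrichment(gene_list, pathway_genes, universe):
    """Return {pathway: (overlap, p_value)} using a one-sided hypergeometric test.

    gene_list:     genes of interest
    pathway_genes: dict mapping pathway ID to an iterable of member genes
    universe:      background genes; all inputs are restricted to this set
    """
    universe = set(universe)
    selected = set(gene_list) & universe
    results = {}
    for pathway, members in pathway_genes.items():
        members = set(members) & universe
        overlap = len(selected & members)
        p = hypergeom_sf(overlap, len(universe), len(members), len(selected))
        results[pathway] = (overlap, p)
    return results
```

Gene IDs only need to be consistent across the three inputs (all Entrez or all Ensembl). With many pathways tested at once, adjust the p-values (for example with Benjamini-Hochberg) before interpreting them.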

Ram · Answer 2 · 2011-03-12

13.1 years ago

Khader Shameer 18k

I recently played with the KEGG API to retrieve pathways associated with genes using the same method. I noticed that the method you used, get_pathways_by_genes, is a tricky one: it takes a list of genes and retrieves the list of pathways common to all of the input genes. That's why it worked for

```python
serv.get_pathways_by_genes(['eco:b0077', 'eco:b0078'])
```

but not for the individual gene IDs you provided. For example, your three IDs can be mapped to two KEGG genes, "hsa:4976" and "hsa:56124", and these two genes don't have any associated pathways.

If you try two genes that participate in a common pathway, such as the IL5 and IL8 receptors (hsa:3568 and hsa:3577, both in Cytokine-cytokine receptor interaction):

```python
serv.get_pathways_by_genes(["hsa:3568", "hsa:3577"])
```

you will get the result path:hsa04060.
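
To see why a multi-gene query can come back empty, here is a minimal offline sketch of the "common pathways" behaviour described above, assuming it amounts to a plain set intersection over each gene's pathways. The gene-to-pathway mapping is a made-up toy dictionary, not real KEGG data, and both function names are hypothetical.

```python
def common_pathways(gene_ids, gene_to_pathways):
    """Pathways shared by *all* genes in gene_ids (a set intersection)."""
    per_gene = [set(gene_to_pathways.get(g, ())) for g in gene_ids]
    return set.intersection(*per_gene) if per_gene else set()


def pathways_per_gene(gene_ids, gene_to_pathways):
    """Look genes up one at a time instead: gene ID -> sorted pathway list."""
    return {g: sorted(gene_to_pathways.get(g, ())) for g in gene_ids}


# Toy, made-up mapping for illustration only (not real KEGG annotations).
toy = {
    "g1": {"p1", "p2"},
    "g2": {"p1", "p3"},
    "g3": set(),  # a gene with no annotated pathways
}

print(common_pathways(["g1", "g2"], toy))    # {'p1'}
print(common_pathways(["g1", "g3"], toy))    # set(): g3 empties the intersection
print(pathways_per_gene(["g1", "g3"], toy))  # {'g1': ['p1', 'p2'], 'g3': []}
```

Querying one gene at a time avoids the surprise: a gene without pathways then shows up as an empty list instead of emptying the whole result.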

If you need help mapping KEGG identifiers, you can use KEGG Mapper. Another option is to try the InterMine-based TargetMine, which calculates enrichment of your gene lists using GO, OMIM and KEGG and can give you interesting insights. A manuscript describing TargetMine is available here.

That is a great resource! I'm trying to use the Perl API. I installed Webservice::InterMine, but I get this error when I execute a template script. Any idea?

```
<TITLE>Error</TITLE>
<BODY>
<H1>Error</H1>
FW-1 at tornado: Access denied.</BODY>
```

Reply · Abdel · 12.4 years ago

Sorry to hear you had problems with the Perl client. Feel free to send us more details of your issue (dev@intermine.org) or reply to this thread; we would love to help. Without seeing your script, it sounds like a URL issue: the most recent version of the client has a fix for private IP addresses, so updating it may solve things. Alex

Reply · Alex Kalderimis · 12.4 years ago

Hi Khader Shameer, I am using the function serv.get_pathways_by_genes(genes) to get pathways, but the output contains only the pathway IDs. If I need the names, is there a function for that, or do you know how to get them? For example, if I pass hsa:1431 to get_pathways_by_genes, I get path:hsa00020 and two others, but I want the name as well, like this: hsa00020 Citrate cycle (TCA cycle)

Reply · dinesh.prabakaran · 11.9 years ago

AFAIK the API returns only the pathway ID, not its name (see: http://www.kegg.jp/kegg/soap/doc/keggapi_manual.html#label:95). You can keep a local tab-delimited file of KEGG pathway IDs and names and parse it instead. Unfortunately, I cannot point you to an FTP site because of KEGG's licensing restrictions (see: http://www.kegg.jp/kegg/download/).

Reply · Khader Shameer · 11.9 years ago
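
A minimal sketch of that local lookup, assuming a tab-delimited file with one `pathway_id<TAB>name` line per pathway. KEGG's later REST interface, for example, returns pathway listings as ID/name lines, but the ID prefix may differ, so both `path:hsa00020` and `hsa00020` are accepted. The function names and the `pathways.tsv` file name are hypothetical; `str.removeprefix` needs Python 3.9+.

```python
import io


def load_pathway_names(handle):
    """Read 'pathway_id<TAB>name' lines into {normalized_id: name}."""
    names = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        pathway_id, name = line.split("\t", 1)
        # Treat 'path:hsa00020' and 'hsa00020' as the same key.
        names[pathway_id.removeprefix("path:")] = name
    return names


def annotate(pathway_ids, names):
    """Pair each ID returned by the API with its name ('unknown' if missing)."""
    return [(pid, names.get(pid.removeprefix("path:"), "unknown")) for pid in pathway_ids]


if __name__ == "__main__":
    # Stand-in for a local file; in practice use open("pathways.tsv") (hypothetical name).
    sample = io.StringIO("path:hsa00020\tCitrate cycle (TCA cycle)\n")
    names = load_pathway_names(sample)
    print(annotate(["path:hsa00020"], names))
    # [('path:hsa00020', 'Citrate cycle (TCA cycle)')]
```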

score 3 · Answer 3 · 2011-03-09

13.1 years ago

Joachim ★ 2.9k

Have a look at ftp://ftp.genome.jp/pub/kegg/genes/organisms/ and the respective sub-directories of the species you are interested in. For example, in /pub/kegg/genes/organisms/hsa you will find H.sapiens.ent, which is probably the kind of file you are looking for.
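
If you go the H.sapiens.ent route, a minimal parser could look like the sketch below. It assumes the KEGG flat-file layout: a 12-character keyword column (ENTRY, PATHWAY, ...), continuation lines with a blank keyword column, and records terminated by `///`. It keeps only the first token of each PATHWAY line as the pathway ID, then inverts the records into per-pathway gene sets suitable for enrichment. Check these layout assumptions against the actual file; the function names are hypothetical.

```python
from collections import defaultdict


def parse_kegg_genes_flatfile(lines):
    """Yield (gene_id, [pathway_ids]) for each record of a KEGG GENES flat file.

    Assumes: keyword in columns 0-11, value from column 12 on, continuation
    lines with a blank keyword, and '///' ending each record.
    """
    gene_id, pathways, field = None, [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("///"):
            if gene_id is not None:
                yield gene_id, pathways
            gene_id, pathways, field = None, [], None
            continue
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            field = keyword  # a new field starts; a blank keyword means continuation
        if keyword == "ENTRY":
            gene_id = value.split()[0]  # first token after ENTRY, e.g. '1431'
        elif field == "PATHWAY" and value:
            pathways.append(value.split()[0])  # e.g. 'hsa00020'


def genes_by_pathway(records):
    """Invert (gene, pathways) records into {pathway_id: set of gene IDs}."""
    by_pathway = defaultdict(set)
    for gene_id, pathways in records:
        for pathway_id in pathways:
            by_pathway[pathway_id].add(gene_id)
    return dict(by_pathway)


# Usage (file name as given in the answer above):
# with open("H.sapiens.ent") as fh:
#     pathway_sets = genes_by_pathway(parse_kegg_genes_flatfile(fh))
```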

This is how I always get the pathway genes too. All of the APIs are annoying.

Reply · Will · 13.1 years ago

score 1 · Answer 4 · 2011-03-09

ftp://ftp.genome.jp/pub/kegg/pathway/organisms/hsa/

Once you have extracted all the Entrez Gene IDs for a given pathway, you can get the corresponding Ensembl IDs or HUGO gene names by querying the UCSC or BioMart databases.

By the way, there is a lot of other material in the KEGG download section. Feel free to explore it; maybe you will find a nicer file format that fits your needs: http://www.genome.jp/kegg/download/
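
Assuming that directory holds one KGML (XML) file per human pathway, the Entrez Gene IDs can be pulled out with the standard library alone. This sketch assumes KGML's `<entry type="gene" name="hsa:3417 hsa:3418">` layout, where for human the part after `hsa:` is the Entrez Gene ID; compound and other entry types are skipped. The function name and the `hsa00020.xml` file name are hypothetical.

```python
import xml.etree.ElementTree as ET


def kgml_gene_ids(kgml):
    """Return (pathway name, sorted gene IDs) from one KGML document (str or bytes).

    Gene entries carry space-separated KEGG IDs in their 'name' attribute;
    for human ('hsa') the part after the colon is the Entrez Gene ID.
    """
    root = ET.fromstring(kgml)
    gene_ids = set()
    for entry in root.iter("entry"):
        if entry.get("type") != "gene":
            continue  # skip compounds, orthologs, map links, groups
        for kegg_id in entry.get("name", "").split():
            _, _, local_id = kegg_id.partition(":")
            if local_id:
                gene_ids.add(local_id)
    return root.get("name"), sorted(gene_ids)


# Usage (file name hypothetical); reading bytes lets the parser honour the XML declaration:
# with open("hsa00020.xml", "rb") as fh:
#     name, entrez_ids = kgml_gene_ids(fh.read())
```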

score 1 · Answer 5 · 2011-03-09

You should be able to parse the KGML files in R with the KEGGgraph package. Once you have done that, you can use the biomaRt Bioconductor package to convert the IDs to HGNC or Ensembl IDs.

Alternatively, you can read the KGML file in Cytoscape with the KGML parser plugin, and then use other Cytoscape plugins to get the gene IDs.

If you absolutely need to do this in Python, I am afraid I don't know of any library for querying BioMart from Python. However, you can always download the whole list of KEGG gene IDs and Ensembl IDs as a tabular file and get the correspondences from there; this also removes the need to be connected to the Internet.
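
The offline, tabular-file approach can be sketched with the standard `csv` module. It assumes a tab-delimited export with a header row (for example from BioMart); the column names and IDs in the demo (`ensembl_gene_id`, `entrezgene`, `ENSG_A`) are placeholders, so substitute whatever your file actually contains. Since one ID can map to several, the map keeps a set per ID, and unmapped IDs are returned separately rather than dropped silently.

```python
import csv
import io
from collections import defaultdict


def load_id_map(handle, from_col, to_col):
    """Build {from_id: set(to_ids)} from a tab-delimited file with a header row."""
    mapping = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        src = (row.get(from_col) or "").strip()
        dst = (row.get(to_col) or "").strip()
        if src and dst:  # skip rows where either ID is missing
            mapping[src].add(dst)
    return dict(mapping)


def convert(ids, mapping):
    """Map each ID; return (converted, unmapped) so no ID is dropped silently."""
    converted, unmapped = {}, []
    for i in ids:
        if i in mapping:
            converted[i] = sorted(mapping[i])
        else:
            unmapped.append(i)
    return converted, unmapped


if __name__ == "__main__":
    # Placeholder IDs and column names; replace with your real export.
    table = io.StringIO(
        "ensembl_gene_id\tentrezgene\n"
        "ENSG_A\t111\n"
        "ENSG_B\t\n"
    )
    entrez_to_ensembl = load_id_map(table, "entrezgene", "ensembl_gene_id")
    print(convert(["111", "999"], entrez_to_ensembl))
    # ({'111': ['ENSG_A']}, ['999'])
```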