Fetch KO IDs for all genes in a genome using accession number only
5.8 years ago

Hi,

Given a GenBank (or RefSeq) accession number, what is the best way to obtain the list of KOs using the KEGG API?

For example, the genome Bacillus cereus ATCC 10987 has the GenBank accession number GCA_000008005.1. How can I get its list of KOs using only GCA_000008005.1? I know I can do this with the link operation, but only if I have the T number (https://api.kegg.net/link/ko/<t-number>). If I have a list of >10K genomes and only their GenBank accession numbers, how can I achieve this with the KEGG API?

Thank you.

KO accession genbank kegg api • 2.4k views
5.8 years ago
Nitin Narwade ★ 1.6k

In this case KEGGREST, an R package, would be really helpful. All you need is the organism code assigned by KEGG (three or four letters).

I have tried with Bacillus cereus ATCC 10987 (KEGG ORG code: bca).

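if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("KEGGREST")  # replaces the retired biocLite() installer
library(KEGGREST)
# keggList("bca") on its own lists the genes of bca, so name the pathway database explicitly
all.pathways.for.bca <- keggList("pathway", "bca")
# KO assignment for each bca gene
all.kos.for.bca <- keggLink("ko", "bca")

The above code returns all pathways for bca, plus the KO assigned to each of its genes.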

You can get detailed help here.


Thanks. With the approach you're suggesting, I'd still need the KEGG organism code, which I don't have. I only have the GenBank accession number. I figured out a solution, which I'll post as a separate answer so people can see it. Thanks for your reply :)

5.8 years ago

I figured out a solution. There is basically no direct way to query KEGG with a GenBank genome accession to get the list of KOs (or whatever else you're looking for). Instead, you have to do the following:

  1. Get the list of all genomes available in KEGG (this shows each organism's code and T number):

https://api.kegg.net/list/genome

  2. Use the T number to fetch the genome entry for each organism that exists in KEGG, for example:

https://api.kegg.net/get/gn:T00001

  3. Parse each genome entry for the GenBank accession, which should be on the DATA SOURCE line in the form (Assembly: acc-id).
  4. If that accession is one you're looking for, save it together with the corresponding T number.
  5. Use the T numbers of the genomes you want to fetch their lists of KOs (or whatever else you're looking for), for example:

https://api.kegg.net/link/ko/T00001
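
In case it helps, here is a rough Python sketch of steps 1 to 4. It is not a tested pipeline. The base URL is the one written above, and the function names (`fetch`, `parse_t_numbers`, `parse_assembly`, `map_accessions_to_t_numbers`) are made up for illustration. The exact layout of the `list/genome` lines and of the DATA SOURCE line is an assumption, so the parsing only relies on a T number appearing in the first column and on the text `Assembly: <accession>` appearing somewhere in the genome entry.

```python
import re
import time
import urllib.request

# Base URL as written in this thread; an assumption, adjust it if your endpoint differs.
KEGG_API = "https://api.kegg.net"

T_NUMBER = re.compile(r"\bT\d{5}\b")
ASSEMBLY = re.compile(r"Assembly:\s*(GC[AF]_\d+(?:\.\d+)?)")


def fetch(path):
    """Return the text body of an API call such as 'list/genome'."""
    with urllib.request.urlopen(f"{KEGG_API}/{path}") as response:
        return response.read().decode("utf-8")


def parse_t_numbers(genome_list_text):
    """Collect the T number found in the first tab-separated column of each line."""
    t_numbers = []
    for line in genome_list_text.splitlines():
        match = T_NUMBER.search(line.split("\t", 1)[0])
        if match:
            t_numbers.append(match.group(0))
    return t_numbers


def parse_assembly(genome_entry_text):
    """Return the first 'Assembly: GCA_/GCF_...' accession in a genome entry, or None."""
    match = ASSEMBLY.search(genome_entry_text)
    return match.group(1) if match else None


def map_accessions_to_t_numbers(wanted_accessions, pause_seconds=0.5):
    """Walk every KEGG genome and keep those whose assembly accession is wanted."""
    wanted = set(wanted_accessions)
    found = {}
    for t_number in parse_t_numbers(fetch("list/genome")):
        accession = parse_assembly(fetch(f"get/gn:{t_number}"))
        if accession in wanted:
            found[accession] = t_number
        time.sleep(pause_seconds)  # be polite to the server; the value is arbitrary
    return found
```

Something like `map_accessions_to_t_numbers(["GCA_000008005.1"])` would then return a dict from accession to T number. Matching is exact, so if KEGG records the RefSeq (GCF_) accession for a genome while you hold the GenBank (GCA_) one, that genome may be missed unless you normalise the accessions first.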

It's long, but it worked for me. I hope it helps someone else. Areej
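
For the last step, a minimal Python sketch of reading the `link/ko` result for one T number. It assumes the response is plain text with one `gene<TAB>ko:Kxxxxx` pair per line. The base URL is the one used in this thread, and the names `parse_link_ko`, `unique_kos` and `fetch_kos` are hypothetical helpers, not part of any KEGG client.

```python
from collections import defaultdict
import urllib.request

# Base URL as written in this thread; an assumption, adjust it if your endpoint differs.
KEGG_API = "https://api.kegg.net"


def parse_link_ko(link_text):
    """Turn 'gene<TAB>ko:Kxxxxx' lines into {gene: [KO, ...]}, skipping malformed lines."""
    gene_to_kos = defaultdict(list)
    for line in link_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue
        gene, ko = fields
        gene_to_kos[gene].append(ko.split(":", 1)[-1])  # drop the 'ko:' prefix if present
    return dict(gene_to_kos)


def unique_kos(gene_to_kos):
    """Return the sorted set of KO identifiers across all genes."""
    return sorted({ko for kos in gene_to_kos.values() for ko in kos})


def fetch_kos(t_number):
    """Download and parse the gene-to-KO links for one genome (needs network access)."""
    with urllib.request.urlopen(f"{KEGG_API}/link/ko/{t_number}") as response:
        return parse_link_ko(response.read().decode("utf-8"))
```

Calling `unique_kos(fetch_kos("T00001"))` would give the KO list for that genome. For >10K genomes it is probably worth saving each result to disk as you go, so an interrupted run does not have to start over.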
