syntax questions about getting sequences from KO numbers using KEGGREST
2.4 years ago
jon.sy.tarn • 10
@jon.sy.tarn30421

I suppose my question is in the same vein as previous posts such as these:

Basically what I want to do is feed a list of KO numbers from kegg into a program, and get the resulting amino acid or nucleotide fastas from each of these KO numbers.

Based on what I've already read, I need to be using KEGGREST.

However, I'm having some trouble deciphering the syntax.

This is the usage given for keggGet in the API manual:

keggGet(dbentries, option = c("aaseq", "ntseq", "mol", "kcf", "image", "kgml"))


and this is an example they show:

res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino
## acid sequences of a human gene and an
## E.coli O157 gene
str(res)


My question is: how do I decipher this? Am I to assume that I can enter a KO number in place of the ece or hsa IDs shown above?

Sorry for the potentially basic question.

KEGG • 278 views
2.4 years ago
Marks • 40
@Marks43246

Yes, that's correct. It might be helpful to explicitly name each argument to illustrate how the function operates:

res <- keggGet(option = "aaseq", dbentries = c("hsa:10458", "ece:Z5100"))


option selects the type of data to return (here, amino acid sequences), and dbentries holds the IDs of the entries you want to retrieve. With "aaseq" or "ntseq" the result comes back as a Biostrings AAStringSet or DNAStringSet rather than a plain list, so you will need the Biostrings package to manipulate the sequences. If you have a list of, say, 100 IDs you want to query, you can automate the process like this:

my_list <- c("hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100", "hsa:10458", "ece:Z5100")
split_my_list <- split(my_list, 1:4)
results <- lapply(X = split_my_list, FUN = keggGet, option = "aaseq")

I've copied the same entries over and over just for illustration purposes. Using the function split, I've divided the vector into 4 chunks (change 1:4 to 1:10 or whatever you want, but keep each chunk at 10 IDs or fewer, since keggGet accepts at most 10 entries per call). Then I use lapply to apply keggGet to each chunk. It returns a list with one element per chunk, named "1" to "4", so subset it like this: results$`1` or results[["1"]].
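For a longer ID list, the chunked lookup described above could be wrapped up roughly as follows. This is only a sketch: chunk_ids and fetch_aaseq are hypothetical helper names, not part of KEGGREST, and the unique() call is my own addition to avoid requesting the same entry twice. The chunk size of 10 reflects the per-call limit of keggGet.

```r
# Hypothetical helper: split a vector of IDs into consecutive chunks
# of at most `size` elements (keggGet accepts at most 10 per call).
chunk_ids <- function(ids, size = 10) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Hypothetical helper: fetch amino acid sequences chunk by chunk.
# `fetch` defaults to KEGGREST::keggGet, which needs the KEGGREST
# package and network access; it is only looked up when actually used.
fetch_aaseq <- function(ids, size = 10, fetch = KEGGREST::keggGet) {
  chunks <- chunk_ids(unique(ids), size)  # assumption: duplicates are unwanted
  lapply(chunks, function(chunk) fetch(chunk, "aaseq"))
}
```

Calling seqs <- fetch_aaseq(c("hsa:10458", "ece:Z5100")) should then give a list with one element per chunk, so seqs[["1"]] holds the sequences for the first chunk, and Biostrings::writeXStringSet(seqs[["1"]], "chunk1.fasta") would write them out as FASTA (the file name is just an example).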