Question: Bioconductor -- Quickly Look Up Aspect Of GO Term

I am working on a project using Bioconductor that requires me to look up which GO ontology a given GO term belongs to (i.e. Molecular Function, Biological Process, or Cellular Component). I need to do this tens of thousands of times, in the inner loop of a larger program. My current solution is to use the GO.db Bioconductor package to create three predicates like this one:

library(GO.db)
isMF <- function(term) {
    !is.null(GOMFPARENTS[[term]])
}

Unsurprisingly, however, this is prohibitively slow when invoked tens of thousands of times. Is there a Bioconductor package out there somewhere that would give me a faster way to look up this data, or will I need to implement a faster data structure for this purpose myself? I'm just learning R, so I'd like to just use an existing function, if possible.

6.5 years ago by cclark • updated 6.5 years ago by Martin Morgan

If you're just learning, you might want to explore a bit more. This is a perfect place to use hash tables.
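As a minimal sketch of the hash-table idea: R environments are hashed, so one can be filled once and then queried in the inner loop. The three GO IDs below are only a toy stand-in for the full table (which would come from GO.db), and `go_hash` and this version of `isMF` are hypothetical names used for illustration.

```r
# Toy mapping of GO IDs to ontology codes; a real table would come from GO.db.
go_ids     <- c("GO:0000001", "GO:0000002", "GO:0000006")
ontologies <- c("BP", "BP", "MF")

# An environment acts as a hash table keyed by string.
go_hash <- new.env(hash = TRUE, parent = emptyenv(), size = length(go_ids))
for (i in seq_along(go_ids)) {
    assign(go_ids[i], ontologies[i], envir = go_hash)
}

# Hypothetical predicate: TRUE only when the term is known and maps to "MF".
# get0() returns NULL for unknown keys instead of raising an error.
isMF <- function(term) {
    identical(get0(term, envir = go_hash, inherits = FALSE), "MF")
}

isMF("GO:0000006")  # TRUE
isMF("GO:0000001")  # FALSE
isMF("GO:9999999")  # FALSE (unknown ID)
```

The cost of building the environment is paid once, outside the loop; each lookup afterwards is a single hashed access.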

6.5 years ago by pld • updated 13 months ago by RamRS

Please ask questions about Bioconductor packages on the Bioconductor mailing list (no subscription required). As with most things in R, it's better to use vectorized operations rather than iterating. Also, the interface to GO and other databases has been simplified. You could instead

> vals = select(GO.db, keys(GO.db, "GOID"), c("TERM", "ONTOLOGY"))
> dim(vals)
[1] 37391     3
> head(vals)
        GOID                                                         TERM ONTOLOGY
1 GO:0000001                                    mitochondrion inheritance       BP
2 GO:0000002                             mitochondrial genome maintenance       BP
3 GO:0000003                                                 reproduction       BP
4 GO:0000006 high affinity zinc uptake transmembrane transporter activity       MF
5 GO:0000007     low-affinity zinc ion transmembrane transporter activity       MF
6 GO:0000009                       alpha-1,6-mannosyltransferase activity       MF

and then do standard R operations, e.g., vals[vals$ONTOLOGY == "MF", ] to keep only the Molecular Function terms. The Annotation workflow provides some additional material.
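For the original use case (finding the ontology of many terms at once), one possible approach is to turn the two columns into a named vector and index it with all the query terms in a single call. This is only a sketch: the small data frame below stands in for the real `vals` using the six rows shown in the output above, and `ontology_of`, `terms`, `aspects`, and `is_mf` are illustrative names, not part of GO.db.

```r
# Stand-in for `vals`; with GO.db loaded, use the result of select() instead.
vals <- data.frame(
    GOID = c("GO:0000001", "GO:0000002", "GO:0000003",
             "GO:0000006", "GO:0000007", "GO:0000009"),
    ONTOLOGY = c("BP", "BP", "BP", "MF", "MF", "MF"),
    stringsAsFactors = FALSE
)

# Named character vector: names are GO IDs, values are ontology codes.
ontology_of <- setNames(vals$ONTOLOGY, vals$GOID)

# Look up many terms in one vectorized call; unknown IDs come back as NA.
terms   <- c("GO:0000006", "GO:0000001", "GO:9999999")
aspects <- unname(ontology_of[terms])

# Vectorized equivalent of an isMF() predicate, treating unknown IDs as FALSE.
is_mf <- !is.na(aspects) & aspects == "MF"
```

Here `aspects` is `c("MF", "BP", NA)` and `is_mf` is `c(TRUE, FALSE, FALSE)`, so the whole batch is resolved without an explicit loop.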

6.5 years ago by Martin Morgan • updated 13 months ago by RamRS
