Is There A Standard Format For GO Term Enrichment Results?
12.7 years ago
Chris Mungall ▴ 320

I am fairly certain there is no such standard, but I'm also fairly certain some other people must have thought about this.

One advantage of a standard format is that it would simplify the running of multiple enrichment tools in parallel and comparing or combining results. This is particularly useful to us within the GO consortium, as we would like to compare analyses between newer/older versions of the ontology and annotations. A more ambitious aim is for publications that include GO enrichment results to provide these in a standard format, to simplify replicating results.
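As a rough sketch of the kind of comparison a shared format would make easy, assume each run has already been reduced to a mapping from term ID to p-value. The function name, the `alpha` cut-off and the example IDs and values are all placeholders for illustration, not output from any real analysis.

```python
def compare_runs(old, new, alpha=0.05):
    """Compare two enrichment runs given as {term_id: p_value} dicts.

    Returns the term IDs that pass the cut-off in both runs, only in the
    old run, or only in the new run.
    """
    old_sig = {term for term, p in old.items() if p <= alpha}
    new_sig = {term for term, p in new.items() if p <= alpha}
    return {
        "shared": sorted(old_sig & new_sig),
        "lost": sorted(old_sig - new_sig),
        "gained": sorted(new_sig - old_sig),
    }


# Placeholder term IDs and p-values, e.g. from an older and a newer
# version of the ontology and annotations.
old_run = {"GO:0000001": 0.01, "GO:0000002": 0.20, "GO:0000003": 0.03}
new_run = {"GO:0000001": 0.02, "GO:0000002": 0.04, "GO:0000003": 0.30}
diff = compare_runs(old_run, new_run)
```

Without a shared format, the same comparison needs a separate parser for every tool before this step can even start.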

Note that it would not be necessary for all tools to be conformant in order for the standard to be successful. Converters could be provided to rewrite the ad-hoc output of heterogeneous tools to the standard form. However, it would help to have buy-in from some of the more popular tools.
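To illustrate the converter idea, the sketch below rewrites one tool's tab-delimited output into a list of records with shared field names. The column headers (`Term`, `PValue`, `Genes`) and the target field names are invented for this example and do not describe the output of any real tool.

```python
import csv
import io

# Hypothetical mapping from one tool's column headers to shared field names.
COLUMN_MAP = {"Term": "term_id", "PValue": "p_value", "Genes": "gene_ids"}


def convert_rows(tsv_text, column_map=COLUMN_MAP, gene_sep=","):
    """Rewrite tab-delimited tool output into a list of standard-form dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        # Rename the tool-specific columns to the shared field names.
        record = {std: row[src] for src, std in column_map.items()}
        # Normalize types: p-value as a float, genes as a list of IDs.
        record["p_value"] = float(record["p_value"])
        record["gene_ids"] = [
            g.strip() for g in record["gene_ids"].split(gene_sep) if g.strip()
        ]
        records.append(record)
    return records


# Made-up input in the hypothetical tool's format.
example = "Term\tPValue\tGenes\nGO:0006915\t0.001\tgeneA, geneB\n"
converted = convert_rows(example)
```

Each tool would need its own column map and separator, but everything downstream of `convert_rows` could then be shared.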

I have listed some desiderata for such a standard:

  • An abstract specification with different serializations for different purposes (tabular, JSON, XML, RDF)
  • Extensibility
  • Use of ontology terms in place of free text to describe algorithms, parameters and data processing (for example, the Ontology for Biomedical Investigations (OBI) has a rich collection of these)

Minimal information:

  • Tool name + algorithm + version
  • Input token list + token type (e.g. symbol)
  • Background token list + token type (if provided)
  • Token-gene ID mapping (plus unmatched tokens)
  • Algorithm parameters (cut-offs, algorithm selected, etc)
  • Ontology id + version
  • gene association set id / species + version
  • List of results - for each result:
    • term ID
    • optional term metadata
    • list of gene IDs (+ optional gene metadata)
    • scoring metadata (p-vals, rank, etc)

Optional information:

  • Unique identifier/URI for the results
  • Metadata on input token set (e.g. "genes up-regulated in diabetes")
  • graphical output
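To make the two lists above more concrete, here is one possible sketch of the record as Python dataclasses with a JSON serialization. Every class name, field name and example value is an assumption made for illustration, not a proposed specification. Graphical output is left out for brevity.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class TermResult:
    term_id: str                      # term ID
    gene_ids: List[str]               # genes supporting this result
    scores: Dict[str, float]          # scoring metadata (p-value, rank, ...)
    term_label: Optional[str] = None  # optional term metadata


@dataclass
class EnrichmentReport:
    # Minimal information
    tool_name: str
    tool_version: str
    algorithm: str
    input_tokens: List[str]
    token_type: str
    token_to_gene: Dict[str, str]
    unmatched_tokens: List[str]
    parameters: Dict[str, float]
    ontology_id: str
    ontology_version: str
    association_set_id: str
    association_set_version: str
    results: List[TermResult]
    background_tokens: Optional[List[str]] = None  # only if provided
    # Optional information
    result_uri: Optional[str] = None
    input_description: Optional[str] = None

    def to_json(self) -> str:
        # asdict() recurses into the nested TermResult entries.
        return json.dumps(asdict(self), indent=2)


# All values below are placeholders, not real analysis output.
report = EnrichmentReport(
    tool_name="example-tool",
    tool_version="0.0.0",
    algorithm="hypergeometric",
    input_tokens=["tokenA", "tokenB", "tokenC"],
    token_type="symbol",
    token_to_gene={"tokenA": "GENE:1", "tokenB": "GENE:2"},
    unmatched_tokens=["tokenC"],
    parameters={"p_value_cutoff": 0.05},
    ontology_id="GO",
    ontology_version="YYYY-MM-DD",
    association_set_id="example-associations",
    association_set_version="YYYY-MM-DD",
    results=[
        TermResult(
            term_id="GO:0006915",
            gene_ids=["GENE:1", "GENE:2"],
            scores={"p_value": 0.001, "rank": 1},
        )
    ],
    input_description="genes up-regulated in diabetes",
)
print(report.to_json())
```

The same abstract record could equally be written out as a table, XML or RDF; JSON is used here only because it is the shortest to show.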

Is this of general interest? If so, does the above sound like a good start, and what would be an appropriate forum for future discussions? Is there an existing tool whose output might be a good candidate for standardization?

gene function enrichment format • 4.9k views

Good point and interesting paper. Yes, my list is biased towards simple gene lists. I think we would probably want a fairly generic core and extensions for GSEA, genomic intervals, etc.


Interesting topic, and there is clearly a need for this. Another piece of metadata that would be good to capture is whether the analysis is done at the gene-list or genomic-interval level and, if the latter, whether any corrections for genomic structure are applied, e.g. http://www.ncbi.nlm.nih.gov/pubmed/16504139

12.7 years ago
Qdjm 1.9k

Hi Chris,

Good idea. One important thing that appears to be missing from your minimal information is the subset of GO terms tested. Often people only test for enrichment of GO terms at a given level in the hierarchy or with a minimum number of associations.
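As a minimal sketch of how the tested subset might be recorded, the function below keeps only terms with a minimum number of associated genes and returns the selection criterion together with the resulting term IDs. The function name, the `min_genes` parameter and the output keys are hypothetical, and the example annotations are made up. Filtering by level in the hierarchy would need the ontology graph and is not shown.

```python
def select_tested_terms(term_to_genes, min_genes=5):
    """Pick the terms to test and record how they were chosen.

    term_to_genes maps each term ID to the gene IDs associated with it.
    """
    tested = sorted(
        term
        for term, genes in term_to_genes.items()
        if len(set(genes)) >= min_genes  # count distinct genes only
    )
    return {"criteria": {"min_genes": min_genes}, "tested_term_ids": tested}


# Made-up associations for illustration.
associations = {
    "GO:0000001": ["g1", "g2", "g3"],
    "GO:0000002": ["g1"],
    "GO:0000003": ["g2", "g2", "g4"],
}
subset = select_tested_terms(associations, min_genes=2)
```

Storing the criterion next to the explicit list lets a reader either re-derive the subset or use it directly.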


Good point. As well as a subset of terms, we can also imagine a subset of relationships. We can even imagine a superset of terms, where dynamic grouping classes are created using other ontologies.

I'm not sure what the best solution is. I can imagine for advanced cases we might want to bundle the entire application ontology used. But this would be overkill for the more basic scenario.


The basic scenario requires representing the subset of the terms. This is standard practice. I haven't seen any cases of only using a subset of the relationships and can think of only one case in which terms have been grouped together.

12.7 years ago
Allpowerde ★ 1.3k

I'm all for standards, but it probably goes as it always has: the (accidental) format of the most heavily used program is adopted as the standard. So why not contact the developers of these programs and get their opinion (and cooperation):


What about DAVID?


Not to leave anyone out - DAVID and many others are listed here: http://www.geneontology.org/GO.tools.shtml#term_enrichment

(let us know if your favourite tool isn't there)

You're right about how bioinformatics standards typically evolve - hopefully we can be a little more proactive here.

We should absolutely contact the developers of these tools. I wanted to check first that there wasn't some existing effort. I imagine the next step will be to take this discussion to an (open) Google group or something similar.
