Question

How to answer if the biological process is up or down-regulated?

4

5.2 years ago

aln ▴ 320

When doing enrichment analysis using public pathway databases (e.g. KEGG, Reactome) or ontologies (e.g. Gene Ontology, Human Phenotype Ontology), I often get categories that contain both up- and down-regulated genes according to the gene expression data. I interpret this as a potential indicator that some process is "disrupted" or that my data might resemble a particular phenotype. However, I have noticed that people are more interested in up- or down-regulated processes, and they tend to equate these with enriched categories containing only up-regulated or only down-regulated genes. I think this understanding might not reflect how biological processes work (as they can have both repressed and activated genes at the same time), so I am seeking answers to the following questions:

How do you interpret enriched categories that have both up- and down-regulated genes? Do you perform enrichment analysis on the whole set of differentially expressed genes (I have seen people analyse up- and down-regulated genes separately)?
Do you know of any bioinformatics tools or algorithms that go deeper in functional/ontology enrichment analysis and try to tell whether a particular biological process is indeed (de)activated (by inferring the state of the final products of the pathway, for example)?

Thanks in advance.

enrichment analysis pathways ontologies • 3.6k views

ADD COMMENT • link updated 5.2 years ago by i.sudbery 19k • written 5.2 years ago by aln ▴ 320

score 7 · Answer 1 · 2019-01-29

7

5.2 years ago

i.sudbery 19k

There is no real interpretation here. Some people, particularly those not used to thinking in terms of networks, might be tempted to say that if genes are up regulated, then the pathway is up-regulated, and if the genes are down regulated then the pathway is down regulated. But this is not generally valid. Of course one might infer that if most genes in a pathway are up-regulated then this pathway is more important in condition A than condition B, and vice versa.
SPIA is a package that tries to track the effects of the changes through the network, sorting out the positive regulators from the negative regulators etc. The idea is great, but last time I looked it was let down by the network annotation (which comes from an old version of KEGG). For example, in the network used, the WNT ligand is not annotated as an activator (or repressor) of the receptor.
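
To make the "track the changes through the network" idea concrete, here is a minimal Python sketch. It is not SPIA itself (SPIA is an R/Bioconductor package with its own statistics); it is only loosely modelled on the idea of passing signed perturbations downstream. The gene names, edge signs and log fold changes are all made up for illustration, and the network is assumed to be acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical toy pathway: (upstream, downstream, sign); +1 = activates, -1 = inhibits.
EDGES = [
    ("LIGAND", "RECEPTOR", +1),
    ("RECEPTOR", "KINASE", +1),
    ("INHIBITOR", "KINASE", -1),
    ("KINASE", "TF", +1),
]

# Made-up log2 fold changes (condition A vs. B); genes not listed count as 0.
LOG_FC = {"LIGAND": 1.0, "INHIBITOR": 2.0}


def propagate(edges, log_fc):
    """Return a per-gene perturbation score for an acyclic signed network."""
    parents = {}      # gene -> list of (upstream gene, sign)
    n_children = {}   # gene -> number of downstream genes
    for up, down, sign in edges:
        parents.setdefault(down, []).append((up, sign))
        parents.setdefault(up, [])
        n_children[up] = n_children.get(up, 0) + 1

    # Predecessor map for the topological sort, so upstream genes are scored first.
    order = TopologicalSorter(
        {gene: [up for up, _ in ups] for gene, ups in parents.items()}
    ).static_order()

    score = {}
    for gene in order:
        # Each upstream score is split across that gene's downstream genes
        # and carries the sign of the edge.
        inherited = sum(sign * score[up] / n_children[up] for up, sign in parents[gene])
        score[gene] = log_fc.get(gene, 0.0) + inherited
    return score


scores = propagate(EDGES, LOG_FC)
print(scores["TF"])
```

In this toy case both changed genes are up-regulated, yet the score for TF comes out negative, because the inhibitor is up more than the ligand. Whether that says anything about a real pathway depends entirely on the edge annotation, which is exactly the weak point mentioned above.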

More broadly, one must also take on board that the transcript levels of the proteins in a pathway are not the same as its activity. Most pathways function via post-translational activation of their components. While a component being upregulated means that the maximum activation of the component might be higher, it doesn't mean it is more active at any given time. For my money the best way to measure changes in the activity of a pathway is to measure the enrichment of up and down genes that are targets of the pathway, not members of the pathway. However, this is generally not widely available information, and might need to be specifically generated for your pathway of interest from integration of knockout and DNA-binding data.
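
As a rough sketch of what testing targets rather than members could look like, assuming you already have a target list for the pathway (which, as said, you often don't): a one-sided hypergeometric test run separately on the up- and down-regulated genes. The function names and the gene identifiers in the usage example are hypothetical, and only the Python standard library is used.

```python
from math import comb


def hypergeom_sf(k, universe, n_targets, n_drawn):
    """P(X >= k) when n_drawn genes are drawn from `universe`, n_targets of which are targets."""
    upper = min(n_targets, n_drawn)
    total = comb(universe, n_drawn)
    return sum(
        comb(n_targets, i) * comb(universe - n_targets, n_drawn - i)
        for i in range(k, upper + 1)
    ) / total


def target_enrichment(targets, up, down, universe):
    """Test up- and down-regulated genes separately for over-representation of targets."""
    universe = set(universe)
    targets = set(targets) & universe
    result = {}
    for label, genes in (("up", up), ("down", down)):
        genes = set(genes) & universe
        overlap = len(genes & targets)
        p_value = hypergeom_sf(overlap, len(universe), len(targets), len(genes))
        result[label] = (overlap, p_value)
    return result


# Made-up example: 20 measured genes, 5 of them pathway targets.
universe = [f"g{i}" for i in range(20)]
targets = ["g0", "g1", "g2", "g3", "g4"]
up = ["g0", "g1", "g2", "g3", "g10"]
down = ["g11", "g12", "g13"]
print(target_enrichment(targets, up, down, universe))
```

If the targets are enriched among the up genes but not the down genes (or the other way round), that is at least a directional statement about the output of the pathway. With real data you would correct for multiple testing and think carefully about what the background universe should be.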

ADD COMMENT • link 5.2 years ago by i.sudbery 19k

1

1.

There is no real interpretation here

I like this answer:D

Some people, particularly those not used to thinking in terms of networks, might be tempted to say that if genes are up regulated, then the pathway is up-regulated.

That's exactly my point, and another problem here is a weak understanding of how ontologies are composed. For phenotype-based ontologies I can get a list of genes for each category and a list of the references from which the information was inferred. In most cases I don't have information on up- and down-regulation unless I go through each reference. But even then, let's say I find gene A up-regulated in 50 references and down-regulated in another 50. It could mean the gene is totally irrelevant for the given category and needs to be excluded from the phenotype, or it could make perfect sense if its expression depends on the experimental factors (as it's hard to imagine 100 perfectly identical experiments).

Of course one might infer that if most genes in a pathway are up-regulated then this pathway is more important in condition A than condition B, and vice versa.

I think this is a nice explanation, but sometimes I struggle to explain that it's not antagonistic to cases with "mixed" genes in the enriched category. I wrote this post to hear different opinions, because it was hard for me to put this into words, and I want to have a decent discussion the next time such a topic pops up.

2.

SPIA is a package that tries to track the effects of the changes through the network, sorting out the positive regulators from the negative regulators etc.

Very interesting, thanks for the suggestion, I will look into it. I wonder if it's possible to do the same for non-pathway-based ontologies, e.g. by inferring up- and down-regulation from the literature (basically the thing I wrote in 1 when answering your comments).

For my money the best way to measure changes in the activity of a pathway is to measure the enrichment of up and down genes that are targets of the pathway, not members of the pathway. However, this is generally not widely available information, and might need to be specifically generated for your pathway of interest from integration of knockout and DNA-binding data.

Super nice idea, not easy to implement though, and hard to get data unless produced specifically. But sounds like a nice scientific challenge.

ADD REPLY • link 5.2 years ago by aln ▴ 320

1

I struggle to explain that it's not antagonistic to cases with "mixed" genes in the enriched category.

So I think if you are seeing upregulation of, say, both positive regulators and negative regulators of a pathway, then you might be seeing cells that are more sensitive to both activators and repressors of that pathway (remember, high expression of a positive regulator means nothing unless that positive regulator is post-translationally activated).

If you have both up-regulation of activators and down-regulation of repressors, then I guess you are making the pathway easier to activate; vice versa, with down-regulation of activators and up-regulation of repressors, you are making it harder to activate.

If you are seeing both up- and down-regulation of positive regulators (or of negative regulators) then you could be seeing a change in which signal the pathway will respond to, or a change in the kinetics (how long the effect of a signal will last in the network, for example).

It is unlikely that these sorts of effects can be reasoned about intuitively without modelling.

ADD REPLY • link 5.2 years ago by i.sudbery 19k

1

not easy to implement though, and hard to get data unless produced specifically.

Yep. Like, at least a Masters project if the data exists. A PhD if it doesn't.

Good science is hard, slow and there are no easy wins.

ADD REPLY • link 5.2 years ago by i.sudbery 19k

Jean-Karim Heriche · Answer 2 · 2019-01-27

6

5.2 years ago

Jean-Karim Heriche 27k

1- Enrichment analysis should be framed to get insight into a particular biological question and so whether you're interested in some or all genes depends on the context. Interpretation of the result also depends on this context.
2- Some people try to infer gene regulatory networks. However, as far as knowing if a particular process is affected, nothing beats doing an actual experiment.

ADD COMMENT • link 5.2 years ago by Jean-Karim Heriche 27k

0

Interpretation of the result also depends on this context

That's true. If I'm lucky I get a pathway that is tightly connected to the biological question being studied, so it won't be that difficult to say whether the up and down genes make sense at all. But sometimes I get 5-10 pathways that I did not expect (or had not even heard of), so I was wondering what people do in such cases and how they interpret the results, i.e. whether they just go through each "enriched" gene in the pathway and check the literature for the given context.

Some people try to infer gene regulatory networks

I do that too, but it only works well if one has a substantial number of samples. And I think in most cases it answers a somewhat more general question, e.g. whether the expression of particular genes is connected somehow, which we kind of already know if we study pathways. Or do you mean we can use the defined pathway structure and simulate gene states? I guess it depends heavily on the method: correlation-based or purely information-theoretic methods give "associations" only, whereas here we also need the "sign" and "direction" of each interaction, so probabilistic graphical models might be more suitable.

ADD REPLY • link updated 5.2 years ago by Jean-Karim Heriche 27k • written 5.2 years ago by aln ▴ 320

1

I also meant that interpretation depends on what you're trying to get from the experiment. If you're trying to characterize a particular process then all pathways that show up are potentially of interest. If you're looking at the effect of a particular treatment on a well-characterized process then some pathways can be identified as irrelevant. One of the issues with these kinds of approaches is that they depend on previous knowledge and on the source/representation of this knowledge. Typically, different resources have different notions of what a pathway is or, for a given pathway/process, what the participants are. Also, given that there are always a number of false positives, pathway analysis should be treated as a hypothesis generator to help design more targeted experiments. You can also go further by modelling gene regulatory networks with Boolean networks or Petri nets.
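
For the Boolean network route, a minimal sketch in Python of what such a model looks like. The four nodes and their update rules are invented for illustration, and synchronous updating is only one of several possible schemes, so take it as a starting point rather than a recipe.

```python
# Hypothetical pathway with one repressor; each rule maps the current state
# (dict of gene -> bool) to that gene's next value.
RULES = {
    "signal": lambda s: s["signal"],                        # external input, held fixed
    "repressor": lambda s: s["repressor"],                  # held fixed as well
    "kinase": lambda s: s["signal"] and not s["repressor"],
    "target": lambda s: s["kinase"],
}


def step(state):
    """One synchronous update: every gene is recomputed from the previous state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def run_to_fixed_point(state, max_steps=20):
    """Iterate until the state stops changing; return None if it does not settle."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return None


start = {"signal": True, "repressor": False, "kinase": False, "target": False}
print(run_to_fixed_point(start))                       # target ends up on
print(run_to_fixed_point({**start, "repressor": True}))  # target stays off
```

The point of even a toy model like this is that the state of the target follows from the rules and not from counting how many nodes are on, which is the kind of reasoning that is hard to do by eye on an enrichment table.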

ADD REPLY • link 5.2 years ago by Jean-Karim Heriche 27k