Question

Is annotation of enhancers to nearest gene pointless?

14

6.9 years ago

BioinfGuru ★ 1.7k

Hi Everyone,

I'd like to get a discussion going on the annotation of enhancers - not on the packages or tools, but on (my) weak assumptions of the underlying biological theory.

I always assumed that enhancers regulate the gene nearest them - then I found out that is not ALWAYS the case. But now I'm wondering if that is actually a pretty rare occurrence and in fact enhancers tend to NOT regulate the gene nearest to them at all.

Both the theory of DNA looping to explain enhancer/promoter interactions and the development of chromosome conformation capture (3C) techniques have called into question the annotation of enhancers to their nearest gene. According to this paper only 7% (corrected below by pld) of looping interactions occur between adjacent genes. If 93% of looping interactions span distances greater than that between 2 genes, then is it simply WRONG to annotate enhancers with their nearest gene (implicating it as the target)?

(Tool suggestions are welcome, but I don't want the post hijacked from discussing principles of annotation to which is the best tool)

Thank you all.

enhancer annotation • 5.8k views

ADD COMMENT • link updated 11 months ago by Ram 43k • written 6.9 years ago by BioinfGuru ★ 1.7k

3

I think the paper says that 47% of elements impact the nearest TSS. The 7% is when considering looping interactions only.

Next we explored whether the relative order of elements in the genome affects which long-range interactions occur. It is often assumed that distal elements such as enhancers target the nearest TSS. We find that only ~7% of the looping interactions are between an element and the nearest TSS (Figure 3b). This number goes up to 22% when only active TSSs are included. Similarly, 27% of the distal elements have an interaction with the nearest TSS, and 47% of elements have interactions with the nearest expressed TSS. Thus, when predicting TSS-distal element interactions, choosing the nearest (active) gene is often not correct.

This could be the result of inertia: pipelines were designed like this and they're slow to change. Still, with 47%, if I had a new enhancer, the first place I'd look would be the nearest expressed TSS.
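As a rough illustration of "look at the nearest expressed TSS first", here is a minimal sketch. The gene names, coordinates and the `expressed` flag are invented for the example, and how you decide a gene is expressed (e.g. an RNA-seq cutoff) is an assumption left to the reader.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    chrom: str
    tss: int         # TSS coordinate in bp
    expressed: bool  # assumed to come from your own expression cutoff

def nearest_tss(chrom, midpoint, genes, expressed_only=False):
    """Return the gene whose TSS is closest to the enhancer midpoint, or None."""
    candidates = [
        g for g in genes
        if g.chrom == chrom and (g.expressed or not expressed_only)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.tss - midpoint))

# Hypothetical genes around an enhancer whose midpoint is chr1:25,000
genes = [
    Gene("geneA", "chr1", 10_000, expressed=False),
    Gene("geneB", "chr1", 60_000, expressed=True),
    Gene("geneC", "chr1", 200_000, expressed=True),
]
nearest_any = nearest_tss("chr1", 25_000, genes)
nearest_expressed = nearest_tss("chr1", 25_000, genes, expressed_only=True)
print(nearest_any.name, nearest_expressed.name)
```

In this toy case the two answers differ (the silent geneA is closest, the expressed geneB is next), which is exactly the situation where the choice of rule matters.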

ADD REPLY • link 6.9 years ago by pld 5.1k

0

Thank you for the correction pld :)

ADD REPLY • link 6.9 years ago by BioinfGuru ★ 1.7k

1

I'm not sure about the biology behind this, but is it necessarily the case that an enhancer regulates just one gene?

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

Another weak assumption of mine ... thank you :o

I suppose that would depend on the TF that binds, i.e. whether the TF (or anything it recruits) promotes transcription of a single gene or of multiple genes. Either way, it is a second point against annotating only the nearest gene.

ADD REPLY • link 6.9 years ago by BioinfGuru ★ 1.7k

score 11 · Answer 1 · 2017-05-22

11

6.9 years ago

i.sudbery 19k

Yes, it's wrong to just blindly annotate enhancers to their nearest gene. The proportion of enhancers that regulate their nearest gene varies from study to study. The number I have in my head is that 65% of enhancers do not regulate the closest gene (although I can't find the reference right now). There is some suggestion, though, that the fraction that do regulate the closest gene is higher if you only look at active genes. That said, I think the closest gene is probably the maximum likelihood estimate for the regulated gene. 35% of enhancers regulate the closest gene, leaving 65% to be distributed over the rest of the genome. Distance will clearly play a part, but I would be surprised if the 2nd closest gene was more likely to be regulated than the closest. So while the probability of ANY other gene being regulated by the enhancer is higher than the probability of it being the closest, the probability of it being a PARTICULAR other gene is probably less than that of it being the closest.

i.e. if A, B, C and D are genes then

P(A) < P(B or C or D)

but

P(A) > P(B), P(A) > P(C) and P(A) > P(D)
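To make the two inequalities concrete, here is a toy calculation. Only the 35% for the closest gene comes from the figure quoted above; the split of the remaining 65% over B, C and D is invented, and the sketch assumes exactly one regulated gene so that the probabilities can simply be added.

```python
# Toy probabilities, for illustration only. A is the closest gene.
# The 0.35 is the figure quoted above; the B/C/D split is made up.
p = {"A": 0.35, "B": 0.30, "C": 0.20, "D": 0.15}
assert abs(sum(p.values()) - 1.0) < 1e-9

# P(B or C or D), assuming the enhancer regulates exactly one gene
p_other = sum(prob for gene, prob in p.items() if gene != "A")

# The single most likely target is still the closest gene
best_guess = max(p, key=p.get)

print(f"P(A) = {p['A']:.2f}, P(B or C or D) = {p_other:.2f}")
print("single best guess:", best_guess)
```

So "the target is probably not the closest gene" and "the closest gene is the best single guess" can both hold at once.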

It is definitely true that enhancers can and do contact multiple genes, usually only within a TAD. One situation where you'd definitely not want to assign the closest TSS is if that TSS happened to be in a different TAD.
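A minimal sketch of the TAD caveat, assuming you already have TAD boundaries for one chromosome as a sorted list of start coordinates. All coordinates and gene names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical TAD start coordinates on one chromosome (sorted).
# TAD i spans [tad_starts[i], tad_starts[i + 1]); the last one is open-ended.
tad_starts = [0, 100_000, 250_000, 400_000]

def tad_index(pos):
    """Index of the TAD containing pos (pos must be >= tad_starts[0])."""
    return bisect_right(tad_starts, pos) - 1

def nearest_tss_same_tad(enhancer_pos, tss_by_gene):
    """Closest TSS among genes in the enhancer's own TAD, or None."""
    tad = tad_index(enhancer_pos)
    same_tad = {g: t for g, t in tss_by_gene.items() if tad_index(t) == tad}
    if not same_tad:
        return None
    return min(same_tad, key=lambda g: abs(same_tad[g] - enhancer_pos))

tss_by_gene = {"geneA": 95_000, "geneB": 140_000, "geneC": 230_000}
enhancer_pos = 105_000

# Plain nearest TSS, ignoring TADs, versus nearest TSS within the same TAD
plain_nearest = min(tss_by_gene, key=lambda g: abs(tss_by_gene[g] - enhancer_pos))
tad_aware = nearest_tss_same_tad(enhancer_pos, tss_by_gene)
print(plain_nearest, tad_aware)
```

Here geneA is only 10 kb away but sits across a boundary, so the TAD-aware rule picks geneB instead.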

ADD COMMENT • link 6.9 years ago by i.sudbery 19k

3

To add some more complexity, an enhancer might regulate its nearest gene even though that gene is not currently active. Enhancers can be poised in a given cellular setting, carrying histone marks such as H3K4me1, without being active at the moment of the assay. Therefore, it also depends on which identification method was used to call enhancers. As far as I know there is no gold standard yet, so assigning "active" enhancers to the nearest genes, or to all genes within a certain window within a TAD, is probably the most reasonable thing to do.

ADD REPLY • link 6.9 years ago by ATpoint 81k

3

Gold standard for active enhancers would be: H3K27Ac, H3K4me1, H3K4me2, chromatin accessible, eRNAs, and Pol II. Most studies won't use all these marks, but from what I understand this is considered to be optimal.

ADD REPLY • link 6.9 years ago by Sinji ★ 3.2k

4

Gold standard for enhancers would be to clone into a reporter vector, show expression of reporter in an identical pattern to the wildtype candidate target, show that cloned fragment is the minimal piece that exhibits this pattern and show that this activity is independent of distance from, and orientation with respect to, the promoter. Even then you only learn that the fragment is capable of driving that pattern.

In the end there is no substitute for a knock-out or mutation study.

ADD REPLY • link 6.9 years ago by i.sudbery 19k

0

I think tiling screens such as this paper are quite an interesting way to find functional elements.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

1

Recently we did some correlation analysis between enhancer signal (from H3K27ac ChIP-seq) and gene expression (from RNA-seq), restricted to pairs within the same TAD. We sometimes observed a higher correlation between an enhancer and a gene much farther away than with the closest gene. There are many layers of complexity when it comes to enhancer-promoter interactions (enhancer hijacking, mutations in TAD boundaries, etc.). It's not always the closest one.
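A minimal sketch of that kind of analysis, assuming matched samples with one H3K27ac value per enhancer and one expression value per gene. The numbers are invented, and this is not the actual pipeline used above; Pearson correlation is just one reasonable choice.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented values across 5 matched samples, all within one TAD
enhancer_h3k27ac = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "near_gene": [5.0, 5.2, 4.9, 5.1, 5.0],  # closest gene, flat expression
    "far_gene": [2.1, 3.9, 6.2, 8.1, 9.8],   # distant gene, tracks the enhancer
}

# Rank candidate genes in the TAD by correlation with the enhancer signal
ranked = sorted(
    expression,
    key=lambda g: pearson(enhancer_h3k27ac, expression[g]),
    reverse=True,
)
for gene in ranked:
    print(gene, round(pearson(enhancer_h3k27ac, expression[gene]), 3))
```

With only a handful of samples such correlations are noisy, so treat the ranking as a way to prioritise candidates rather than as evidence of regulation.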

ADD REPLY • link 6.9 years ago by venu 7.1k

0

Great reply, thank you.

So this leads on to asking how to approach enhancer annotation. I have not done a comprehensive search for tools that annotate but my impression is CRE annotation packages usually just annotate the nearest gene.

Ideally, having any great confidence would surely require some form of 3C AND open chromatin sequencing AND RNA-seq expression data AND possibly even proteomic data on the same tissue sample, all taken at the same time points. Clearly this would be ridiculously expensive.

We all know that commonly the bioinformatician is approached for the first time after data collection when it is too late to influence experimental design. So how do you approach enhancer annotation when all you have is open chromatin with no expression data?

You can identify enriched TF motifs (but not expression of the TFs themselves). Then from the enriched TF motifs, you could plot the site of super enhancers. Then using the TFs (predicted from the motifs) as edges, you could plot a regulatory map of each super-enhancer. But how reliable is this going to be without expression data? (Please correct me if I've overlooked an important step)

ADD REPLY • link 6.9 years ago by BioinfGuru ★ 1.7k

4

We all know that commonly the bioinformatician is approached for the first time after data collection when it is too late to influence experimental design.

This reminds me of this great quote (Ronald Fisher):

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

ha! I was looking for that :)

ADD REPLY • link 6.9 years ago by BioinfGuru ★ 1.7k

2

Basically, the answer is that you can't. You can't really assign enhancers to genes without chromosome conformation capture (3C-type) data and expression data. You can probably make some educated guesses with one or the other of those data sets, particularly if you have two conditions, but without either, you are pretty stuck.

Probably your best bet is to assign an enhancer to all genes within a TAD, bearing in mind that it probably won't actually regulate all of those genes. You could do this with Hi-C data from either your cell type or a different cell type (the general feeling is that TADs are quite conserved between cell types in non-disease states). Alternatively, you could combine CTCF ChIP with motif considerations (a lot of research points to CTCF sitting at TAD boundaries, but with certain motif orientation considerations). One thing we've done is to identify enhancer states that change between two conditions and then identify genes within 100 kb (or 1 Mb) that also change.
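The last idea can be sketched roughly as follows, assuming you already have a list of enhancers and a list of genes that change between the two conditions. The identifiers and coordinates are hypothetical, and calling something "changed" is left to whatever differential analysis you trust.

```python
WINDOW = 100_000  # bp; use 1_000_000 for the 1 Mb variant

# Hypothetical results of two separate differential analyses:
# name -> (chromosome, enhancer midpoint or gene TSS)
changed_enhancers = {"enh1": ("chr1", 150_000), "enh2": ("chr2", 500_000)}
changed_genes = {
    "geneA": ("chr1", 210_000),
    "geneB": ("chr1", 400_000),
    "geneC": ("chr2", 480_000),
}

def pair_within_window(enhancers, genes, window=WINDOW):
    """List (enhancer, gene, distance) for changed pairs within the window."""
    pairs = []
    for enh, (enh_chrom, enh_pos) in enhancers.items():
        for gene, (gene_chrom, tss) in genes.items():
            distance = abs(tss - enh_pos)
            if enh_chrom == gene_chrom and distance <= window:
                pairs.append((enh, gene, distance))
    return sorted(pairs)

pairs = pair_within_window(changed_enhancers, changed_genes)
for enh, gene, distance in pairs:
    print(enh, gene, distance)
```

The nested loop is fine for a toy example; for genome-scale lists an interval tool or a sorted sweep would be the more practical choice. An enhancer can end up paired with several genes, which fits the point made earlier that one enhancer may contact more than one gene.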

I'm not sure I follow your suggestion. Surely this way one would only build networks of enhancers, but still know nothing about the genes they might be regulating.

ADD REPLY • link 6.9 years ago by i.sudbery 19k

0

Thanks i.sudbery

"I'm not sure I follow your suggestion." ... The idea was that the TFs (edges) could link an enhancer (node 1) with a gene (node 2) that the TF is known to regulate from other data. But yeah, questionable.

Time to get the books out on TADs. I do like the suggestion of pairing enhancer and gene state changes. I have nucleosome positioning/occupancy (nucleoATAC) so I could look at the NFR state changes at enhancers and see if they can be paired with peak changes in promoters within 100kb.

ADD REPLY • link 6.9 years ago by BioinfGuru ★ 1.7k

score 6 · Answer 2 · 2017-05-22

I'll also give a bit of information. I work with enhancers quite a lot, and looks like these little suckers will be the focal point of my PhD work as well.

It's pretty common knowledge by now that enhancers do not always regulate their nearest gene, and it's also pretty well established that not ALL enhancers activate via enhancer-promoter loops. A more trans-cis model has been adopted, where some enhancers activate in trans while others activate via cis-looping, for any number of poorly understood or unknown reasons.

Enhancers are also known to activate, or at least interact with, multiple genes, and this has been seen in quite a few studies. If you knock out a single enhancer, the expression of two or more genes that are known to interact with that enhancer is reduced. There are actually some really interesting ideas on why this occurs, and my main thought is that enhancers act more like liquid droplets than solid binary structures.

So yes, assigning the enhancer to its nearest gene isn't always correct. The best way to make accurate enhancer-promoter predictions is still highly debated. I don't know of a simple enhancer-promoter prediction tool that does this efficiently, but I know of several labs that are working on this (the Lander lab at the Broad, and McVicker at UCSD). Their approach is to systematically identify enhancers and their gene partners via wet lab techniques (CRISPR-Cas9 in the former, and a similar approach in the latter) and then build predictive models using a variety of sequencing datasets and hope something clicks.

I've built a small predictive model that uses Hi-C, GRO-seq, ChIP-seq of a couple of different TFs and histone modifications, and DNase-seq to predict enhancer-promoter interactions. Unfortunately, I'm not allowed to do much wet work at my current job, so it's hard to verify, but you may need to do something similar to this if you're really striving for semi-accurate annotation.

EDIT: It's definitely not pointless to annotate enhancers to their nearest genes. Pathway analysis is broad enough, and enhancers interact with their nearest gene frequently enough, that you'll still be able to gain some useful information. Anything you do should be verified experimentally.