Question

Has Enhancer And Transcription Factor Binding Site Prediction Already Been Made Redundant?

9

14.2 years ago

Allpowerde ★ 1.3k

ENCODE will soon provide DNase I hypersensitivity data for the whole genome in a multitude of different tissues. DNase I hypersensitivity marks genomic positions that are exposed and can hence be used to pinpoint active promoters or enhancers in the studied tissue. DNase I resistant regions, in contrast, mark genomic areas that are protected, e.g. because a transcription factor (TF) is bound. Since the data provide base-pair resolution, it is possible to "zoom" in on the protected areas (== transcription factor binding sites) within the otherwise exposed regions (== enhancers). One can hence identify the shadow-prints left on the genome by the regulatory TFs in a given tissue. To identify which TFs are casting the shadows, one could use ChIP-seq (rough binding regions) or protein binding arrays (binding motif).

The question is: does the in-silico prediction of enhancers, binding sites or binding partners still have merit, or will we soon be able to look up the binding events of TFs in the different tissues?
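To make the "shadow-print" idea concrete, here is a minimal sketch of how a footprint could be called from per-base DNase I cut counts inside one hypersensitive region. It is a toy illustration only: the function name `find_footprints`, the thresholds `max_cuts` and `min_len`, and the example counts are all assumptions made for this sketch, not anything ENCODE provides or any published footprinting method.

```python
from typing import List, Tuple


def find_footprints(
    cuts: List[int], max_cuts: int = 1, min_len: int = 6
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of protected bases.

    A run is reported when the per-base cut count stays <= max_cuts for at
    least min_len consecutive positions AND the run is flanked by cut
    positions on both sides (runs touching either end are ignored).
    Thresholds are hypothetical and would need tuning on real data.
    """
    footprints = []
    run_start = None
    for i, count in enumerate(cuts):
        if count <= max_cuts:
            if run_start is None:
                run_start = i  # a protected run begins here
        else:
            if run_start is not None:
                # The run ended at i; keep it only if long enough and not
                # starting at the very first base of the region.
                if run_start > 0 and i - run_start >= min_len:
                    footprints.append((run_start, i))
                run_start = None
    # A run still open at the end has no right flank, so it is dropped.
    return footprints


# Made-up cut counts across a short hypersensitive region.
example_cuts = [9, 12, 8, 10, 0, 1, 0, 0, 1, 0, 0, 11, 9, 13, 0, 0, 7]
print(find_footprints(example_cuts))
```

In the made-up profile the seven protected bases at positions 4 to 10 are reported, while the two-base dip near the end is too short to count. Real data would also need normalisation for sequencing depth and for the sequence bias of DNase I, which this sketch ignores.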

transcription-factor-binding-site • 7.8k views

ADD COMMENT • link updated 6 months ago by Ram 43k • written 14.2 years ago by Allpowerde ★ 1.3k

Ram · Answer 1 · 2010-03-07

I don't work on prediction of transcription factor binding or enhancers so I will just give a very general answer that could apply to any sort of prediction.

I think there is a big difference between observing an event (e.g. a transcription factor binding to region X) and knowing why you observe it. To put it another way: if we can solve protein structures, should we still try to predict how a protein might fold? Prediction tries to encapsulate our knowledge of the system, so I think the answer is that we will never stop trying to predict/model a system even if we can easily measure it. Until we can model it, we don't really know how it works. If you are only interested in knowing where a TF might bind, then the observations are enough, but if you want to know why a protein with those characteristics binds to that DNA region, then the observations are just the starting point.

Ram · Answer 2 · 2010-03-06

I wouldn't venture to hypothesize on what will happen; making predictions is difficult, especially about the future.

What seems prudent to assume, however, is that there may be several reasons why genomic regions are accessible or protected. For example: is it a transcription factor that protects the region, or is there some other reason why the TF binds to that location to begin with? Chromatin structure and nucleosome positioning, for instance, may favor or disfavor certain events.

Any in-silico modeling will need to take into account the various mechanisms that may take place.

Ram · Answer 3 · 2010-03-07

I agree with Istvan and pedrobeltrao.

I would just add that we should all be careful when we look at the type of experiments that you describe as well as protein structures and many other biochemical experiments.

They are, most often, snapshots of what is happening in an extremely dynamic environment, which is the cell and its components.

What holds true at one moment (when the cell was fixed, or the proteins extracted or crystallized) is not the whole picture of what's happening or how things look.

I think you'll need more than a few biochemical experiments to know how the cell really works. Until that day, predictions and modeling will always be useful.

Ram · Answer 4 · 2010-03-09

7

14.1 years ago

Phis ★ 1.1k

In addition to what's already been said, I'd like to add that even if these data did completely abolish the need to predict TF binding sites (which I'm not entirely sure about), there are still many cases - and many organisms - where such data aren't available, implying that there's still a niche for computational approaches.

ADD COMMENT • link updated 6 months ago by Ram 43k • written 14.1 years ago by Phis ★ 1.1k

1

I was going to say that, as much as I appreciate and use ENCODE data, the less-than-one-handful of species covered by it and by modENCODE leaves a lot of ground to cover.

ADD REPLY • link 13.3 years ago by Mary 11k

Ram · Answer 5 · 2010-03-09

We still have a long way to go when it comes to enhancer discovery. The fact that a genomic region comes out as DNase I hypersensitive in a certain tissue does not necessarily mean it is an enhancer region in that tissue. Here, I think the DNase I hypersensitivity (and FAIRE) data should be regarded as a necessary input to improved enhancer prediction algorithms, rather than something to replace them. (In fact I think there are very few enhancer prediction algorithms out there, so the help is sorely needed!)

Similarly, I don't think DNase or FAIRE in themselves say much about transcription factor binding, although they can be very informative in combination with knowledge of the TF motif (or so I've heard). ChIP-seq, on the other hand, does give pretty solid information on TF binding, which I agree would more or less supersede computational predictions in the relevant tissue in the given organism. As others have pointed out in this thread, though, there are many organisms and/or tissues for which we won't have ChIP-seq in the foreseeable future, and for those cases (and others) we can hopefully use existing ChIP-seq data to refine computational models of TF binding. So I would regard ChIP-seq data as something that helps us refine our understanding of TF binding, including the prediction of binding events in various systems.
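As a rough illustration of combining accessibility data with knowledge of the TF motif, the sketch below scores a position weight matrix only inside intervals flagged as accessible. Everything in it is assumed for the example: the function names, the toy motif counts, the uniform background of 0.25, the pseudocount and the score cutoff are hypothetical, and it is not the method of any particular published tool.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(
    counts: List[Dict[str, int]], pseudocount: float = 1.0, background: float = 0.25
) -> List[Dict[str, float]]:
    """Turn per-column base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append(
            {
                b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return matrix


def scan_accessible(
    seq: str,
    pwm: List[Dict[str, float]],
    open_regions: List[Tuple[int, int]],
    min_score: float,
) -> List[Tuple[int, float]]:
    """Score motif windows that lie fully inside half-open accessible intervals."""
    width = len(pwm)
    hits = []
    for start, end in open_regions:
        for i in range(max(start, 0), min(end, len(seq)) - width + 1):
            window = seq[i : i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows with N or other ambiguity codes
            score = sum(pwm[j][b] for j, b in enumerate(window))
            if score >= min_score:
                hits.append((i, score))
    return hits


# Toy motif with consensus TGAC (made-up counts) and a made-up sequence.
toy_counts = [{"T": 8, "A": 1, "C": 1}, {"G": 10}, {"A": 9, "G": 1}, {"C": 10}]
toy_pwm = log_odds_matrix(toy_counts)
toy_seq = "AATGACGGTTGACAA"

# Only the first 8 bases are treated as accessible (hypothetical interval).
print([pos for pos, _ in scan_accessible(toy_seq, toy_pwm, [(0, 8)], min_score=4.0)])
```

The sequence contains the consensus twice, but only the copy inside the accessible interval is reported. That is the whole point of the combination: the motif alone over-predicts, and the accessibility data prunes matches in closed chromatin.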

Ram · Answer 6 · 2011-01-13

2

13.3 years ago

Larry_Parnell 16k

Excellent replies above. I'd like to throw some data out there. Mike Snyder (Stanford) of ENCODE has said in his talks that ~20-25% of RNA Pol II sites and ~7% of NF-κB binding sites show variable binding between any two humans. As one who works on genetic variation, I know there is not that much variation between two human genomes at those specific sites. Why the differential binding? We don't know yet, but this makes it all the more important to keep both the lab and in silico arms of TFBS work going.

ADD COMMENT • link 13.3 years ago by Larry_Parnell 16k

1

http://www.ncbi.nlm.nih.gov/pubmed/20299548?dopt=Abstract

ADD REPLY • link updated 6 months ago by Ram 43k • written 13.1 years ago by User 3484 ▴ 30

0

Larry, you mentioned Mike Snyder's talk. Have the results been published? If so, a link to the paper would be appreciated.

ADD REPLY • link 13.3 years ago by Dejian ★ 1.3k

0

I don't believe those details are published, but I also have not looked through all the ENCODE papers that came out recently. I could not link to all those papers...

ADD REPLY • link 13.3 years ago by Larry_Parnell 16k

Ram · Answer 7 · 2011-03-17

2

13.1 years ago

User 3484 ▴ 30

Two recent papers address this issue with DNase data:

ADD COMMENT • link updated 6 months ago by Ram 43k • written 13.1 years ago by User 3484 ▴ 30

score 1 · Answer 8 · 2011-01-13

1

13.3 years ago

Dejian ★ 1.3k

Prediction will always be useful and necessary, I think. Learning about TFBSs and other biological facts will ultimately boost synthetic biology and bioengineering. Predicting something first and validating it later helps improve our understanding of biology.

ADD COMMENT • link 13.3 years ago by Dejian ★ 1.3k