Question

TFBS from UCSC vs. other databases

7.8 years ago

hannafrida.klein ▴ 40

Dear Biostars Community,

I have just started working with TFs and already have a first question for which I cannot find a satisfying answer. I have some genes for which I would like to get the TFBSs. My first idea was to go to the UCSC website and use the Table Browser with Mammal, Human, hg19 and Regulation (for group), expecting that to give me what I want. But then I found more options here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=502642519_lKex7xjVm0YB4DD1qYQqwapAKDYJ&c=chr1&g=wgEncodeTfBindingSuper

Each of them contains multiple tables. As far as I understand, it would be enough to take the UNIFORM TFBS track (which contains all the others, HAIB for example). Right?

But what is the table TFBS cons sites? Are these all the conserved TFBSs? Are these binding sites also included in the UNIFORM track? I remember it stating somewhere that not all binding sites listed in this table are biologically functional. How, then, do I know which TFBSs are predicted and which were experimentally confirmed?

Then I found that there are also some interesting databases like JASPAR, TRANSFAC, HOCOMOCO, hPDI and UniPROBE. Do these databases contain more information than UCSC?

I am really confused now... and don't know how to start.

I am thankful for all your help, papers to read, suggestions etc. In my defense, I am not a biologist, but I really want to understand this.

Sorry if the questions are basic for some of you.

Kindly, Frida

TFBS UCSC TRANSFAC • 2.5k views

And this confused me even more: if I look at one of the HAIB TFBS tables, for example, then I have records like this:

#bin    chrom   chromStart  chromEnd    name    score   strand  signalValue pValue  qValue
590 chr1    713818  714398  peak1   169 .   547.13  -1  -1

To my knowledge, TFBSs are only around 10 nt long. Why, then, is chromEnd - chromStart so much larger?

Thank you again!

ADD REPLY • link 7.8 years ago by hannafrida.klein ▴ 40

score 2 · Answer 1 · 2016-07-12

7.8 years ago

Emily 23k

The sites you can see are from ENCODE. These are experimentally confirmed sites from ChIP-seq experiments. Each was identified in a particular cell type from a single individual, so they will differ between cell types and may also differ between individuals.

I'm not sure about all the databases you mention, but I know about JASPAR. JASPAR has motifs, not binding sites. These are short sequences of DNA that are known to attract transcription factor binding. You can use the motifs to predict where in the genome transcription factors might bind, however there is not necessarily experimental evidence that the transcription factor ever does.
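To make "use the motifs to predict where transcription factors might bind" concrete, here is a minimal sketch of scanning a sequence with a position weight matrix. The count matrix, the test sequence, and the threshold are all made up for illustration (this is not a real JASPAR entry), and only the forward strand is scanned. A hit from this kind of scan is a prediction, not evidence of binding.

```python
import math

# Toy count matrix for a hypothetical 4-bp motif (NOT a real JASPAR profile).
# One dict per motif position; values are how often each base was observed.
TOY_COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn raw counts into log2 odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, end, score) for every forward-strand window scoring >= threshold.

    Coordinates are 0-based, half-open, like UCSC table coordinates.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, start + width, score))
    return hits


pwm = log_odds_matrix(TOY_COUNTS)
hits = scan("TTACGTGGACGTAA", pwm, threshold=4.0)  # threshold chosen arbitrarily
for start, end, score in hits:
    print(start, end, round(score, 2))
```

With a real motif you would also scan the reverse complement and pick the threshold more carefully; both are left out here to keep the sketch short.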

Because of the random cutting step in ChIP-seq (fragments of around 200 bp), the reported binding sites will be much larger than motifs, even though the region actually occupied by the transcription factor may not be much bigger than the motif.
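As a quick check of the size difference, the record posted in the question can be parsed and its width computed directly. This is only a sketch: the column names are copied from the header line in the question, and the 10 nt motif length is the rough figure mentioned there, not a measured value.

```python
# Header and record copied from the HAIB table excerpt in the question.
HEADER = "bin chrom chromStart chromEnd name score strand signalValue pValue qValue".split()
RECORD = "590 chr1 713818 714398 peak1 169 . 547.13 -1 -1".split()

# Rough motif length quoted in the question (an approximation, not a fixed rule).
APPROX_MOTIF_LENGTH = 10


def peak_width(record, header=HEADER):
    """Width of a peak in bp; UCSC coordinates are 0-based, half-open."""
    row = dict(zip(header, record))
    return int(row["chromEnd"]) - int(row["chromStart"])


width = peak_width(RECORD)
print(f"peak width: {width} bp")
print(f"roughly {width // APPROX_MOTIF_LENGTH} motif lengths fit inside this peak")
```

So the peak marks a region in which binding was detected, and the actual site is somewhere inside it.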

Thank you Emily!

"These are short sequences of DNA that are known to attract transcription factor binding. You can use the motifs to predict where in the genome transcription factors might bind, however there is not necessarily experimental evidence that the transcription factor ever does."

Of course, just searching for a motif in the whole genome does not at all mean that the TF also binds there. But if these are experimentally confirmed sites, then with a list of known motifs I could just map the two lists against each other and find where "exactly" a TF binds. Am I right?

I also know that there is a many-to-many relation between TFs and TFBSs: many sites can interact with several factors, and all known factors bind to more than one site. Is there a way to get this information?

ADD REPLY • link 7.8 years ago by hannafrida.klein ▴ 40

Finding experimentally confirmed sites would involve cross-referencing motifs with all publicly available ChIP-seq data. I'm not sure whether that data is out there in combined form. Someone else might know.
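For a single data set, the cross-referencing itself is just an interval comparison: keep the motif hits that fall inside a ChIP-seq peak for the same factor. A minimal sketch follows; the factor names and motif-hit coordinates are hypothetical, and the one peak reuses the coordinates from the record in the question. The nested loop is fine for a toy example, but for genome-scale data a dedicated interval tool such as bedtools intersect would be the more usual choice.

```python
def motif_hits_in_peaks(motif_hits, peaks):
    """Return motif hits that lie entirely inside a peak for the same factor.

    Both inputs are lists of (chrom, start, end, factor) tuples with
    0-based, half-open coordinates. Simple O(n * m) comparison.
    """
    supported = []
    for chrom, start, end, factor in motif_hits:
        for p_chrom, p_start, p_end, p_factor in peaks:
            same_place = chrom == p_chrom and p_start <= start and end <= p_end
            if same_place and factor == p_factor:
                supported.append((chrom, start, end, factor))
                break  # one supporting peak is enough
    return supported


# Peak coordinates from the question's record; the factor label is hypothetical.
peaks = [("chr1", 713818, 714398, "TF_A")]

# Hypothetical motif hits, e.g. from a genome-wide motif scan.
motif_hits = [
    ("chr1", 714000, 714010, "TF_A"),  # inside the TF_A peak
    ("chr1", 720000, 720010, "TF_A"),  # outside any peak
    ("chr1", 714100, 714110, "TF_B"),  # inside a peak, but for a different factor
]

supported = motif_hits_in_peaks(motif_hits, peaks)
print(supported)
```

Note that a supported hit is only confirmed for the cell type and individual the peak came from.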

It seems unlikely that any transcription factor has only one binding site in the whole genome. Most transcription occurs in response to combinations of transcription factors, so very specific regulation will involve a promoter with many different motifs nearby; it is not likely to be one transcription factor that binds only to that promoter and nowhere else. That would seem evolutionarily inefficient: why have a transcription factor as a middleman when whatever activated the transcription of the transcription factor could directly activate the downstream gene? Happy for someone to give an example that proves me wrong, though.

ADD REPLY • link 7.8 years ago by Emily 23k