How can I compute whether the co-occurrence of two TFBSs within a certain width in all promoters is higher than expected by chance?
16 months ago
JJ • 430

Hi,

I have computed the number of co-occurrences of two TFBSs in all promoters in the human genome.

Previously, we discussed how to calculate whether the co-occurrence of two TFBSs is higher than one would expect by chance, which can be done with a hypergeometric distribution using the principle of overlapping lists.

But now I am wondering how I can compute whether the co-occurrence of two TFBSs within a certain width in all promoters (or even the whole genome tiled in bins) is higher than expected by chance, say within 100 bp of one another in all promoters (or in the whole genome). These are then a subset of the co-occurring ones.

I would reckon that this is more informative than evaluating co-occurrence in general, as TFBSs close to each other might indicate that the TFs binding to them are more likely to act synergistically. Any ideas how to handle this statistically?

My first thought was randomisation: download all TF matrices and compute the co-occurrence, and the co-occurrence within a certain width, for a number of random combinations of two TFs. Could I be on the right track here? Could I then do multiple Fisher's exact tests on a table like this?

                       not within length l    within length l
my TFs                 #                      #
random combinations    #                      #

(# = number of promoters in which the two TFBSs co-occur)

and then pool the p-values somehow? Or is there an easier solution? I am grateful for any input!

Thanks,


So these are then a subset of the co-occurring ones.

If you have a set and a subset, you can think about the set as "background" and the subset as what you are interested in or are observing, and you could apply the hypergeometric or Fisher's Exact test to those two sets.
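A minimal sketch of that test, assuming the 2x2 layout from the question (rows: your TF pair vs. random combinations; columns: co-occurring within length l vs. not within l). It computes the one-sided Fisher's exact p-value as a hypergeometric upper tail using only the Python standard library. The function names are hypothetical, and any counts you pass in should come from your own data.

```python
from math import comb

def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish.
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)

def fisher_greater(mine_within: int, mine_not_within: int,
                   rand_within: int, rand_not_within: int) -> float:
    """One-sided Fisher's exact test: is 'within length l' enriched for my TF pair?

    2x2 table (counts of promoters where the two TFBSs co-occur):
                          within l        not within l
        my TF pair        mine_within     mine_not_within
        random pairs      rand_within     rand_not_within
    """
    N = mine_within + mine_not_within + rand_within + rand_not_within
    K = mine_within + rand_within          # all "within l" promoters
    n = mine_within + mine_not_within      # all co-occurring promoters for my TF pair
    return hypergeom_upper_tail(mine_within, N, K, n)
```

If SciPy is available, `scipy.stats.fisher_exact([[mine_within, mine_not_within], [rand_within, rand_not_within]], alternative="greater")` should return the same one-sided p-value; the hand-rolled version just makes the hypergeometric connection explicit.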


Thanks for your input. So how do I handle the multiple sets of random TF combinations? I have the set I am interested in and n random combinations of TFs, and for each combination I have a set and a subset. Do I average the set and subset counts beforehand and then do one Fisher's exact test, or do I do multiple tests and then average the p-values?
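One option (an assumption on my part, not something stated in this thread) is to skip pooling p-values altogether and use the random TF pairs directly as an empirical null. For each pair, compute the fraction of co-occurring promoters in which the sites fall within length l, then ask how often a random pair does at least as well as yours. The sketch uses the common (count + 1) / (n + 1) permutation-style estimate; the function names and input format are hypothetical.

```python
def within_fraction(within: int, not_within: int) -> float:
    """Fraction of co-occurring promoters in which the two sites lie within length l."""
    total = within + not_within
    return within / total if total else 0.0

def empirical_p_value(observed, random_pairs):
    """observed: (within, not_within) for your TF pair.
    random_pairs: list of (within, not_within), one tuple per random TF combination.
    Returns (1 + #random pairs with fraction >= observed) / (1 + #random pairs).
    """
    obs = within_fraction(*observed)
    null = [within_fraction(w, nw) for w, nw in random_pairs]
    exceed = sum(1 for f in null if f >= obs)
    return (exceed + 1) / (len(null) + 1)
```

Working with fractions rather than raw counts keeps pairs with different numbers of co-occurring promoters comparable. Note that the smallest attainable p-value is 1 / (n + 1), so the resolution depends on how many random combinations you evaluate.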

13 months ago
WCIP | Glasgow | UK

I'm not sure I fully understand your question but maybe GAT and/or bedtools reldist could do the job...
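To illustrate the idea behind `bedtools reldist`: for each feature in A, it looks at the nearest B features on either side and reports the distance to the closer one as a fraction of the gap between them. Values near 0 suggest A sits next to B, and with no association the values should be roughly uniform between 0 and 0.5. The sketch below is a simplified, hypothetical re-implementation using feature midpoints on a single chromosome. It is not the tool's actual code.

```python
from bisect import bisect_right

def relative_distances(a_mids, b_mids):
    """Relative distance of each A midpoint to its flanking B midpoints (one chromosome).

    For an A feature at m with nearest B features left <= m < right, the value is
    min(m - left, right - m) / (right - left), which lies in [0, 0.5].
    A features without a B feature on both sides are skipped.
    """
    b = sorted(b_mids)
    out = []
    for m in a_mids:
        i = bisect_right(b, m)  # guarantees b[i-1] <= m < b[i] when both exist
        if i == 0 or i == len(b):
            continue  # no flanking B feature on one side
        left, right = b[i - 1], b[i]
        out.append(min(m - left, right - m) / (right - left))
    return out
```

A histogram of the returned values that piles up near 0, compared with a flat distribution over [0, 0.5], would hint at spatial association between the two sets of sites.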


Thank you for the links to those tools! They provide a different way of approaching the problem. However, if I have only evaluated the promoter regions for TFBSs, wouldn't they automatically produce a positive result, since all hits are at most as far apart as the length of the promoter region I evaluated? Would they only work on a genome-wide scale?
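One way to address that concern (a sketch under my own assumptions, not a feature of GAT or bedtools as described in this thread) is to build the null by shuffling site positions only within the promoters where they were found, keeping the number of sites per promoter fixed. The observed count of promoters with a pair within width l is then compared against counts from shuffled promoters, so the promoter restriction applies to both. The input format (promoter id mapped to promoter length plus 0-based site positions for each TF), the single-point site representation and the uniform placement are hypothetical simplifications that ignore motif width and sequence composition.

```python
import random

def has_pair_within(sites1, sites2, width):
    """True if any TF1 site lies within `width` bp of any TF2 site."""
    return any(abs(p - q) <= width for p in sites1 for q in sites2)

def count_within(promoters, width):
    """Number of promoters containing at least one TF1/TF2 pair within `width` bp."""
    return sum(1 for length, s1, s2 in promoters.values()
               if has_pair_within(s1, s2, width))

def shuffle_test(promoters, width, n_perm=1000, seed=0):
    """promoters: {promoter_id: (length, [TF1 positions], [TF2 positions])}.
    Returns (observed count, permutation p-value)."""
    rng = random.Random(seed)
    observed = count_within(promoters, width)
    exceed = 0
    for _ in range(n_perm):
        # Redraw every site uniformly inside its own promoter,
        # keeping the number of sites of each TF per promoter unchanged.
        shuffled = {
            pid: (length,
                  [rng.randrange(length) for _ in s1],
                  [rng.randrange(length) for _ in s2])
            for pid, (length, s1, s2) in promoters.items()
        }
        if count_within(shuffled, width) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because sites never leave their promoter, a width close to the promoter length yields a p-value near 1. That is the "automatic positive result" being corrected for rather than rewarded.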

