I have computed the number of co-occurrences of two TFBSs in all promoters in the human genome.
Previously, we discussed how to test whether the co-occurrence of two TFBSs is higher than one would expect by chance, which can be done with a hypergeometric distribution using the principle of overlapping lists.
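For reference, that overlap test can be sketched roughly as below. This is only a minimal illustration using the Python standard library; the function name `hypergeom_sf` and all counts are made up for the example and are not real data.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) for X ~ Hypergeometric.

    N: total number of promoters
    K: promoters containing TFBS A
    n: promoters containing TFBS B
    k: promoters containing both
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts, for illustration only.
n_promoters = 20000
with_a = 1500
with_b = 900
with_both = 120

p_overlap = hypergeom_sf(with_both, n_promoters, with_a, with_b)
print(f"P(overlap >= {with_both}) = {p_overlap:.3g}")
```

With these invented numbers the expected overlap is 1500 * 900 / 20000 = 67.5 promoters, so an observed overlap of 120 gives a very small p-value.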
But now I am wondering how I can test whether the co-occurrence of two TFBSs within a certain distance of each other, across all promoters (or even the whole genome tiled into bins), is higher than expected by chance. Say, within 100 bp of one another in all promoters (or the whole genome). These would then be a subset of the co-occurring ones.
I reckon this would be more informative than just evaluating co-occurrence in general, as TFBSs close to each other might indicate that the TFs binding them are more likely to act synergistically. Any ideas how to handle this statistically?
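To pin down what I mean by "within a certain width", here is a minimal sketch of the counting step. It assumes each TF's hits are stored as a dict mapping a promoter ID to a non-empty list of site start positions; the data layout, the function names and the example coordinates are all hypothetical.

```python
def closest_distance(pos_a, pos_b):
    """Smallest distance (in bp) between any site of A and any site of B."""
    return min(abs(a - b) for a in pos_a for b in pos_b)

def count_cooccurrence(sites_a, sites_b, max_dist=100):
    """Return (promoters with both TFBSs, promoters with both within max_dist)."""
    shared = sites_a.keys() & sites_b.keys()
    within = sum(
        1 for promoter in shared
        if closest_distance(sites_a[promoter], sites_b[promoter]) <= max_dist
    )
    return len(shared), within

# Made-up example: promoter ID -> site start positions (bp).
sites_a = {"P1": [120, 800], "P2": [50], "P3": [400]}
sites_b = {"P1": [850], "P2": [700], "P4": [10]}

n_both, n_close = count_cooccurrence(sites_a, sites_b, max_dist=100)
print(n_both, n_close)
```

Here P1 and P2 contain both sites, but only in P1 are they within 100 bp of each other (800 vs 850), so the "within" count is a subset of the co-occurrence count.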
My first thought was randomisation: downloading all TF matrices and computing the co-occurrence, and the co-occurrence within a certain distance, for a number of random combinations of two TFs. Could I be on the right track here? Could I then do multiple Fisher's exact tests, something like this?
| | # co-occurring in promoters, not within length l | # co-occurring in promoters, within length l |
|---|---|---|
| my TFs | | |
| random combination | | |
And then pool the p-values somehow? Or is there an easier solution? I am grateful for any input!
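To make the idea concrete, here is a rough sketch of what I have in mind: one one-sided Fisher's exact test per random pair, then pooling with Fisher's method. It uses only the standard library, all counts are invented, and the function names are my own. One caveat I am aware of: Fisher's method assumes independent p-values, which does not hold here because every table reuses the counts for my pair, so the sketch also computes a simple empirical p-value from the random pairs as a possible alternative.

```python
from math import comb, exp, log

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on [[a, b], [c, d]]: P(X >= a), margins fixed.

    a, b: my pair, within / not within length l
    c, d: random pair, within / not within length l
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, i) * comb(total - row1, col1 - i)
               for i in range(a, upper + 1))
    return tail / comb(total, col1)

def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p) ~ chi-square with 2k degrees of freedom.

    Uses the closed-form chi-square tail for even degrees of freedom.
    Assumes independent p-values, all strictly greater than 0.
    """
    k = len(pvalues)
    half = -sum(log(p) for p in pvalues)  # half of the test statistic
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return exp(-half) * total

# Invented counts: (within length l, not within length l).
my_pair = (40, 80)
random_pairs = [(15, 105), (22, 98), (10, 110)]

pvals = [fisher_exact_greater(my_pair[0], my_pair[1], w, nw)
         for w, nw in random_pairs]
pooled = fisher_combined(pvals)

# Alternative: how often does a random pair have a "within" fraction
# at least as large as my pair's? (+1 correction avoids a p-value of 0.)
my_frac = my_pair[0] / sum(my_pair)
n_extreme = sum(1 for w, nw in random_pairs if w / (w + nw) >= my_frac)
empirical_p = (1 + n_extreme) / (1 + len(random_pairs))

print(pvals, pooled, empirical_p)
```

With only three random pairs the empirical p-value cannot go below 0.25, so in practice this would need many more random combinations.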