You could use a tool like BEDOPS bedmap with its --mean, --median, --max, --min, --stdev and other statistical operators to calculate a statistic over categories of regions.
Depending on the characteristics of your signal, you might ask how likely it is to get, for instance, a median score above or below a certain value when sampling random regions from the genome (your expected median signal), and then compare that against what you find over your regions of interest (your observed median signal).
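As a rough sketch of that "how likely" question: given a file with one score per randomly sampled region, you could count how often the background meets or exceeds a chosen value. The function name empirical_p and the file name background_scores.txt are made up for illustration, and the add-one correction is just one common convention. Non-numeric lines are skipped on the assumption that bedmap may report something like NAN for regions with no overlapping signal.

```shell
# Hypothetical helper: one-sided empirical p-value.
# Usage: empirical_p VALUE SCORES_FILE   (one score per line)
empirical_p() {
  awk -v v="$1" '
    # Skip anything that is not a plain number (e.g. NAN for empty regions)
    $1 !~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ { next }
    { n++; if ($1 + 0 >= v + 0) hits++ }
    END {
      if (n == 0) { print "no numeric scores" > "/dev/stderr"; exit 1 }
      # Add-one correction so the estimate is never exactly zero
      printf "%.4f\n", (hits + 1) / (n + 1)
    }' "$2"
}

# Example (file name is an assumption):
#   empirical_p 0.75 background_scores.txt
```

Swap the comparison to <= if you care about unusually low scores instead.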
Essentially, you would write a "sampler" program that generates a UCSC-formatted BED file containing randomly sampled regions from a sensibly chosen background (you might avoid sampling from repeat regions, for instance). Let's say this file is called background_regions.bed.
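A minimal sketch of such a sampler, in case it helps. It assumes a tab-delimited "chromosome sizes" file (name and length per line), draws fixed-length regions uniformly over all valid start positions, and does nothing about repeats or gaps. The function name sample_regions, its arguments and the file names are made up for illustration.

```shell
# Hypothetical sampler: N random regions of length LEN, written as BED
# (0-based, half-open). Usage: sample_regions SIZES_FILE N LEN [SEED]
sample_regions() {
  local sizes_file=$1 n=$2 len=$3 seed=${4:-42}
  awk -v n="$n" -v len="$len" -v seed="$seed" '
    BEGIN { OFS = "\t"; srand(seed) }
    # Keep chromosomes long enough to hold one region; cum[k] is the
    # running total of valid start positions up to chromosome k
    $2 ~ /^[0-9]+$/ && $2 + 0 >= len + 0 {
      chrom[++k] = $1
      total += $2 - len + 1
      cum[k] = total
    }
    END {
      if (k == 0) { print "no chromosome is long enough" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= n; i++) {
        r = int(rand() * total)           # 0 .. total-1
        j = 1
        while (r >= cum[j]) j++           # find the chromosome holding r
        start = r - (j > 1 ? cum[j - 1] : 0)
        print chrom[j], start, start + len
      }
    }' "$sizes_file"
}

# Example (file names are assumptions):
#   sample_regions genome.sizes 1000 200 > unsorted_background_regions.bed
```

To keep samples out of repeats, one option is to filter the output afterwards, for instance with something like bedops --not-element-of 1 against a sorted BED file of repeat annotations.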
You also put your regions of interest into a second UCSC-formatted BED file, say regions_of_interest.bed.
Make sure these two files are sorted (here the unsorted inputs carry an unsorted_ prefix):
$ sort-bed unsorted_background_regions.bed > background_regions.bed
$ sort-bed unsorted_regions_of_interest.bed > regions_of_interest.bed
You can then use bedmap to calculate statistics over each of these two reference BED files. Use BEDOPS wig2bed and BEDOPS sort-bed to convert the Wiggle-formatted signal file into sorted BED data, piping this stream into the bedmap statement as the map data.
For example, to get expected and observed medians, pipe sorted signal into bedmap and use the --median operator over the two BED files:
$ wig2bed mySignal.wig \
| sort-bed - \
| bedmap --median background_regions.bed - \
> expectedMedians.txt
$ wig2bed mySignal.wig \
| sort-bed - \
| bedmap --median regions_of_interest.bed - \
> observedMedians.txt
(You could use other operators alone or in combination to calculate a statistic or score of your choice.)
From these expected and observed results you should be able to calculate a z-score and a p-value for that class of regions of interest. If the p-value falls below a threshold you chose in advance (0.05 is a common convention), then you could argue that the signal over your regions of interest differs significantly from background.
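One way that last step might look, as a sketch rather than the definitive test: treat the background medians as the reference population and run a simple z-test on the mean of the observed medians. This assumes the per-region medians are roughly normal and independent, which you should check for your own data. The function name enrichment_stats is made up, awk has no built-in erfc so the two-sided p-value uses the Abramowitz and Stegun approximation (formula 7.1.26), and non-numeric lines such as NAN are skipped.

```shell
# Hypothetical helper: z-score and two-sided p-value for the mean of the
# observed values against the background (expected) distribution.
# Usage: enrichment_stats EXPECTED_FILE OBSERVED_FILE   (one value per line)
enrichment_stats() {
  awk '
    # Approximate erfc(x) for x >= 0 (Abramowitz & Stegun 7.1.26)
    function erfc_approx(x,   t, poly) {
      t = 1 / (1 + 0.3275911 * x)
      poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 \
           + t * (-1.453152027 + t * 1.061405429))))
      return poly * exp(-x * x)
    }
    # Skip anything that is not a plain number (e.g. NAN for empty regions)
    $1 !~ /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ { next }
    which == "exp" { n_exp++; sum_exp += $1; sq_exp += $1 * $1 }
    which == "obs" { n_obs++; sum_obs += $1 }
    END {
      if (n_exp < 2 || n_obs < 1) { print "not enough numeric values" > "/dev/stderr"; exit 1 }
      mean_exp = sum_exp / n_exp
      var_exp  = (sq_exp - n_exp * mean_exp * mean_exp) / (n_exp - 1)   # sample variance
      if (var_exp <= 0) { print "background has no variance" > "/dev/stderr"; exit 1 }
      mean_obs = sum_obs / n_obs
      z = (mean_obs - mean_exp) / sqrt(var_exp / n_obs)
      x = (z < 0 ? -z : z) / sqrt(2)
      printf "z=%.4f\tp=%.6g\n", z, erfc_approx(x)   # two-sided p = erfc(|z|/sqrt(2))
    }' which=exp "$1" which=obs "$2"
}

# Example, using the files produced above:
#   enrichment_stats expectedMedians.txt observedMedians.txt
```

If the normality assumption looks shaky, a rank-based or resampling test in R or Python is probably a better fit than this z-test.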
I just love how transparent and clear the BEDOPS suite is :)