Get The Percentile From A Wig/Bed File
11.1 years ago
Eric Ho ▴ 10

Hello,

I have got a set of signal values from different regions in a BED/WIG file.

However, I cannot easily tell whether a given signal is significant relative to the rest of the file.
One solution would be to calculate the percentile of a given signal value.

Are there any tools that can calculate the percentile of a given signal value in the file?
If not, how else can I assess the significance of a signal across the file?

Thanks!

statistics wiggle bed • 3.3k views
11.1 years ago

You could use a tool like BEDOPS bedmap with its --mean, --median, --max, --min, --stdev and other statistical operators to calculate a statistic over categories of regions.

Depending on the characteristics of your signal, you might ask how likely it is to get, for instance, a median score above or below a certain value when sampling random regions over the genome. That gives you an expected median signal, which you can then compare against the observed median signal over your regions of interest.

Essentially you would write a "sampler" program that generates a UCSC-formatted BED file that contains randomly-sampled regions from your sensibly-chosen background (you might avoid sampling from repeat regions, for instance). Let's say this file is called background_regions.bed.
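A minimal sketch of such a sampler, written with awk. It assumes a two-column chromosome sizes file (name, then length in bases) and fixed-width regions. The function name sample_regions, its arguments and the default seed are hypothetical, not part of BEDOPS. It samples uniformly and does not exclude repeats or gaps, so you would still need to filter the output against whatever you consider a sensible background.

```shell
# sample_regions: hypothetical helper, not part of BEDOPS.
# Usage: sample_regions <chrom_sizes.txt> <n_regions> <region_width> [seed]
# chrom_sizes.txt is assumed to hold "chrom<TAB>length" on each line.
sample_regions() {
    awk -v n="$2" -v w="$3" -v seed="${4:-42}" '
        BEGIN { srand(seed); OFS = "\t" }
        # Keep only chromosomes long enough to hold one region.
        $2 >= w { name[++k] = $1; len[k] = $2; total += $2; cum[k] = total }
        END {
            if (k == 0) exit 1
            for (i = 1; i <= n; i++) {
                # Pick a chromosome with probability proportional to its length.
                r = rand() * total
                for (j = 1; j < k && r >= cum[j]; j++) ;
                # Pick a 0-based start so the region stays inside the chromosome.
                start = int(rand() * (len[j] - w + 1))
                print name[j], start, start + w
            }
        }
    ' "$1"
}

# Example call (the sizes file name is an assumption):
#   sample_regions chrom_sizes.txt 1000 200 > unsorted_background_regions.bed
```

The output is unsorted three-column BED, so it still needs to go through sort-bed before being used with bedmap.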

You also put your regions-of-interest into a second UCSC-formatted BED file, say regions_of_interest.bed.

Make sure these two files are sorted:

$ sort-bed unsorted_background_regions.bed > background_regions.bed

$ sort-bed unsorted_regions_of_interest.bed > regions_of_interest.bed

You can then use bedmap to calculate statistics over each of these two reference BED files. Use BEDOPS wig2bed and BEDOPS sort-bed to convert the Wiggle-formatted signal file into sorted BED data, piping this stream into the bedmap statement as the map data.

For example, to get expected and observed medians, pipe sorted signal into bedmap and use the --median operator over the two BED files:

$ wig2bed mySignal.wig \
    | sort-bed - \
    | bedmap --median background_regions.bed - \
    > expectedMedians.txt

$ wig2bed mySignal.wig \
    | sort-bed - \
    | bedmap --median regions_of_interest.bed - \
    > observedMedians.txt

(You could use other operators alone or in combination to calculate a statistic or score of your choice.)
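If what you want is literally the percentile of one signal value within the whole file, that can be computed straight from the converted signal. The sketch below assumes the score sits in the fifth column of the wig2bed output (check this against your own data). The helper name percentile_rank is hypothetical, and it reports the percentage of scores less than or equal to the value, which is only one of several percentile conventions.

```shell
# percentile_rank: hypothetical helper, not part of BEDOPS.
# Usage: percentile_rank <value> <score_column> < signal.bed
# Prints the percentage of scores that are less than or equal to <value>.
percentile_rank() {
    awk -v v="$1" -v col="$2" '
        # Count every line that has the score column, and those at or below v.
        $col != "" { n++; if ($col + 0 <= v + 0) le++ }
        END {
            if (n == 0) { print "NA"; exit }
            printf "%.2f\n", 100 * le / n
        }
    '
}

# Example call (12.5 is an arbitrary value; column 5 is an assumption):
#   wig2bed mySignal.wig | percentile_rank 12.5 5
```

Note that this weights every signal element equally, regardless of how many bases it spans.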

From these expected and observed results you should be able to calculate a z-score and a p-value for that class of regions-of-interest. If the p-value falls below the threshold you have chosen, you could argue that the signal over your regions-of-interest differs significantly from the background.
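One possible reading of that last step, as a rough sketch rather than the definitive test: treat the background medians as the null distribution and run a simple z-test on the mean of the observed medians. The helper name z_test_vs_background is hypothetical. It assumes one score per line in each file, skips non-numeric lines such as NAN, and gets the two-sided p-value from a standard polynomial approximation to the normal tail (Abramowitz and Stegun 26.2.17). Whether a normal approximation is appropriate depends on your signal.

```shell
# z_test_vs_background: hypothetical helper, not part of BEDOPS.
# Usage: z_test_vs_background expectedMedians.txt observedMedians.txt
# Prints "z<TAB>two_sided_p", or "NA<TAB>NA" if there is too little data.
z_test_vs_background() {
    awk '
        # Skip empty lines and non-numeric entries such as NAN.
        $1 !~ /^[-+]?[0-9.]+([eE][-+]?[0-9]+)?$/ { next }
        FNR == NR { nb++; sb += $1; sb2 += $1 * $1; next }  # first file: background
        { no++; so += $1 }                                  # second file: observed
        END {
            if (nb < 2 || no < 1) { print "NA\tNA"; exit }
            mb = sb / nb
            var = (sb2 - nb * mb * mb) / (nb - 1)           # background sample variance
            if (var <= 0) { print "NA\tNA"; exit }
            # z-score of the observed mean, using the background spread.
            z = (so / no - mb) / sqrt(var / no)
            # Upper-tail normal probability for |z| (polynomial approximation).
            a = (z < 0) ? -z : z
            t = 1 / (1 + 0.2316419 * a)
            poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 \
                 + t * (-1.821255978 + t * 1.330274429))))
            q = exp(-a * a / 2) / sqrt(2 * 3.141592653589793) * poly
            printf "%.4f\t%.4g\n", z, 2 * q
        }
    ' "$1" "$2"
}

# Example call, using the two files produced by the bedmap commands:
#   z_test_vs_background expectedMedians.txt observedMedians.txt
```

If the background medians are clearly non-normal, an empirical p-value (the fraction of background samples at least as extreme as what you observe) may be a safer choice than the z-test.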


I just love how transparent and clear the BEDOPS suite is :)
