Plot of intergenomic distances between all bound TF sites?
8.4 years ago
bede.portz ▴ 540

I would like to plot the distances between all pairs of peaks (bound locations) for a specific transcription factor, i.e., generate a histogram of the genomic distances between bound locations. This is essentially a composite plot of the data, but with the bound locations serving as both the reference points and the data being plotted.

My rationale is that, for a particular factor, bound locations very often appear to be clustered with other bound locations within a few kb. Such a plot may reveal whether there is a preferred range of distances between bound locations for this factor. That distribution could then be compared with the corresponding distances for other related factors and TSSs, and with an estimated random distribution.

Is there a tool to do this?

ChIP-Seq peaks • 2.0k views

It sounds like bedtools closest plus awk would work. Have you given that a try?

8.4 years ago

You could use BEDOPS closest-features with the options --dist --closest --no-overlaps --no-ref on a sorted BED file of regions-of-interest ("roi"). For each element, this reports the signed distance to its nearest non-overlapping neighbor. Feed the resulting list of distances into R and use hist() to generate a histogram. Because the distances are signed, take their absolute values with abs() before plotting.

At the command-line:

$ closest-features --dist --closest --no-overlaps --no-ref roi.bed roi.bed \
    | cut -d '|' -f2 - \
    > signed_distances.txt

In R:

> v.signed <- scan("signed_distances.txt")
> v.unsigned <- abs(v.signed)
> hist(v.unsigned)

You could repeat this procedure on any set of regions-of-interest, such as those from other factors, or similarly-sized intervals sampled from a genomic background that makes sense for your experiment (e.g., the entire genome minus repeatmasked regions, etc.).

If you want to compare distributions of distances and assign statistical significance to the comparison, you might use a two-sample K-S test (ks.test()) on the unbinned distances, or a chi-squared test (chisq.test()) on counts of distances binned into ranges.
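
Note that closest-features reports only the nearest neighbor of each element. If you want the distances between all pairs of sites on a chromosome instead, a rough awk sketch could look like the following. This is not a BEDOPS feature: the function name all_pair_gaps and its max_gap cutoff argument are made up for illustration, the gap is defined here as the edge-to-edge distance under the half-open BED convention, and the input is assumed to be tab-delimited and already sorted. It is quadratic in the number of elements per chromosome, so treat it as a starting point rather than a tuned tool.

```shell
# Hypothetical helper (not part of BEDOPS): print the edge-to-edge gap between
# every pair of non-overlapping intervals on the same chromosome, keeping only
# gaps that are <= max_gap (second argument).
# Assumes a tab-delimited BED file already sorted by chromosome, then start.
all_pair_gaps() {
    awk -F'\t' -v max_gap="$2" '
        {
            c = $1 ""                          # force string comparison of names
            if (c != chrom) {                  # new chromosome: reset the buffer
                chrom = c
                n = 0
            }
            for (j = 1; j <= n; j++) {
                gap = $2 - stop[j]             # 0 means the two intervals abut
                if (gap >= 0 && gap <= max_gap) print gap
            }
            n++
            stop[n] = $3 + 0                   # remember this interval end
        }
    ' "$1"
}
```

For example, all_pair_gaps roi.bed 10000 > pair_distances.txt would write one unsigned distance per line, which can be read with scan() and plotted with hist() in the same way as the nearest-neighbor distances.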


Alex, thanks for the response. From a cursory look at the documentation, it appears that closest-features wants two input files. Can I run it with just one input file, i.e., the bound intervals for a given factor?

Thanks


Take a look at the example. You still use two inputs, but specify the same filename for both. For each element, the application then looks for the nearest non-overlapping element within your lone input file. Just make sure your BED-formatted input file is sorted with BEDOPS sort-bed before running closest-features.
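
If you are unsure whether a file is already sorted, a quick sanity check along these lines may help. It is only a sketch: the function name bed_is_sorted is hypothetical, and the ordering it tests (chromosome name byte-wise, then start, then end) is my approximation of what sort-bed produces, so running sort-bed itself remains the safer option.

```shell
# Hypothetical helper: exits 0 if a tab-delimited BED file is ordered by
# chromosome name (byte-wise), then numeric start, then numeric end;
# exits 1 at the first out-of-order line.
bed_is_sorted() {
    LC_ALL=C awk -F'\t' '
        {
            c = $1 ""; s = $2 + 0; e = $3 + 0   # name as string, coordinates as numbers
            if (NR > 1 && (c < pc || (c == pc && (s < ps || (s == ps && e < pe))))) {
                bad = 1
                exit                            # jump to END with the failure flag set
            }
            pc = c; ps = s; pe = e              # remember the previous line
        }
        END { exit bad }
    ' "$1"
}
```

If the check fails, sort-bed roi.unsorted.bed > roi.bed (the input filename is a placeholder) produces the sorted file to pass twice to closest-features.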
