Question

What is the best way to plot distribution of peak heights?

0

Entering edit mode

8.5 years ago

1888 ▴ 70

I am sure this is been dealt before by Chip Seq experts... I was wondering what methods, preferably in R, are available to plot the distribution of the heights of Chip Seq peaks mapped on the genome. I see that there are several packages such as ChIPseeker and deepTools that will plot either heat maps or frequencies weighted by the height of the peaks, I guess this is done because reads are not uniformly distributed across the genome but why not plot directly the mean distribution of heights normalised by the total number of peaks in a histogram for example? What are the issues here that I am not seeing?

Thank you very much for your insight

peaks annotation ngs R ChIP-Seq • 3.7k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.5 years ago by 1888 ▴ 70

Ram · Answer 1 · 2015-10-31

0

Entering edit mode

8.5 years ago

Devon Ryan 104k

Just use the hist() function. That's why we don't bother implementing something like this in deepTools.

ADD COMMENT • link 8.5 years ago by Devon Ryan 104k

0

Entering edit mode

Thank you for your reply Devon! You don't mean to use hist() directly right? I am not trying to map the frequencies but just to control for it, what I am trying to get is a plot of the value (y-axis) with respect to distance from start site (x-axis). I must have to take account of the number of peaks otherwise the signal would be driven by only for example one peak in some regions and many in others, binning could work but it seems hard to find the correct number of bins since it varies a lot. Thanks

ADD REPLY • link 8.5 years ago by 1888 ▴ 70

0

Entering edit mode

With deepTools profile tool you can plot the standard error as well as the mean if this is what you want. But, this is a different question than plotting a histogram of peak coverage. Furthermore, peak height varies depending on genome biases as well as for biological reasons, thus, a normalized 'peak height' based on the 'input' sample is desirable. If you have, for example, the MACS2 peak output, you can easily in R call the hist() function as suggested by Devon on the fold change column.

ADD REPLY • link updated 20 months ago by Ram 43k • written 8.5 years ago by Fidel ★ 2.0k

0

Entering edit mode

Hi Fidel, thank you for your reply! Yes what I would like is not a histogram of peak coverage, but it is the mean peak height normalized by the coverage. Maybe I can simply do this in R by binning the distance from transcription start site, finding the mean height for each bins, and plotting that as peak height corrected by the coverage. Is this wrong? How else would you see how the peak height varies depending on position on the genome (i.e. near or far from transcription start sites)? Thanks again for your input!

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by 1888 ▴ 70

1

Entering edit mode

If I understand you correctly, what you can do is use closestBed from bedTools. This program will give you the distance between peak and TSS. Then, you can plot peak fold change (or some of the other normalized values from MACS2) vs. distance.

ADD REPLY • link updated 20 months ago by Ram 43k • written 8.5 years ago by Fidel ★ 2.0k

0

Entering edit mode

Hi, thank you for the help, I really appreciate it as I am a bit lost here. I'm actually already using closestBed to map the peaks in regards to the TSS, but I now have many peaks piling up very close to the TSS and few further away, so my question is how do I normalize the output from MACS2 in regards to distance from TSS? I am asking this because when I tried binning this distance from TSS and taking the mean of the MACS2 values for each bin, I got a plot that just has one huge spike close to the TSS no matter how small the bins were (I have it down to 1 bp), so it seems like the wrong approach...

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by 1888 ▴ 70