Question

MACS2 - centering peaks

6.9 years ago

BioinfGuru ★ 1.7k

Hello all.

I have been given vague instructions by my supervisor to check our ChIP-seq analysis pipeline on an example data set to see if we should be "re-centering" peaks. Unfortunately, I'm struggling to find consensus guidelines, so I'd like to get some opinions and guidance here from more experienced hands.

MACS2 is used to call peaks on the sample ChIP-seq data set with a fragment length of 217 bp. Please see the Peak Model and Cross Correlation. In the model you can see the characteristic bimodal distribution of reads on the forward and reverse strands. Am I assuming correctly that centering means adjusting the fragment positions to make these 2 spikes overlap? Also, I am not sure how to interpret the Cross-Correlation image.

What I have found so far:

Tutorial: Use half the fragment length as centering distance for jointly analyzing 5’ and 3’ tags. Centering means shifting the positions of tags mapping to the + or − strand of the chromosome by a fixed distance downstream or upstream, respectively. Centering increases the resolution of the ChIP-Seq data.
MACS2 shift option: When NOMODEL is set, MACS will use this value to move cutting ends (5') towards 5'->3' direction then apply EXTSIZE (fragment length) to extend them.... recommended to keep it as default 0 for ChIP-Seq datasets.
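
To make the tutorial's definition concrete, here is a minimal Python sketch of tag centering as described above. The function name `center_tag`, the example coordinates, and the use of integer division for "half the fragment length" are my own assumptions for illustration, not taken from the tutorial or from MACS2.

```python
def center_tag(five_prime_pos, strand, fragment_length):
    """Shift a tag's 5' position by half the fragment length.

    Forward-strand tags move downstream and reverse-strand tags move
    upstream, so both end up near the middle of the original fragment.
    """
    half = fragment_length // 2  # assumption: round half the length down
    if strand == "+":
        return five_prime_pos + half
    if strand == "-":
        return five_prime_pos - half
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical pair of tags from the two ends of one 217 bp fragment
tags = [(1000, "+"), (1217, "-")]
centered = [center_tag(pos, strand, 217) for pos, strand in tags]
print(centered)
```

With these made-up coordinates, the two tags start 217 bp apart and end up 1 bp apart after centering, which is the sense in which the two strand-specific spikes are made to overlap.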

Does anyone know of any papers that do this re-centering? Is there any real need to do this?

EDIT: Thanks to vchris. I'm reading that paper now. Already I can see that:

the true binding site is between the bimodal distribution peaks
MACS2 centering of peaks is done using shift and extsize options

So to center the peaks with a fragment length of 217, I think I must do one of the following:

--shift 108 --extsize 108
--shift 108 --extsize 217
--shift -108 --extsize 217 (basic intuition is leading me toward this choice)
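
One way to compare these candidates is to model what the shift and extsize description quoted above implies for a single read. The sketch below is my reading of that description, not MACS2's actual code: the 5' end is moved by `shift` in the read's own 5'->3' direction (a negative value moves it the other way), then extended by `extsize` in the 5'->3' direction. The function names and coordinates are hypothetical.

```python
def shifted_extended_interval(five_prime_pos, strand, shift, extsize):
    """Return (start, end) of a read after applying shift, then extsize.

    Assumed model: shift moves the 5' end along the read's 5'->3'
    direction, and extsize extends from there in the same direction.
    """
    if strand == "+":
        start = five_prime_pos + shift
        return (start, start + extsize)
    if strand == "-":
        end = five_prime_pos - shift
        return (end - extsize, end)
    raise ValueError(f"unknown strand: {strand!r}")


def midpoint(interval):
    """Midpoint of a (start, end) interval."""
    return (interval[0] + interval[1]) / 2


# Hypothetical reads from the two ends of one 217 bp fragment
for shift, extsize in [(0, 217), (108, 108), (108, 217), (-108, 217)]:
    plus = shifted_extended_interval(1000, "+", shift, extsize)
    minus = shifted_extended_interval(1217, "-", shift, extsize)
    print(shift, extsize, plus, midpoint(plus), minus, midpoint(minus))
```

Under this model, the default `--shift 0 --extsize 217` maps both reads to (1000, 1217), with a midpoint of 1108.5 between the two strand modes. `--shift -108 --extsize 217` instead puts the midpoint of the plus-strand read at 1000.5, i.e. on its 5' end rather than on the fragment center. If the model is right, that would be consistent with the recommendation to keep shift at 0 for ChIP-Seq.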

Any advice?

Thank you all, Kenneth

To my knowledge, you can always run MACS2 with nomodel and supply the exact fragment length yourself. One way to estimate the fragment length is SPP, which performs a cross-correlation analysis; this gives both a more appropriate fragment length and an estimate of ChIP quality based on the phantom peak. You can then use that value for peak calling, so that MACS2 uses the information you supply instead of building its own model. To be honest, the fragment length that MACS2 estimates from its own model is usually not over-estimated unless the ChIP was not done properly or the data quality is compromised.

I have thought about this a lot, and there is no clear consensus on when to use nomodel and when to use MACS2 modeling. For our data, I run cross-correlation analysis and CHANCE analysis to understand the quality. If the quality is too low and the data are noisy, I use the fragment length obtained from SPP. I also ask our wet-lab people what range of fragment lengths they used, which clears up a lot of doubt: if the fragment length estimated by SPP or MACS2 falls within that range, you can use either, since the high-confidence peaks should be called with or without the model. It also depends on the number of peaks you obtain with the MACS2 model versus nomodel, given the data quality. However, if you are asking specifically about re-centering, then this paper might be insightful.
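
To illustrate the idea behind estimating fragment length by cross-correlation, here is a small Python sketch. It is a simplified, unnormalized overlap count on made-up 5' end positions, not the Pearson cross-correlation that SPP computes, and it ignores the phantom peak entirely. The function name and the toy data are assumptions for illustration only.

```python
from collections import Counter


def strand_overlap_by_shift(plus_ends, minus_ends, max_shift):
    """For each shift, count plus-strand 5' ends that land on a
    minus-strand 5' end after being moved downstream by that shift."""
    plus_counts = Counter(plus_ends)
    minus_counts = Counter(minus_ends)
    scores = {}
    for shift in range(max_shift + 1):
        scores[shift] = sum(
            n * minus_counts.get(pos + shift, 0)
            for pos, n in plus_counts.items()
        )
    return scores


# Toy data: each hypothetical fragment is 217 bp long, so its
# minus-strand 5' end sits 217 bp downstream of its plus-strand 5' end.
plus_ends = [1000, 1002, 1005, 2000, 2003]
minus_ends = [pos + 217 for pos in plus_ends]

scores = strand_overlap_by_shift(plus_ends, minus_ends, max_shift=400)
estimated = max(scores, key=scores.get)  # shift with the highest overlap
print(estimated)
```

On this toy input the best-scoring shift is 217, the fragment length used to build the data. Real data are far noisier, which is why comparing the estimate against the size range reported by the wet lab is a useful sanity check.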

6.9 years ago by ivivek_ngs ★ 5.2k