Question

MACS2 Optimal parameter identification

0


8.2 years ago

onspotproductions ▴ 150

When calling peaks with MACS2, are there any tools that can provide information on how to choose the parameters? I know the parameters are mostly optional, but I would like to get the cleanest output possible.

ChIP-Seq clip-seq RNA-Seq peak calling • 5.6k views

ADD COMMENT • link updated 7.5 years ago by Biostar 20 • written 8.2 years ago by onspotproductions ▴ 150

score 1 · Answer 1 · 2016-03-01

First of all, your question is not clear: was the ChIP done for a transcription factor (TF) or for a chromatin mark? The parameters you set depend on that. Beyond that, it comes down to trial and error, so run MACS2 with different parameter options and compare the results.

You can always create tracks from your ChIP-Seq data and load them into the UCSC browser to see how they look, then call peaks with different parameters and check them against the tracks.

A note of caution: MACS2 can either build its own model (the default) or skip model building with the --nomodel option. Run your data with default parameters both ways to see how many enriched peaks each run reports, then plot the enrichment score against the significance (p-value or q-value).

I believe you need to go through the manual more carefully. If by "tool" you mean a manual: to learn how the MACS2 parameters work you need the MACS2 manual, not another tool. You can also take a look at this blog, which is quite useful and detailed, but read the manual first to understand what the parameters are and how the output is arranged.

Also, before running the peak calling, check the quality of your data with a FastQC report and with CHANCE.
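As a rough illustration of trying MACS2 with and without its model, here is a minimal Python sketch that only builds the command lines for a small parameter grid; it does not run anything. The file names, the q-value cutoffs, the --extsize values and the helper name `build_macs2_commands` are assumptions made up for the example, not recommended settings.

```python
from itertools import product
import shlex


def build_macs2_commands(treatment, control, genome="hs",
                         qvalues=(0.05, 0.01), extsizes=(150, 200)):
    """Return (run_name, argv) pairs covering model and --nomodel runs.

    All values are illustrative; pick cutoffs and sizes that suit your data.
    """
    base = ["macs2", "callpeak", "-t", treatment, "-c", control,
            "-f", "BAM", "-g", genome]
    runs = []
    for q in qvalues:
        # Default behaviour: MACS2 builds its shifting model from the data.
        name = f"model_q{q}"
        runs.append((name, base + ["-n", name, "-q", str(q)]))
    for q, ext in product(qvalues, extsizes):
        # --nomodel skips model building; --extsize gives the fragment
        # length that reads are extended to instead.
        name = f"nomodel_q{q}_ext{ext}"
        runs.append((name, base + ["-n", name, "-q", str(q),
                                   "--nomodel", "--extsize", str(ext)]))
    return runs


if __name__ == "__main__":
    # Hypothetical input files; print the commands for inspection.
    for run_name, argv in build_macs2_commands("chip_rep1.bam", "control.bam"):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell or passed to `subprocess.run`, and the number of peaks in each output can then be compared across runs.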

score 1 · Answer 2 · 2016-03-01

1


8.2 years ago

dariober 14k

I guess the main stumbling block is in how you define "cleanest output". If you have some metrics to quantify it or you have a training set of true positives and true negatives, then it shouldn't be too difficult to automate the process of finding the best parameters (with some heuristics, of course).

For some ChIP-Seq experiments where you know that peaks should span certain motifs or be in certain genomic regions (e.g. promoters), you would then search for the parameters that maximize these metrics. But in practice (at least for me) it's hard to tell what a good peak set looks like.

ADD COMMENT • link 8.2 years ago by dariober 14k

0


In fact, that is the reason I suggested generating the tracks. If it is a TF and one has a clear idea of some of its targets, then obviously check whether the peaks span the motif or fall in the expected genomic positions. The OP needs to be specific about the design of the experiment and clear about what the ChIP is for; then I guess the OP can proceed. In fact, for cell lines, the ENCODE data sets can serve as the list of true positives (TPs) for promoters and enhancers: after running with different parameters, the OP can intersect the output peak files with the corresponding ENCODE peak files. Otherwise it is difficult.
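To make the intersection idea concrete, here is a minimal sketch of scoring one called peak set against a reference set treated as true positives. The function names and the toy coordinates are invented for illustration, and the all-against-all comparison is only sensible for small inputs; for genome-scale files a dedicated tool such as bedtools intersect would be the usual choice.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def score_against_reference(called, reference):
    """Return (precision, recall) of called peaks versus reference peaks.

    precision: fraction of called peaks that overlap any reference peak.
    recall:    fraction of reference peaks overlapped by any called peak.
    """
    called_hit = sum(any(overlaps(p, r) for r in reference) for p in called)
    ref_hit = sum(any(overlaps(r, p) for p in called) for r in reference)
    precision = called_hit / len(called) if called else 0.0
    recall = ref_hit / len(reference) if reference else 0.0
    return precision, recall


# Toy data standing in for a reference peak file and one MACS2 run.
reference_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
called_peaks = [("chr1", 150, 250), ("chr1", 900, 1000)]

precision, recall = score_against_reference(called_peaks, reference_peaks)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Computing the same two numbers for each parameter setting gives a simple way to rank the runs, under the assumption that the reference set really is a fair proxy for true binding sites.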

ADD REPLY • link 8.2 years ago by ivivek_ngs ★ 5.2k

0


I should have been more specific. I am starting with CLIP-seq data from ENCODE; this data contains two replicates and one control. I want to really understand the best way to call the peaks. I don't have much experience in peak calling and I believe MACS is the best tool to use, but I have also looked at Piranha. Because this data is not mine, I want to make sure that the results are valid when I analyze it.

ADD REPLY • link 8.2 years ago by onspotproductions ▴ 150

0


Did you download ENCODE data for a transcription factor (TF) ChIP, or for chromatin marks? Some chromatin marks have broad peak profiles, and for those MACS2 should be run with the --broad flag and --broad-cutoff; for other chromatin marks the normal call is fine. In any case, ENCODE should also have defined sets of promoters for its samples. You can call your peaks with any tool and then overlap them with the ENCODE peaks for that sample.
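A small sketch of how the choice between the normal call and the broad call changes the command. The helper `macs2_mode_flags`, the labels "narrow" and "broad", the file names and the cutoff value are all assumptions for the example; which marks count as broad is something to decide from the literature and from your tracks.

```python
def macs2_mode_flags(target_type, broad_cutoff=0.1):
    """Return the extra MACS2 flags for a narrow or broad peak profile."""
    if target_type == "narrow":
        # TFs and sharp marks: the regular callpeak run needs no extra flag.
        return []
    if target_type == "broad":
        # Broad marks: enable broad peak calling with an explicit cutoff.
        return ["--broad", "--broad-cutoff", str(broad_cutoff)]
    raise ValueError(f"unknown target type: {target_type!r}")


# Hypothetical file names, shown only to illustrate the two variants.
base = ["macs2", "callpeak", "-t", "chip.bam", "-c", "control.bam",
        "-f", "BAM", "-g", "hs", "-n", "sample"]
for kind in ("narrow", "broad"):
    print(" ".join(base + macs2_mode_flags(kind)))
```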

ADD REPLY • link 8.2 years ago by ivivek_ngs ★ 5.2k

0


The data is for a transcription factor ChIP. As I said, I do not have much experience with this type of analysis yet, so I am learning.

ADD REPLY • link 8.2 years ago by onspotproductions ▴ 150

0


If you have installed the latest version of MACS2 properly, then you can follow this manual and do the regular peak calling, since it is a TF. In that case, run each replicate against the control to obtain the peaks, then compare the peak files across the two replicates to get the conserved regions for downstream exploratory analysis.
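The replicate comparison could be sketched roughly as follows, assuming each replicate's peaks are in a tab-separated file whose first three columns are chromosome, start and end (as in MACS2's narrowPeak output). The function names and example lines are hypothetical, and this simple pairwise intersection is just one of several ways to define conserved regions.

```python
def read_peaks(lines):
    """Parse (chrom, start, end) from tab-separated peak lines."""
    peaks = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header-style lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def conserved_regions(rep1, rep2):
    """Return sorted intervals covered by a peak in both replicates."""
    shared = []
    for chrom1, s1, e1 in rep1:
        for chrom2, s2, e2 in rep2:
            if chrom1 != chrom2:
                continue
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # keep only non-empty intersections
                shared.append((chrom1, start, end))
    return sorted(shared)


# Toy lines standing in for two replicate peak files.
rep1 = read_peaks(["chr1\t100\t200\tpeak_1\n", "chr1\t500\t600\tpeak_2\n"])
rep2 = read_peaks(["chr1\t150\t250\tpeak_1\n", "chr2\t500\t600\tpeak_2\n"])
print(conserved_regions(rep1, rep2))
```

With real files, `read_peaks(open(path))` would work the same way, since it only iterates over lines.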

ADD REPLY • link 8.2 years ago by ivivek_ngs ★ 5.2k