Biostar Test Site

This is site is used for testing only. Visit: https://www.biostars.org to ask a question.

Help understanding MACS2 --extsize and --shift
2
3
Entering edit mode
4.9 years ago

So I'm trying to understand the --shift and --extsize parameters in MACS2. I have inherited an ATAC-seq that used the following MACS2 command.

macs2 callpeak -t ATAC_sample-1.bam --nomodel --shift -100 --extsize 200 -g 1.5e9 -f BAM


I'm trying to see how I can apply these principles to a Rscript I have that converts my raw BAM file to a coverage file? The full script can be looked at here but the main lines I'm looking at are:

aln <- as(aln, "GRanges") # Converts each read mapping to GRanges coord

aln <- resize(aln, 150) # Extended the reads to the fragment length (4 previous exp)

cov <- coverage(aln) # Get Coverages nucleotide


The definitions for extsize and shift can be seen in the documentation but I'll copy it for reference.

--extsize

While '--nomodel' is set, MACS uses this parameter to extend reads in 5'->3' direction to fix-sized fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is on.

--shift

Note, this is NOT the legacy --shiftsize option which is replaced by --extsize! You can set an arbitrary shift in bp here. Please Use discretion while setting it other than default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') then apply --extsize from 5' to 3' direction to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction, otherwise 5'->3' direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE or BEDPE for paired-end data. Default is 0.

Here are some examples for combining --shift and --extsize:

EXAMPLE 1: To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.

So to summarize my question is how should I resize my raw reads so that the coverage will reflect the --shift and --extsize used by MACS2? From reading the documentation it seems I should shift the reads 100 bp in both directions because it is extending the reads 200 bp 5' -> 3' (--extsze) and back 100 bp with (--shift).

ATAC-seq MACS2 Peak Calling • 9.1k views
6
Entering edit mode
4.9 years ago

You'll need to endoapply() a function that does the shift() according to the strand, since that's otherwise ignored. Then resize(aln, 200). Alternatively, use idx = which(strand(aln) == '+') for each strand and directly adjust the start accordingly (then resize()).

2
Entering edit mode

I appreciate your help! The function it self seems easy. It's just herd for me to understand biologically how to write the function (if that makes sense). Based on what you are saying, are you saying that you --shift -100 means shifting read -100 from the start spot?

so lets say I have ranges

Chr1 4300 4352 +

Chr2 9720 9820 -

--shift -100 turns this into:

Chr1 4200 4252 +

Chr2 9820 9920 -

Is that correct?

Than --extsize 200 takes the output of that and extends the fragment size to 200

Chr 4200 4400 +

Chr 9720 9920 -

Is that right??

1
Entering edit mode

What you showed is how I understand it happening as well :)

0
Entering edit mode
4.0 years ago
xd_d ▴ 110

Is the --extsize equal the window size ?

0
Entering edit mode

No it's the extension size that it will extend the reads to for smoothing the pileup