I'm trying to see how I can apply these principles to a Rscript I have that converts my raw BAM file to a coverage file? The full script can be looked at here but the main lines I'm looking at are:
aln <- as(aln, "GRanges") # Converts each read mapping to GRanges coord
aln <- resize(aln, 150) # Extended the reads to the fragment length (4 previous exp)
cov <- coverage(aln) # Get Coverages nucleotide
The definitions for extsize and shift can be seen in the documentation but I'll copy it for reference.
While '--nomodel' is set, MACS uses this parameter to extend reads in 5'->3' direction to fix-sized fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is on.
Note, this is NOT the legacy --shiftsize option which is replaced by --extsize! You can set an arbitrary shift in bp here. Please Use discretion while setting it other than default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5') then apply --extsize from 5' to 3' direction to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction, otherwise 5'->3' direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with --extsize option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE or BEDPE for paired-end data. Default is 0.
Here are some examples for combining --shift and --extsize:
EXAMPLE 1: To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.
So to summarize my question is how should I resize my raw reads so that the coverage will reflect the --shift and --extsize used by MACS2? From reading the documentation it seems I should shift the reads 100 bp in both directions because it is extending the reads 200 bp 5' -> 3' (--extsze) and back 100 bp with (--shift).
You'll need to endoapply() a function that does the shift() according to the strand, since that's otherwise ignored. Then resize(aln, 200). Alternatively, use idx = which(strand(aln) == '+') for each strand and directly adjust the start accordingly (then resize()).