Question

UMI with MACS

5.0 years ago

Richard ▴ 590

Hi folks.

We have some ChIP data with UMIs in the reads. We are planning to mark duplicates in a UMI-aware fashion using either umi_tools or Picard.

I see in the docs that MACS will mark duplicates in its own way: https://github.com/taoliu/MACS/wiki/Advanced%3A-Call-peaks-using-MACS2-subcommands

Ideally we could keep all the sequenced reads in the BAM files and have MACS use the duplicate status as it was set by our duplicate marking tool.

Has anyone tried this sort of thing? Perhaps my understanding of how MACS works is incorrect?

UMI MACS ChIP-Seq • 1.6k views

updated 5.0 years ago by i.sudbery 19k • written 5.0 years ago by Richard ▴ 590

Maybe this will work if I just use the "callpeak" function.

5.0 years ago by Richard ▴ 590

Probably not: even though callpeak is a separate command, its documentation still says that MACS2 will make its own decision about which reads are duplicates.

5.0 years ago by Richard ▴ 590

I would produce BAM or BED files with only the reads that should be considered for peak calling and use --keep-dup=all to make MACS use only and exactly those reads.

5.0 years ago by ATpoint 82k

score 1 · Answer 1 · 2019-05-01

My understanding is that MACS doesn't mark duplicates, but rather filters out reads that have been marked as duplicates by a separate tool (although I may be wrong).

Further, I don't think that MACS has any way to deal with UMI deduplication. That is, if you had four reads at the same place on the genome, two reads for each of two separate UMIs, I'm entirely unclear what MACS would do with that, but I think it's quite likely that it would filter out all but one of them.

Finally, UMI-tools unfortunately doesn't have an option to mark duplicates. It can only deduplicate, or tag each read with the "group" it comes from.
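To make the four-read scenario concrete, here is a toy sketch in Python. It is not how MACS or UMI-tools are implemented: the `Read` tuple, the example coordinates, and both helper functions are made up for illustration, and UMIs are compared by exact match only (UMI-tools additionally accounts for sequencing errors in the UMI). It just contrasts position-only deduplication with position-plus-UMI deduplication.

```python
from collections import namedtuple

# Hypothetical minimal read record; real tools work on BAM alignments.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])

# Toy data: four reads at the same locus, two reads for each of two UMIs.
reads = [
    Read("chr1", 1000, "+", "ACGT"),
    Read("chr1", 1000, "+", "ACGT"),
    Read("chr1", 1000, "+", "TTGA"),
    Read("chr1", 1000, "+", "TTGA"),
]


def dedup_by_position(reads):
    """Keep the first read seen per (chrom, pos, strand), ignoring UMIs."""
    kept = {}
    for read in reads:
        kept.setdefault((read.chrom, read.pos, read.strand), read)
    return list(kept.values())


def dedup_by_position_and_umi(reads):
    """Keep the first read seen per (chrom, pos, strand, UMI)."""
    kept = {}
    for read in reads:
        kept.setdefault((read.chrom, read.pos, read.strand, read.umi), read)
    return list(kept.values())


print(len(dedup_by_position(reads)))          # 1
print(len(dedup_by_position_and_umi(reads)))  # 2
```

With position-only deduplication a single read survives, whereas the UMI-aware version keeps one read per UMI, i.e. two independent fragments.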

Thus, I would recommend deduplicating your reads with umi_tools dedup and then feeding these into MACS with --keep-dup=all as recommended by @ATpoint.
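As a rough sketch of that workflow, the snippet below only builds and prints the two command lines rather than running them. The file names, the sample name and the `build_commands` helper are placeholders of mine, not anything from the thread. It assumes single-end reads, a coordinate-sorted and indexed BAM with the UMI already in the read name (for example via umi_tools extract), and a human genome size (`-g hs`); paired-end data would need different options, and you would normally also pass a control with `-c`.

```python
import shlex


def build_commands(in_bam, dedup_bam, name, genome="hs"):
    """Return (dedup_cmd, callpeak_cmd) as argument lists.

    Assumes single-end reads and an input BAM that is sorted, indexed,
    and carries the UMI in the read name.
    """
    # UMI-aware deduplication: only the retained reads end up in dedup_bam.
    dedup_cmd = ["umi_tools", "dedup", "-I", in_bam, "-S", dedup_bam]
    # --keep-dup all tells MACS2 to use every read in the file as given.
    callpeak_cmd = [
        "macs2", "callpeak",
        "-t", dedup_bam,
        "-f", "BAM",
        "-g", genome,
        "-n", name,
        "--keep-dup", "all",
    ]
    return dedup_cmd, callpeak_cmd


# Placeholder file and sample names, for illustration only.
dedup_cmd, callpeak_cmd = build_commands(
    "chip.sorted.bam", "chip.dedup.bam", "chip_umi"
)
for cmd in (dedup_cmd, callpeak_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed.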