Question

Processing ATAC-seq data after peak calling

5.8 years ago

anais1396 ▴ 30

Hello everyone!

I've started working on ATAC-seq data and I would like to know how to process the data after peak calling.

I have two groups to study, patients and healthy controls. After calling peaks to find regions of open chromatin (which correspond to genes that are potentially expressed), I would like to look for notable differences between the two groups in order to see what is wrong in the patients. For instance, I would like to see which genes are expressed in patients but not in controls, and vice versa.

Is there a simple way to do these analyses, or maybe multiple ways? Is there a similar analysis from another NGS technique such as ChIP-seq, MNase-seq, or DNase-seq? What tools or pipelines are usually used for this?

Thank you in advance!

Anaïs

peak calling sequencing • 4.0k views

updated 5.7 years ago by phosphodiester_bond ▴ 40 • written 5.8 years ago by anais1396 ▴ 30

score 2 · Answer 1 · 2018-06-23

You could look for differentially accessible regions between your patient and control groups. The DiffBind or csaw packages can help you with this analysis. Once you have the differentially accessible regions, you can start looking at the genes they overlap or lie close to. The GenomicFeatures and GenomicRanges packages can be used to get a list of gene locations and look for overlaps. You can then profile this list of genes using something like a Gene Ontology or Gene Set Enrichment analysis. The goseq package is a good option for this.
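To make the overlap step concrete, here is a minimal sketch in plain Python. It is not how GenomicRanges works internally, only the same idea on toy data: the coordinates, gene names, and the `flank` parameter are hypothetical, and in practice the regions would come from DiffBind or csaw and the genes from a real annotation.

```python
from collections import namedtuple

# Half-open interval [start, end) on a chromosome, with a label.
Interval = namedtuple("Interval", ["chrom", "start", "end", "name"])


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_near_regions(regions, genes, flank=0):
    """Map each region name to the sorted gene names within `flank` bp of it."""
    hits = {}
    for region in regions:
        # Widen the region on both sides so nearby genes also count as hits.
        widened = region._replace(
            start=max(0, region.start - flank), end=region.end + flank
        )
        hits[region.name] = sorted(g.name for g in genes if overlaps(widened, g))
    return hits


# Hypothetical toy coordinates, not real annotations.
genes = [
    Interval("chr1", 1000, 5000, "GENE_A"),
    Interval("chr1", 8000, 9000, "GENE_B"),
    Interval("chr2", 1000, 2000, "GENE_C"),
]
regions = [
    Interval("chr1", 4500, 4800, "region_1"),  # inside GENE_A
    Interval("chr1", 6000, 6500, "region_2"),  # between GENE_A and GENE_B
    Interval("chr2", 500, 1100, "region_3"),   # overlaps the start of GENE_C
]

direct = genes_near_regions(regions, genes)              # strict overlap only
nearby = genes_near_regions(regions, genes, flank=2000)  # allow 2 kb either side
print(direct)
print(nearby)
```

The quadratic loop is fine for a toy list; for genome-scale peak sets an interval tree or the Bioconductor packages named above would be the more sensible choice.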

score 1 · Answer 2 · 2018-07-09

Hi anais1396, what I am doing right now (and I am also quite new to the subject) is the following:

align reads1 and reads2 to a reference genome using bowtie2
convert the SAM output to BAM
call peaks with MACS2, using -f BAMPE (for paired-end data) and --broad
annotate the peaks with HOMER (annotatePeaks.pl and findGO.pl)

This should already give you a list of genes associated with the open chromatin regions, assigned by closest TSS.

Just leaving that here, because I am quite sure some may want to add more steps to that.
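The steps in this answer could be scripted roughly as follows. This is only a sketch that builds the command lines without running them. The sample name, file names, index path, genome build (`hg38`), and genome size (`-g hs`) are assumptions, and `samtools sort` is used here as one way to do the SAM to BAM conversion. The findGO.pl step is left out because it needs a gene ID list extracted from the annotation table first.

```python
import shlex


def atac_commands(sample, reads1, reads2, bowtie2_index,
                  homer_genome="hg38", macs_gsize="hs"):
    """Return the commands for one paired-end sample, in order, as argument lists."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    # With -n <sample> and --broad, MACS2 names the broad peak file like this.
    peaks = f"{sample}_peaks.broadPeak"
    return [
        # 1. Paired-end alignment to the reference genome.
        ["bowtie2", "-x", bowtie2_index, "-1", reads1, "-2", reads2, "-S", sam],
        # 2. SAM to coordinate-sorted BAM, then index it.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # 3. Broad peak calling on paired-end fragments.
        ["macs2", "callpeak", "-t", bam, "-f", "BAMPE", "--broad",
         "-g", macs_gsize, "-n", sample],
        # 4. Annotation; annotatePeaks.pl writes to stdout, so redirect it to a file.
        ["annotatePeaks.pl", peaks, homer_genome],
    ]


# Hypothetical sample and file names, for illustration only.
commands = atac_commands("patient1", "patient1_R1.fastq.gz",
                         "patient1_R2.fastq.gz", "indexes/hg38")
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; for the annotation step, open an output file and pass it as `stdout`.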

score 1 · Answer 3 · 2018-08-22

5.7 years ago

phosphodiester_bond ▴ 40

Hi Anaïs,

If you'd like to analyze differences in transcription factor activity between the two groups, we developed a tool to do this using your called peaks from ATAC-seq:

https://biof-git.colorado.edu/dowelllab/DAStk

More info here:

http://www.mdpi.com/1420-3049/23/5/1136

Good luck!

Hi phosphodiester_bond,

Sounds promising. Good to see that ATAC-seq tools are continuously being published. Can you comment on how your approach differs from the existing chromVAR approach from the Greenleaf lab?

5.7 years ago by ATpoint 81k

Hi!

I would need to look at the documentation carefully, but at first pass it seems these are two different tools: chromVAR is more focused on comparing the ATAC-seq signal itself between experiments, rather than the estimated levels of TF activity between the two datasets. It looks like it provides functions to figure out which motifs are overlapped by a particular peak (fixed to JASPAR, while DAStk can be used with scanned motif sites from any source), but not the experiment-wise comparison of which changes in TF activity are most significant. Thanks for pointing this out, though, because I hadn't heard of it and it may be handy in other scenarios!

(Sorry about the late response, I need to set up notifications.)

5.6 years ago by phosphodiester_bond ▴ 40