Question

Do read counts make sense for ATAC-seq differential detection?

3


6.8 years ago

datascientist28 ▴ 560

So I've been following a pipeline recommended to me in this discussion thread (https://www.biostars.org/p/224440/#258171). A postdoc in my lab raised a concern about using counts, and I don't have a great answer for it. Can someone help us understand this better?

His 'problem':

""" I guess it seems like the assumptions in all of these would work better in more normal data sets, where the alignment is to a defined gene which has been discretely pulled down in ChIP-seq or ID'd in RNA-seq.

In our case, the entire genome is in play, and each read carries less weight on its own. I imagine a scenario in which local background is higher, with a small peak built on top of it. The count data would say that it is a much more important peak than it might be. Maybe this is okay, because high background could suggest generally more open chromatin, but I can imagine it being a real problem in a situation where one sample has many more reads, most of which are noise.

I'm also thinking toward a future experiment in older worms, where supposedly chromatin is much more accessible to begin with. If the foundation of reads is high, the significance of the read counts at peaks should be diminished, but count data does not allow for that as far as I can tell. """

ATAC-seq edgeR Differential Accessibility • 2.5k views

ADD COMMENT • link 6.8 years ago by datascientist28 ▴ 560

2


It's an interesting point, but I struggle to understand what 'background' in an ATAC-seq experiment represents. In ChIP-seq, background is non-specific pull-down, which generally correlates with open chromatin regions. In ATAC-seq, you're specifically looking for signal in open chromatin regions. In that sense, I can't see that ATAC-seq has a background at all. The only worry I've had in the past with ATAC-seq data is comparison between samples with drastically different chromatin states. For example, embryonic stem cells are generally very open, whereas a somatic cell is generally more closed. If you look at the same open region in both samples, the embryonic stem cell tends to have lower signal because the reads are sampling more of the open genome. I've not yet seen a method that accounts for this, so I'm skeptical about certain ATAC-seq comparisons.

ADD REPLY • link 6.7 years ago by James Ashmore ★ 3.4k
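The sampling effect described in the comment above can be illustrated with a toy calculation. This is only a sketch under a deliberately crude assumption (a fixed number of reads spread evenly over all open regions); the depth and region counts are hypothetical numbers, not values from any real experiment.

```python
def expected_reads_per_region(total_reads, n_open_regions):
    """Expected reads at one open region when a fixed number of reads is
    spread evenly over all open regions (a crude toy model)."""
    return total_reads / n_open_regions


def cpm(count, library_size):
    """Counts per million: simple scaling by sequencing depth."""
    return count * 1e6 / library_size


depth = 1_000_000      # same sequencing depth in both samples (assumed)
somatic_open = 20_000  # hypothetical number of open regions
stem_open = 50_000     # hypothetical: more of the genome is open

somatic = expected_reads_per_region(depth, somatic_open)
stem = expected_reads_per_region(depth, stem_open)

# The region is equally open in both samples, yet the more open sample
# yields fewer reads there, and depth scaling alone does not change that.
print(somatic, stem)
print(cpm(somatic, depth), cpm(stem, depth))
```

Under this toy model the shared region looks less accessible in the more open sample even though nothing changed at that region, which is the concern raised above.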

0


"The only worry I've had in the past with ATAC-seq data is comparison between samples with drastically different chromatin states."

@James Ashmore I'm currently running into the same situation as described above. Have you seen a reasonable solution for comparing these types of samples? It has been suggested to me to use other data, such as RNA-seq, to select regions whose expression doesn't change across all samples, then calculate coverage over these regions for each sample and normalize to that. On the face of it, it sounds reasonable.

And how do you go about it when you only want to answer the question: is the chromatin state in two samples different (mostly open vs. mostly closed)? In this case, which regions are differentially open/closed is not of interest.

ADD REPLY • link 5.3 years ago by A. Domingues ★ 2.7k
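The invariant-region normalization suggested in the comment above could be sketched roughly as follows. This is a minimal illustration, not a tested method: the function names, sample names, region names, and counts are all hypothetical, and it assumes the chosen regions really are unchanged between samples, which is the hard part in practice.

```python
def invariant_scaling_factors(counts, invariant_regions):
    """counts: {sample: {region: read count}}.
    Returns {sample: factor} such that, after multiplying by the factor,
    summed coverage over the invariant regions is equal across samples."""
    totals = {
        sample: sum(regions[r] for r in invariant_regions)
        for sample, regions in counts.items()
    }
    reference = sum(totals.values()) / len(totals)  # mean invariant coverage
    return {sample: reference / total for sample, total in totals.items()}


def apply_factors(counts, factors):
    """Rescale every region's count by its sample's factor."""
    return {
        sample: {r: n * factors[sample] for r, n in regions.items()}
        for sample, regions in counts.items()
    }


# Hypothetical toy data: two invariant regions and one peak of interest.
counts = {
    "young": {"inv1": 100, "inv2": 100, "peakA": 50},
    "old": {"inv1": 50, "inv2": 50, "peakA": 50},
}
factors = invariant_scaling_factors(counts, ["inv1", "inv2"])
normalized = apply_factors(counts, factors)
print(factors)
print(normalized["young"]["peakA"], normalized["old"]["peakA"])
```

In this toy data the raw counts at peakA are identical, but relative to the invariant regions the peak is twice as strong in the "old" sample. With a count-based tool, factors like these would more typically be supplied as normalization factors or offsets than applied to the counts directly; that step is left out here.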

0


I think these are valid concerns. Maybe the discussion in this thread is useful: TPM for ChIP-seq normalization

ADD REPLY • link 6.8 years ago by dariober 14k