Question

ATAC-seq size selection and TF prediction on paired end data

2

6.4 years ago

bwassie ▴ 20

Hi all

I have a few questions about ATAC-seq data analysis. My lab is using ATAC-seq to identify accessible regions in the chromatin, to check for differential chromatin accessibility between disease and control states, and to check for TF binding in open chromatin regions (we usually do motif analysis for this). We currently do not size select our data and we do paired end sequencing.

In order to do motif analysis, should we remove fragments that correspond to nucleosomal reads? Since TFs usually bind in nucleosome free regions, it doesn't make sense to me that we keep larger, nucleosomal fragments. However, I have seen many papers that do not do any sort of size selection (experimental or computational) and I am wondering if I am missing something.

Second, is it necessary to do paired end sequencing for ATAC-seq if we do size selection during library prep? I have noticed that almost everyone does paired end sequencing for ATAC-seq, but I'm not sure why this is the case.

ATAC-seq motif-prediction size-selection • 5.0k views

ADD COMMENT • link updated 6.4 years ago by Devon Ryan 104k • written 6.4 years ago by bwassie ▴ 20

0

Just because an area is not defined as a nucleosome-free region doesn't mean it isn't one. There may be TFs binding there that make it look like a nucleosome-occupied region, so you would lose it in your motif analyses.

ADD REPLY • link 6.4 years ago by BioinfGuru ★ 1.7k

0

That's a fair point, Kenneth. Do you notice that in your data?

ADD REPLY • link 6.4 years ago by bwassie ▴ 20

0

Well, I'm just going off what I remember reading in the NucleoATAC GitHub issues pages. Somewhere in there is a warning that just because a region is not called as an "NFR" does not mean it is not one. It just means the required evidence (length, flanking nucleosomes, etc.) wasn't there.

To be honest, I'm actually going to look into the footprinting that Devon suggests in his answer below.

ADD REPLY • link 6.4 years ago by BioinfGuru ★ 1.7k

score 2 · Answer 1 · 2017-12-04

2

6.4 years ago

Devon Ryan 104k

Yes, at least we filter out everything that's nucleosomal in size (in our Snakemake pipeline we use a cutoff of 150 bp for this). Strictly speaking I suppose you don't need to do this, but given how footprinting works it helps shrink the search space.
How exact is your size selection? While you could theoretically get away with SE sequencing, PE sure makes the analysis a lot easier. Further, you're just opening yourself up to reviewer criticism if you go with SE rather than PE ("your results may just be an artifact of having not properly excluded nucleosomes" or "your results are due to a bias of having too-short fragments" ...). We also use PE reads for our ATAC-seq datasets.
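
For readers who want to see what a computational size filter looks like, here is a minimal sketch, not the pipeline mentioned above. It assumes paired-end alignments in SAM text format, where column 9 (TLEN) holds the signed template length. The function names and the choice to treat the 150 cutoff as inclusive are illustrative assumptions.

```python
from typing import Iterable, Iterator

# Cutoff quoted in the answer; treating it as inclusive is an assumption.
MAX_FRAGMENT_LEN = 150


def fragment_length(sam_line: str) -> int:
    """Return the absolute template length (SAM column 9, TLEN) of one record."""
    fields = sam_line.rstrip("\n").split("\t")
    return abs(int(fields[8]))


def keep_sub_nucleosomal(
    sam_lines: Iterable[str], max_len: int = MAX_FRAGMENT_LEN
) -> Iterator[str]:
    """Yield header lines unchanged, plus alignments whose fragment is 1..max_len bp."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no fragment information; pass them through.
            yield line
            continue
        tlen = fragment_length(line)
        # TLEN is 0 when the template length is unavailable
        # (for example single-end reads or an unmapped mate), so drop those.
        if 0 < tlen <= max_len:
            yield line
```

Both mates of a pair carry the same absolute TLEN, so a pair is kept or dropped as a unit. With single-end data TLEN is not available, which is one concrete reason this kind of filtering needs PE reads.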

ADD COMMENT • link 6.4 years ago by Devon Ryan 104k

0

Hi Devon,

We do gel based size selection; we just use a razor and cut the nucleosome free band from the gel. We've been doing this for a while and we're thinking of switching to paired end. I agree about the analysis being easier with paired end!

ADD REPLY • link 6.4 years ago by bwassie ▴ 20

0

I mostly asked about how you were doing size selection because one of the first steps in footprinting is to input open regions, which are basically non-nucleosomal-sized peaks. That's easy to do and exact if one filters by fragment size, but I imagine it wouldn't be terribly exact if one is just cutting out a gel block and doing a DNA extraction from it.

ADD REPLY • link 6.4 years ago by Devon Ryan 104k

0

Devon - I would like to perform footprinting but I read somewhere that you need massive ATAC sequencing depth to do this (over 150-200M per sample). Currently our average depth is around the 50M read mark. What would you recommend?

ADD REPLY • link 6.4 years ago by BioinfGuru ★ 1.7k

1

150-200M seems a bit over the top. I think we've had success with 100M using Wellington footprinting, but I'll double check with the most recent person to have done this once she gets in today.

ADD REPLY • link 6.4 years ago by Devon Ryan 104k

0

She just wrote that she ended up with 60M pairs after filtering, so I'd guess at least 100M, maybe more like 150M initial to be sure. Note that this is for mouse/human sized genomes, so scale that appropriately for whatever you're working with.
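
The arithmetic behind that estimate can be sketched as below. This is a rough back-of-the-envelope helper, not a validated depth model. The function name and parameters are hypothetical, and scaling linearly with genome size is an assumption made only for illustration. The retained fraction of roughly 0.4 to 0.6 follows from 60M filtered pairs out of 100M to 150M initial pairs.

```python
def initial_pairs_needed(
    filtered_target: float,
    retained_fraction: float,
    genome_size_bp: float,
    reference_genome_size_bp: float,
) -> float:
    """Estimate the initial read pairs needed to end up with `filtered_target` pairs.

    Assumes (for illustration only) that the required depth scales linearly
    with genome size relative to the reference genome the target was quoted for.
    """
    if not 0 < retained_fraction <= 1:
        raise ValueError("retained_fraction must be in (0, 1]")
    if genome_size_bp <= 0 or reference_genome_size_bp <= 0:
        raise ValueError("genome sizes must be positive")
    scale = genome_size_bp / reference_genome_size_bp
    return filtered_target * scale / retained_fraction


# Same genome as the reference, half of the pairs surviving filtering:
same_genome = initial_pairs_needed(60e6, 0.5, 1.0, 1.0)  # 120 million pairs
```

Measuring the retained fraction on a pilot sample from your own protocol is likely to be more informative than reusing these figures.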

ADD REPLY • link 6.4 years ago by Devon Ryan 104k

0

Yes, this is for the mouse genome. After filtering, the read counts range from 30-60M. I'll certainly give Wellington a go. Thank you for that :)

ADD REPLY • link 6.4 years ago by BioinfGuru ★ 1.7k