Insert size of ~20bp with ATAC-seq?
5.1 years ago
a.rex ▴ 350

I have recently downloaded some publicly available ATAC-seq data. I aligned the reads to the reference genome with BWA, removed duplicates (in this instance 70% of the library is duplicates), and then used Picard tools to generate a fragment size distribution. However, I see a large peak at around 20 bp. The library was sequenced as 75 bp paired-end reads (75 bp forward and 75 bp reverse). Does a 20 bp insert length mean that the insert is just short? How can I check this? Presumably the reads contain a lot of adapter sequence?

[Image: insert size distribution plot]

alignment sequencing atac-seq • 4.0k views

Can you post the "distribution of insert size" plot? It's common to observe a sharp peak below 100 bp, but you should also see a peak at 150-200 bp and then another around 300 bp.

I have uploaded said image now

Odd plot. I have never seen anything like that in ATAC-seq data, and I think I've seen quite a lot of them. Which dataset is that? Then I can quickly run it through my pipeline to see whether it is indeed an odd library or a technical issue to debug. Did you filter chrM before collecting insert sizes?

It is very odd. It is for an obscure species and was published a few days ago.

I did not filter chrM as we do not have this information.

OK, I see. It could be that the sharp peak is some heavily digested non-nuclear DNA like chrM (or any other organelle DNA or parasite DNA that might be in the worm). Here is how the insert sizes look for only chrM in mouse:

[Image: insert size distribution for chrM only, mouse]

You can see that it also accumulates at short fragment sizes, as this nucleosome-free DNA is an attractive target for the transposome. Maybe you can make a kind of pseudo-chrM by taking the mitochondrial genome of a closely related, well-annotated species and including it in the reference to get rid of some of this contamination. Or maybe take all the read pairs with an insert size below 50 bp and try to assemble them, followed by a sequence comparison against chrM or other organelle DNA, to get an idea of what it is.
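As a rough sketch of the "take all the reads below 50 bp" idea, something like the following could tally short fragments per reference sequence and collect their read names for later assembly. It assumes uncompressed SAM text as input (for example from `samtools view -h`), relies on the TLEN column (field 9) being positive for the leftmost mate, and the function name `short_fragment_contigs` is made up for illustration. If most short fragments pile up on one contig or scaffold, that would be a candidate for organelle or contaminant DNA.

```python
from collections import Counter


def short_fragment_contigs(sam_lines, max_len=50):
    """Tally properly paired fragments shorter than max_len per reference name.

    Returns (Counter of reference name -> pair count, set of read names).
    """
    per_contig = Counter()
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag, rname, tlen = fields[0], int(fields[1]), fields[2], int(fields[8])
        # Keep properly paired reads (0x2); skip secondary (0x100) and
        # supplementary (0x800) alignments.
        if not flag & 0x2 or flag & 0x900:
            continue
        # Only the mate with positive TLEN is counted, so each pair is seen once.
        if 0 < tlen < max_len:
            per_contig[rname] += 1
            names.add(qname)
    return per_contig, names


# Tiny made-up example; real input would be the lines of a SAM file.
demo = [
    "@SQ\tSN:chrM\tLN:16299",
    "r1\t99\tchrM\t100\t60\t20M55S\t=\t100\t20\t*\t*",
    "r1\t147\tchrM\t100\t60\t55S20M\t=\t100\t-20\t*\t*",
    "r2\t99\tchr1\t500\t60\t75M\t=\t600\t175\t*\t*",
    "r2\t147\tchr1\t600\t60\t75M\t=\t500\t-175\t*\t*",
]
counts, read_names = short_fragment_contigs(demo)
print(counts.most_common())
print(sorted(read_names))
```

The read names could then be used to pull the matching reads out of the FASTQ files for assembly. The 50 bp cutoff is just the value suggested above, not a recommended default.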

I realise now that perhaps a peak at 20bp (insert size) corresponds to a fragment of -95bp?

5.1 years ago
igor 13k

I think Picard's definition refers to the actual fragment size, including the reads. Check the illustration in this previous discussion: "Is PICARD CollectInsertSizeMetrics use soft-clipping information to compute the insert size ?"

It's not uncommon to end up with very short fragments in ATAC-seq. Some people discard them. For example, in the ATACseqQC paper, they removed read pairs with a mapped template length shorter than 38 bp.
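To make the arithmetic concrete, here is a minimal sketch of what a 20 bp fragment would imply for 75 bp reads, plus a crude way to check raw reads for adapter read-through. It assumes the library carries the standard Nextera/Tn5 adapter sequence `CTGTCTCTTATACACATCT`, which may not hold for every ATAC-seq protocol. The helper names are made up for illustration, and the exact-match search is much simpler than what real trimmers do (they tolerate mismatches and partial adapters at the read end).

```python
# Assumption: standard Nextera/Tn5 adapter as it appears at the 3' end of a
# read that has run past the end of the fragment.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def expected_adapter_bases(read_len, fragment_len):
    """Bases of a read that fall beyond the end of the fragment."""
    return max(0, read_len - fragment_len)


def adapter_start(read_seq, adapter=NEXTERA_ADAPTER, min_match=10):
    """Position where the adapter begins in the read, or None if not found.

    The position approximates the fragment length. Only an exact match of the
    first min_match adapter bases is searched for.
    """
    idx = read_seq.find(adapter[:min_match])
    return idx if idx != -1 else None


def keep_pair(template_len, min_len=38):
    """Length filter with the 38 bp cutoff quoted above as the default."""
    return abs(template_len) >= min_len


# A 20 bp fragment sequenced with 75 bp reads leaves 55 bp of read-through.
print(expected_adapter_bases(75, 20))

# Made-up read: 20 bp of insert followed directly by the adapter.
read = "ACGT" * 5 + NEXTERA_ADAPTER + "A" * 36
print(adapter_start(read))
print(keep_pair(20), keep_pair(150))
```

If a large share of the raw reads show the adapter starting at around position 20, that would be consistent with the peak being genuinely short fragments rather than an alignment artifact.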
