Spike-in normalization with CUT&Tag data
Entering edit mode
3.7 years ago

Hello, I am working on CUT&Tag data that includes a spike-in for normalization. The spike-in is the Ampr gene from the plasmid pBluescript(+). I have 4 samples, Wt[1,2] and Ko[1,2], all of which were generated using the same antibody. What I have done so far for each sample is:

  1. Aligned my reads to mm10 (Bowtie2)

  2. Aligned my reads to the Ampr gene from pBluescript (Bowtie2).

I want to normalize these 4 samples using scaling factors calculated from the spike-in data, and I was wondering how to go about that. From my research, I found that normalization factors are calculated as follows:

normalization factor = lowest_sample (spike-in) /sample_of_interest (mm10) (https://www.biostars.org/p/247172/),

where lowest_sample is the sample with the lowest spike-in count and sample_of_interest is the corresponding count for each sample.

In the hypothetical example below, if each of these is a count from Bowtie2 (PE uniquely aligned), would scaling method A make sense, or should I use method B, or neither?

        mm10   Spike-in     Scaling factor *A*    OR    Scaling factor *B*
Wt1      70       5          5/5 = 1.00                 70/5 = 14.0
Wt2      80       7          5/7 = 0.71                 80/7 = 11.4
Ko1      30       6          5/6 = 0.83                 30/6 = 5.0
Ko2      40       6          5/6 = 0.83                 40/6 = 6.7
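For concreteness, the arithmetic of the two candidate schemes in the table can be sketched in plain Python (a minimal illustration using the hypothetical counts above; method A scales each sample by the minimum spike-in count over its own spike-in count, method B divides mm10 counts by spike-in counts):

```python
# Hypothetical counts from the table above (PE uniquely aligned).
counts = {
    "Wt1": {"mm10": 70, "spike": 5},
    "Wt2": {"mm10": 80, "spike": 7},
    "Ko1": {"mm10": 30, "spike": 6},
    "Ko2": {"mm10": 40, "spike": 6},
}

# Method A: lowest spike-in count / each sample's spike-in count.
min_spike = min(c["spike"] for c in counts.values())
method_a = {s: min_spike / c["spike"] for s, c in counts.items()}

# Method B: each sample's mm10 count / its spike-in count.
method_b = {s: c["mm10"] / c["spike"] for s, c in counts.items()}

for sample in counts:
    print(f"{sample}: A = {method_a[sample]:.2f}, B = {method_b[sample]:.2f}")
```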

I would greatly appreciate advice on whether my current idea for normalization is correct or not. If not, could you point me in the right direction?

Is there a way to use deeptools to do this?
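deeptools will not compute spike-in factors for you, but `bamCoverage` does accept a precomputed factor via its `--scaleFactor` option, so one workable approach is to compute the factors externally and pass them in per sample. A sketch that assembles such commands (BAM/bigWig file names are hypothetical placeholders):

```python
# Precomputed method-A-style spike-in scaling factors (from the table above).
scale_factors = {"Wt1": 1.0, "Wt2": 0.71, "Ko1": 0.83, "Ko2": 0.83}

# Build one deeptools bamCoverage command per sample; file names
# (<sample>.mm10.bam, <sample>.spikenorm.bw) are hypothetical.
cmds = []
for sample, factor in scale_factors.items():
    cmd = (
        f"bamCoverage -b {sample}.mm10.bam "
        f"--scaleFactor {factor:.4f} "
        f"-o {sample}.spikenorm.bw"
    )
    cmds.append(cmd)
    print(cmd)
```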

Any help will be greatly appreciated.

deeptools normalization ChIP-Seq cutandtag

This is more a comment than an answer, but I never really understood why the use of spike-ins in routine experiments would be meaningful. You add a constant amount of spike-in to each library, but if the signal-to-noise ratio differs between libraries (very common in ChIP/CUT applications), this essentially amounts to normalization by library size and is therefore unreliable. I would instead call peaks on the samples, build a count matrix over the merged peaks, and then use DESeq2 or edgeR to get proper size factors (see: ATAC-seq sample normalization).
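The size-factor idea suggested here (DESeq2's median-of-ratios estimator) can be sketched in plain Python on a toy peaks-by-samples count matrix; the counts below are invented for illustration, and a real analysis would use DESeq2 or edgeR directly:

```python
import math
from statistics import median

# Toy count matrix: rows = merged peaks, columns = samples (invented numbers).
counts = [
    [100, 120, 60, 80],
    [50, 55, 20, 30],
    [200, 210, 110, 150],
    [10, 12, 6, 8],
]

n_samples = len(counts[0])

# Per-peak log geometric mean across samples (the pseudo-reference sample).
log_ref = [sum(math.log(c) for c in row) / n_samples for row in counts]

# Size factor per sample: median across peaks of the ratio to the reference.
size_factors = []
for j in range(n_samples):
    log_ratios = [math.log(counts[i][j]) - log_ref[i] for i in range(len(counts))]
    size_factors.append(math.exp(median(log_ratios)))

print([round(f, 3) for f in size_factors])
```

Dividing each sample's counts by its size factor then puts the samples on a common scale, independent of library size.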

But maybe I simply do not understand the idea of spike-ins.

