Spike-in normalization with CUT&Tag data
Entering edit mode
3.7 years ago

Hello, I am working on CUT&Tag data that includes a spike-in for normalization. The spike-in is the Ampr gene from the plasmid pBluescript(+). I have 4 samples, Wt[1,2] and Ko[1,2], all of which were generated using the same antibody. What I have done so far for each sample is:

  1. Aligned my reads to mm10 (Bowtie2)

  2. Aligned my reads to the Ampr gene from pBluescript (Bowtie2).

I want to normalize these 4 samples using scaling factors calculated from the spike-in data, and I was wondering how to go about that. From my research, I found that normalization factors are calculated as follows:

normalization factor = lowest_sample (spike-in) /sample_of_interest (mm10) (https://www.biostars.org/p/247172/),

where lowest_sample is the sample with the lowest spike-in count and sample_of_interest is the corresponding count for each sample.

In the hypothetical example below, if each of these is a count from Bowtie2 (PE uniquely aligned), would scaling method A make sense, or should I use method B, or neither?

        mm10   Spike-in     Scaling factor *A*    OR    Scaling factor *B*
Wt1      70       5          5/5 = 1.00                 70/5 = 14.0
Wt2      80       7          5/7 = 0.71                 80/7 = 11.4
Ko1      30       6          5/6 = 0.83                 30/6 = 5.0
Ko2      40       6          5/6 = 0.83                 40/6 = 6.7
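For concreteness, the arithmetic of the two candidate schemes in the table can be sketched in plain Python (a minimal illustration using the hypothetical counts above; method A scales each sample by the minimum spike-in count over its own spike-in count, method B divides mm10 counts by spike-in counts):

```python
# Hypothetical counts from the table above (PE uniquely aligned).
counts = {
    "Wt1": {"mm10": 70, "spike": 5},
    "Wt2": {"mm10": 80, "spike": 7},
    "Ko1": {"mm10": 30, "spike": 6},
    "Ko2": {"mm10": 40, "spike": 6},
}

# Method A: lowest spike-in count / each sample's spike-in count.
min_spike = min(c["spike"] for c in counts.values())
method_a = {s: min_spike / c["spike"] for s, c in counts.items()}

# Method B: each sample's mm10 count / its spike-in count.
method_b = {s: c["mm10"] / c["spike"] for s, c in counts.items()}

for sample in counts:
    print(f"{sample}: A = {method_a[sample]:.2f}, B = {method_b[sample]:.2f}")
```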

I would greatly appreciate advice on whether my current idea for normalization is correct or not. If not, could you point me in the right direction?

Is there a way to use deeptools to do this?
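deeptools will not compute spike-in factors for you, but `bamCoverage` does accept a precomputed factor via its `--scaleFactor` option, so one workable approach is to compute the factors externally and pass them in per sample. A sketch that assembles such commands (BAM/bigWig file names are hypothetical placeholders):

```python
# Precomputed method-A-style spike-in scaling factors (from the table above).
scale_factors = {"Wt1": 1.0, "Wt2": 0.71, "Ko1": 0.83, "Ko2": 0.83}

# Build one deeptools bamCoverage command per sample; file names
# (<sample>.mm10.bam, <sample>.spikenorm.bw) are hypothetical.
cmds = []
for sample, factor in scale_factors.items():
    cmd = (
        f"bamCoverage -b {sample}.mm10.bam "
        f"--scaleFactor {factor:.4f} "
        f"-o {sample}.spikenorm.bw"
    )
    cmds.append(cmd)
    print(cmd)
```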

Any help will be greatly appreciated.

deeptools normalization ChIP-Seq cutandtag

This is more a comment than an answer, but I never really understood why the use of spike-ins in routine experiments would be meaningful. You add a constant amount of spike-in to each library, but if the signal-to-noise ratio differs between libraries (very common in ChIP/CUT applications), this essentially amounts to normalization by library size and is therefore unreliable. I would instead call peaks on the samples, build a count matrix over the merged peaks, and then use DESeq2 or edgeR to get proper size factors (see: ATAC-seq sample normalization).
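The size-factor idea suggested here (DESeq2's median-of-ratios estimator) can be sketched in plain Python on a toy peaks-by-samples count matrix; the counts below are invented for illustration, and a real analysis would use DESeq2 or edgeR directly:

```python
import math
from statistics import median

# Toy count matrix: rows = merged peaks, columns = samples (invented numbers).
counts = [
    [100, 120, 60, 80],
    [50, 55, 20, 30],
    [200, 210, 110, 150],
    [10, 12, 6, 8],
]

n_samples = len(counts[0])

# Per-peak log geometric mean across samples (the pseudo-reference sample).
log_ref = [sum(math.log(c) for c in row) / n_samples for row in counts]

# Size factor per sample: median across peaks of the ratio to the reference.
size_factors = []
for j in range(n_samples):
    log_ratios = [math.log(counts[i][j]) - log_ref[i] for i in range(len(counts))]
    size_factors.append(math.exp(median(log_ratios)))

print([round(f, 3) for f in size_factors])
```

Dividing each sample's counts by its size factor then puts the samples on a common scale, independent of library size.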

But maybe I simply do not understand the idea of spike-ins.

