I have a ChIP-Seq experiment with two knock-outs and one WT, sampled at three different time-points. For each of them I have IP and input samples in duplicate. In addition I have IgG samples (also in duplicate) for each of the conditions, but only for time-point 0.
My problem is that the library sizes vary a lot between the samples. I was wondering whether, and how, this can influence the normalisation of the samples. It would also be of interest what implications this would have on the results. Is there anything one can do about it?
Here are some examples of the differences:
sample            reads      sample        reads
KO1.input.4h.2    5465053    wt.ip.4h.2    5397507
KO1.input.4h.1    10867709   wt.ip.4h.1    33976157
KO1.input.2h.2    4820147    wt.ip.2h.2    6058079
KO1.input.2h.1    11566194   wt.ip.2h.1    9670735
KO1.input.0h.2    4943518    wt.ip.0h.2    6144126
KO1.input.0h.1    28201011   wt.ip.0h.1    11790750
Above are two examples from the dataset. On the left is one of the input batches from KO1; the read numbers for time-point 0 are 4943518 and 28201011. On the right are the IP samples for the WT time-points; for the 4h time-point the read counts are 5397507 and 33976157.
In both cases one of the duplicates has more than 5x the reads of the other. Is this factor a problem for the normalisation? Would I introduce too strong a bias with these numbers?
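To make the scale of the problem concrete, here is a minimal sketch (plain Python, using the read counts from the table above) of what per-million library-size scaling would do to these samples. Nothing here is tied to a specific normalisation tool; it just shows how a >5x depth difference translates directly into a >5x difference in the per-sample scale factor:

```python
# Library sizes (total mapped reads) taken from the table above.
lib_sizes = {
    "wt.ip.4h.1": 33976157,
    "wt.ip.4h.2": 5397507,
    "KO1.input.0h.1": 28201011,
    "KO1.input.0h.2": 4943518,
}

# Simple counts-per-million (CPM) style scaling: each sample's raw
# counts get multiplied by 1e6 / library_size, so a ~6x depth
# difference between duplicates becomes a ~6x difference in the
# scale factor applied to their counts.
scale = {name: 1e6 / n for name, n in lib_sizes.items()}

for name, s in sorted(scale.items()):
    print(f"{name}: CPM scale factor = {s:.4f}")

# Depth ratio between the two WT 4h IP duplicates.
ratio = lib_sizes["wt.ip.4h.1"] / lib_sizes["wt.ip.4h.2"]
print(f"depth ratio wt.ip.4h.1 / wt.ip.4h.2 = {ratio:.1f}x")
```

The ratio printed at the end is the "more than 5x" factor mentioned above; CPM-style scaling corrects the mean depth but cannot correct depth-dependent effects such as differences in background noise or peak detectability between shallow and deep libraries.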
I would appreciate any advice or countermeasures.
Thanks, Assa
Thanks, I know the normalisation differs from one method to the other. My worries are mainly about the huge differences between the library sizes. Independent of the algorithm, is it possible to handle such a big difference between the libraries?
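One commonly suggested countermeasure (independent of the normalisation algorithm) is to downsample every library to the depth of the smallest one before peak calling, so that depth-dependent effects are equalised. A minimal sketch of computing the per-sample subsampling fractions, using the two WT 4h IP duplicates from the table above; the resulting fraction is the kind of value one would hand to a read-subsampling tool:

```python
# Library sizes of the two WT 4h IP duplicates from the table above.
lib_sizes = {
    "wt.ip.4h.1": 33976157,
    "wt.ip.4h.2": 5397507,
}

# Target depth: the smallest library in the comparison.
target = min(lib_sizes.values())

# Fraction of reads to keep per sample so that all libraries end up
# at (approximately) the same depth. The smallest library keeps all
# of its reads (fraction 1.0).
fractions = {name: target / n for name, n in lib_sizes.items()}

for name, f in sorted(fractions.items()):
    print(f"{name}: keep fraction {f:.3f}")
```

The trade-off is that downsampling throws away real data from the deep libraries; whether that is acceptable depends on whether the extra depth is needed for sensitivity in the downstream analysis.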