Question

Input control for ChIP-seq [Seeking clarification on "library DNA conc.", "no. of raw reads", and "normalization"]

6.3 years ago

chiefcat ▴ 180

Hi all,

Although the input control is commonly used in ChIP-seq analysis (e.g. for normalization, or as background for peak calling), it seems hard to find an easily understandable description of how it should be prepared (before sequencing) and used in the later data analysis stages. When calculating ChIP enrichment of a region of interest in a ChIP-qPCR experiment, all you need is the relative number of cells, or amount of DNA, used for the input control and the IP. For ChIP-seq the steps are much more complex, and I don't really understand what to do with the input control.

I want to examine the average signal strength of the region(s) of interest against the background (the signal in the input control) using BAM files. My main question is: which steps in the data analysis process account for the difference in raw read output between the input control and the IP sample, so that one can tell whether the signal at particular regions is higher or lower than the background?

I ask because I assumed that signal strength can only be compared when the total number of reads in the input and IP samples is the same, or has been scaled to be the same. Does normalization (e.g. RPGC: reads per genomic content; RPKM: reads per kilobase per million reads) or some other step take care of this? Apart from plotting the signal profile, do I need to do anything to address the difference in read numbers before peak calling?

Another thing: is it normal to get different numbers of raw sequencing reads (e.g. the largest difference among my samples is 50%) even when the same amount of each DNA library was submitted for sequencing?

Thanks very much!!

Kylie (Beginner of NGS)

ChIP-Seq normalization sample preparation • 4.8k views

updated 6.3 years ago by Carlo Yague 8.7k • written 6.3 years ago by chiefcat ▴ 180

score 0 · Answer 1 · 2017-12-21

How can [the input control] be properly prepared (before sequencing)?

This is extremely simple: prepare the chromatin for your IPs and, just before the IP step, save a small fraction of it. This is your input. Then process the IP and input samples together through decrosslinking, RNase/proteinase treatment and library preparation. For the library prep, you will usually start with the same amount of DNA from the IP and the input. As you said, this makes you lose the proportionality between IP and input relative to the number of cells, so you cannot compute the classical IP/input ratio that is used with ChIP-qPCR.

Which steps in the data analysis process can take care of the differences in raw read output between the input control and the IP sample, so that one can tell if the signal at particular regions is higher or lower than the background?

You can NOT simply divide or subtract the IP signal by the input signal in this case.
One of the most widely used peak callers (also quite old) that takes the input into account is MACS. Note that MACS (and most peak callers) also takes care of differences in library size.
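To make the library-size point concrete, here is a minimal sketch of linear depth scaling, where the counts from the deeper library are scaled down to the depth of the shallower one. The function name and the example numbers are hypothetical, and this is not MACS's actual code: it only illustrates the idea of putting IP and input counts on a common depth. A peak caller additionally models the local background and tests for significance, which this sketch does not attempt.

```python
def scale_to_smaller_library(ip_count, input_count, ip_total, input_total):
    """Linearly scale region counts so both samples share the depth
    of the smaller library.

    ip_count, input_count : reads overlapping the region in each sample
    ip_total, input_total : total mapped reads in each sample
    Returns (scaled_ip_count, scaled_input_count) as floats.
    """
    if ip_total <= 0 or input_total <= 0:
        raise ValueError("library sizes must be positive")
    # Common target depth: the smaller of the two libraries.
    target = min(ip_total, input_total)
    # Multiply before dividing to limit floating-point rounding.
    scaled_ip = ip_count * target / ip_total
    scaled_input = input_count * target / input_total
    return scaled_ip, scaled_input


if __name__ == "__main__":
    # Hypothetical numbers: the IP was sequenced 1.5x deeper than the input.
    ip_scaled, input_scaled = scale_to_smaller_library(
        ip_count=120, input_count=90,
        ip_total=30_000_000, input_total=20_000_000,
    )
    print(ip_scaled, input_scaled)  # 80.0 90.0
```

In this made-up example the raw IP count (120) looks higher than the input count (90), but after scaling to the same depth it is lower (80 vs. 90), which is why raw counts from libraries of different sizes should not be compared directly.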

Is it normal to get different no. of sequencing Raw Reads even when the same amount of DNA Seq. libraries had been subjected to sequencing?

It's not rare to see such differences. They could be due to quantification or pipetting errors. Also, large DNA fragments in your samples can distort DNA quantification, because they will be measured (with the Qubit or Bioanalyzer) but are not going to be sequenced efficiently.