How many biological replicates are needed for ChIP-seq experiments?
6.6 years ago

The ENCODE guidelines suggest that two biological replicates are enough for ChIP-seq experiments:

"Initial RNA polymerase II ChIP-seq experiments showed that more than two replicates did not significantly improve site discovery (Rozowsky et al. 2009)."

Paper Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431496/

But do we ever really need more than two biological replicates? I am looking for real-life examples. Any help will be much appreciated!

ChIP ChIP-Seq RNA-Seq • 7.4k views
6.6 years ago
Rory Stark ★ 2.0k

It depends on what questions you are trying to answer with your ChIP-seq experiments.

The two-replicate guideline applies only to the identification of binding sites ("site discovery"). This is a binary determination (binding site vs. not a binding site), potentially with an associated confidence statistic. However, two replicates are generally inadequate for a quantitative analysis, i.e. for determining differential binding.

Differentially bound sites frequently exhibit some binding in all sample groups, while the rate of binding (related to binding affinity) changes significantly between conditions. If you can only determine whether a site is bound or not, you will miss these sites. For example, if a specific TF binds a site in about 10% of the cells in one sample group and in 90% of the cells in the second group, there is a 9-fold difference in the binding rate (differential binding), yet the site would likely be identified as bound in both sample groups.
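The 10% vs. 90% example can be written out as a small sketch. The `detection_threshold` value and the `called_as_peak` helper are hypothetical, chosen only for illustration; real peak callers work on read counts and background models, not on occupancy fractions.

```python
# Illustrative only: occupancy is the fraction of cells in which the TF
# binds the site. The 5% detection threshold is an assumed value.

def called_as_peak(occupancy, detection_threshold=0.05):
    """Binary site discovery: is there enough binding to call a site?"""
    return occupancy >= detection_threshold

group_a_occupancy = 0.10  # TF bound in about 10% of cells
group_b_occupancy = 0.90  # TF bound in about 90% of cells

# Site discovery gives the same answer for both groups.
both_called = called_as_peak(group_a_occupancy) and called_as_peak(group_b_occupancy)

# A quantitative comparison shows the large change in binding rate.
fold_change = group_b_occupancy / group_a_occupancy

print(f"called in both groups: {both_called}")
print(f"fold change in binding rate: {fold_change:.1f}")
```

A presence/absence comparison of the two groups would report no difference at this site, even though the binding rate differs roughly 9-fold.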

As ChIP-seq results have inherent variance (both biological and technical), the experiment must be sufficiently powered to confidently detect differential binding, which in turn requires replicates. In general, ChIP-seq replicates have greater variance than RNA-seq replicates, so at least as many replicates are required for ChIP-seq as for RNA-seq. People wouldn't accept RNA-seq with only two replicates, and they should not accept ChIP-seq with only two replicates if any type of quantitative analysis that relies on capturing variance is being done.
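One way to see why two replicates capture variance poorly is a small simulation of how unstable the variance estimate itself is at low replicate numbers. This is a minimal sketch that assumes normally distributed signal with a true variance of 1; it is not a model of real ChIP-seq counts, and the function name and parameters are made up for the example.

```python
import random
import statistics

def variance_estimate_spread(n_replicates, n_simulations=5000, true_sd=1.0, seed=0):
    """Standard deviation of the sample-variance estimate across many
    simulated experiments with n_replicates each (assumed normal data)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_simulations):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n_replicates)]
        estimates.append(statistics.variance(sample))
    return statistics.stdev(estimates)

# The true variance is 1.0 in every case; only the reliability of its
# estimate changes with the number of replicates.
spread_by_n = {n: variance_estimate_spread(n) for n in (2, 3, 5)}
for n, spread in spread_by_n.items():
    print(f"{n} replicates: spread of variance estimate = {spread:.2f}")
```

With two replicates the estimated variance swings widely from experiment to experiment, so any test statistic built on it is correspondingly unreliable.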

Some references:

Stark, R. and Hadfield, J. (2016). Characterization of DNA-Protein Interactions: Design and Analysis of ChIP-Seq Experiments. In Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing (pp. 223-260). Springer International Publishing. [I can send you a copy of this book chapter if you email me.]

Ross-Innes, C. S., Stark, R., Teschendorff, A. E., Holmes, K. A., Ali, H. R., Dunning, M. J., ... & Carroll, J. S. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature, 481(7381), 389-393

Lun, A. T. L. and Smyth, G. K. (2014). De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly. Nucleic Acids Research, gku351.

Mohammed, H., Russell, I. A., Stark, R., Rueda, O. M., Hickey, T. E., Tarulli, G. A., ... & Carroll, J. S. (2015). Progesterone receptor modulates ERα action in breast cancer. Nature, 523(7560), 313-317.


Hi Rory, what a nice surprise! Thank you very much for your comment; the point is clear to me now. In my experimental setup, the stimulation is expected to cause epigenetic changes in only a very small number of cells, so to capture the effect I need to pool tissue from at least 5 animals per sample, across 4 experimental groups. To have 2 biological replicates per condition, I therefore need tissue from 40 animals. That is why I was wondering whether two replicates per condition would be enough. Later, I want to run a DiffBind analysis on these samples, so it seems I will need to increase the sample size even further!
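The animal count follows directly from the design: animals per pool × experimental groups × replicates per group. A quick sketch (the function name is made up for this example) shows how the total grows as replicates are added:

```python
def animals_needed(animals_per_pool, n_groups, replicates_per_group):
    """Total animals when each biological replicate is a pool of animals."""
    return animals_per_pool * n_groups * replicates_per_group

# 5 animals per pool and 4 experimental groups, as described above.
for replicates in (2, 3, 4):
    total = animals_needed(5, 4, replicates)
    print(f"{replicates} replicates per group: {total} animals")
```

Each additional replicate per group costs another 20 animals in this design.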


On another note, I also found this rationale (paper link: http://www.nature.com/neuro/journal/v19/n1/full/nn.4194.html#methods):

"For cell type-specific ChIP- and MeDIP-seq experiments (sections 1.3 to 1.5), tissue from 20 mice for each of the two biological replicates was pooled. We chose to pool 20 mice per biological replicate since chromatin modification changes were expected to be small, coming from a small population of memory-forming cells. Consequently, the statistical detection of chromatin modification changes required low variance data, which can be obtained by pooling many biological replicates."

The authors also gave this equation:

the variance of the estimator for the true distribution mean θ is given by

"where np is the total number of pools, represents the biological variation, signifies the technical variation, rs denotes the number of individual samples that contribute to a pool, and ra is the number of sequenced samples for each pool (biological replicates) (citation in the main article). Given that rs > 1 in a pooled design the concomitant decrease in variance should lead to an increase in power to identify differentially expressed or modified regions."
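The equation itself did not come through in the quote. As a rough sketch only, a form often used for pooled designs that matches the symbols described is Var = (σ_b²/r_s + σ_t²/r_a) / n_p. This is an assumption, not necessarily the paper's exact expression, and the names `bio_var` and `tech_var` stand in for the two variance symbols missing from the quote.

```python
def pooled_estimator_variance(n_pools, bio_var, tech_var,
                              samples_per_pool, runs_per_pool):
    """Assumed form: Var = (bio_var / r_s + tech_var / r_a) / n_p.

    n_pools          -> n_p, total number of pools
    samples_per_pool -> r_s, individuals contributing to each pool
    runs_per_pool    -> r_a, sequenced samples per pool
    """
    return (bio_var / samples_per_pool + tech_var / runs_per_pool) / n_pools

# Made-up variances, chosen only to show the effect of pooling.
bio_var, tech_var = 4.0, 1.0

# Two unpooled samples (r_s = 1) vs. two pools of 20 animals each.
unpooled = pooled_estimator_variance(2, bio_var, tech_var, 1, 1)
pooled = pooled_estimator_variance(2, bio_var, tech_var, 20, 1)

print(f"unpooled: {unpooled:.2f}, pooled (20 per pool): {pooled:.2f}")
```

Under this assumed form, pooling shrinks only the biological term; the technical term is unchanged unless each pool is sequenced more than once.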

What is your comment on that?
