Question

ChIP-seq analysis with input and spike-in

11

7.7 years ago

Damian Kao 16k

I have ChIP-seq data on histone modifications. I've been scouring the literature and blogs on ChIP-seq analysis that involves normalizing to input and normalizing across samples using spike-ins.

There doesn't seem to be a cohesive differential binding analysis approach that can incorporate input normalization along with spike-in normalization.

It seems most differential binding approaches apply RNA-seq methods (edgeR, DESeq2) to read counts over genomic windows. I can substitute the normalization factors used in these RNA-seq packages with spike-in normalization factors, but how do I account for input? Is blacklisting sites that are not different from input really the best way? Transforming the counts over input via log2 fold change or subtraction is not statistically sound (other bioinformaticians seem to agree).
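To make the substitution concrete, here is a minimal sketch of one common way to turn spike-in read counts into per-sample size factors (proportional to the spike-in counts, centred so their geometric mean is 1). The sample names and counts are made up for illustration, and the helper names are hypothetical; this is not the method of any particular package.

```python
import math

def spikein_size_factors(spike_counts):
    """Size factors proportional to spike-in read counts, geometric mean 1."""
    logs = [math.log(c) for c in spike_counts.values()]
    log_geo_mean = sum(logs) / len(logs)
    return {s: math.exp(math.log(c) - log_geo_mean)
            for s, c in spike_counts.items()}

def normalize_counts(window_counts, size_factors):
    """Divide each sample's window counts by that sample's size factor."""
    return {s: [c / size_factors[s] for c in counts]
            for s, counts in window_counts.items()}

# Hypothetical spike-in read counts per sample (assumed numbers).
spike_counts = {"ctrl": 1_000_000, "treated": 500_000}
factors = spikein_size_factors(spike_counts)

# Hypothetical read counts over three genomic windows.
window_counts = {"ctrl": [100, 50, 10], "treated": [100, 50, 10]}
normalized = normalize_counts(window_counts, factors)
print(factors)
print(normalized)
```

Factors of this kind could be supplied to a count-based package in place of its default library-size normalization; they say nothing about input, which is the open part of the question.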

I've looked at the input signal for my data and have found signal patterns in areas consistent with some of my histone markers. This makes me think that I should really normalize my IP to input before performing differential binding analysis.

Binding bias in input samples also seems to be supported by this paper (http://www.pnas.org/content/106/35/14926.long), which found that crosslinked, sonicated ChIP-seq samples (no IP) have signal corresponding to open chromatin.

Maybe input normalization isn't even necessary if we make the assumption that input is consistent across my different histone modification IPs? However, wouldn't that decrease the statistical power of the differential binding analysis?

This is my first time analyzing ChIP-seq data. Any thoughts on this from experts would be appreciated.

chip-seq • 11k views

ADD COMMENT • link updated 6.2 years ago by nicolas.descostes ▴ 160 • written 7.7 years ago by Damian Kao 16k

0

Without being an expert, I have been told not to use input for normalization across samples, and that its use is best limited to peak calling within conditions and to visualisation (to ensure that peaks in the IP are not present in the input).

ADD REPLY • link 7.7 years ago by Carlo Yague 8.6k

score 4 · Answer 1 · 2018-02-21

4

6.2 years ago

nicolas.descostes ▴ 160

We have started to develop a package for this: ChIPSeqSpike (https://bioconductor.org/packages/devel/bioc/html/ChIPSeqSpike.html)

ADD COMMENT • link 6.2 years ago by nicolas.descostes ▴ 160

0

Hi nicolas.descostes, I have a problem: I want to use MACS to call peaks from the output of ChIPSeqSpike, but I do not know how to design the downstream analysis. Your help would be appreciated.

ADD REPLY • link 6.1 years ago by huangzy6281 • 0

1

Hi,

One solution would be to use the bamCoverage tool from deepTools to obtain bedGraph files, convert them to BED, and then run MACS2. You can pass the scaling factors reported by the spikeSummary function to bamCoverage.
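As a rough sketch of the bamCoverage step, the snippet below assembles a command line that writes a bedGraph scaled by a spike-in factor. The file names, the bin size, and the value 0.42 are placeholders (assumptions, not values from ChIPSeqSpike), and the helper function is hypothetical.

```python
import shlex

def bamcoverage_command(bam, out_bedgraph, scale_factor, bin_size=50):
    """Build a deepTools bamCoverage call that writes a scaled bedGraph."""
    return [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", out_bedgraph,
        "--outFileFormat", "bedgraph",
        "--scaleFactor", str(scale_factor),  # spike-in derived factor
        "--binSize", str(bin_size),
    ]

# Placeholder inputs: replace with your own BAM file and scaling factor.
cmd = bamcoverage_command("sample_IP.bam", "sample_IP.spiked.bedgraph", 0.42)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once deepTools is installed; the bedGraph to BED conversion and the MACS2 call are left out here.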

ADD REPLY • link 6.1 years ago by nicolas.descostes ▴ 160

0

Thanks a lot for your answer. I also have some problems: 1. Does "test_coord.gff" come from Ensembl? If not, how do I get the gff file? 2. I put a genome file such as hg19.fa in the extdata directory of the ChIPSeqSpike package. After running the example in ChIPSeqSpike, I get the error "Error in getPlotSetArray(tracks = files, features = gff_vec, refgenome = genome_version, : No genomes installed!". How can I solve these problems? Your help would be appreciated.

ADD REPLY • link 6.1 years ago by huangzy6281 • 0

0

Can you contact me by Gmail? Please send your message again; it will be easier.

ADD REPLY • link 6.1 years ago by nicolas.descostes ▴ 160

0

Thank you very much. I have sent the questions to your email.

ADD REPLY • link 6.1 years ago by huangzy6281 • 0

0

Hi Nicolas, can you specify which scaling factor (i.e., endo vs. exoScalFact, or the ratio of exo percentage?) should be used in bamCoverage?

Thanks a lot! Xiaoyong Fu

ADD REPLY • link 5.7 years ago by xiaoyonf ▴ 60

0

Hi Nicolas, I am starting to use ChIPSeqSpike in R, but I am stuck on the error "The info file should be in csv or txt format". It appears in the quick start when running the spikePipe command. I would appreciate your help in solving this problem.

Thanks, Xiaoyong Fu Baylor College of Medicine

ADD REPLY • link 5.7 years ago by xiaoyonf ▴ 60

score 2 · Answer 2 · 2016-07-27

I'm not sure there is a good method for incorporating both normalizations, because they serve different functions. The spike-in is designed for global assessment of differences, while input is targeted to local differences. Spike-ins would allow you to detect an overall increase in (for example) H3K9me3 where the distribution of the mark is unchanged, whereas normalization to input by read depth would not. However, the increased read depth resulting from spike-in normalization would also be expected to produce broader peaks plus (more problematically) some number of new peaks that now exceed the statistical threshold. And, as you noted, bias exists in the input sample, so excluding that control will produce false-positive peaks in the experimental sample.
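A toy calculation may help illustrate the global versus local point. It assumes a mark that doubles everywhere in the treated sample while its distribution stays the same, with both libraries sequenced to the same depth; all numbers and function names are invented for illustration.

```python
def cpm(window_reads, total_reads):
    """Depth normalization: window reads per million mapped reads."""
    return window_reads / total_reads * 1e6

def per_million_spike(window_reads, spike_reads):
    """Spike-in normalization: window reads per million spike-in reads."""
    return window_reads / spike_reads * 1e6

# Assumed toy data. Equal sequencing depth hides the global doubling in the
# endogenous reads, but the spike-in fraction of the treated library halves.
control = {"window": 100, "endogenous": 10_000_000, "spike": 1_000_000}
treated = {"window": 100, "endogenous": 10_000_000, "spike": 500_000}

depth_ratio = (cpm(treated["window"], treated["endogenous"])
               / cpm(control["window"], control["endogenous"]))
spike_ratio = (per_million_spike(treated["window"], treated["spike"])
               / per_million_spike(control["window"], control["spike"]))

print(f"treated/control after depth normalization: {depth_ratio:.2f}")
print(f"treated/control after spike-in normalization: {spike_ratio:.2f}")
```

Under these assumptions the depth-normalized ratio stays at 1 while the spike-in ratio recovers the twofold change, which is the kind of global shift that read-depth scaling cannot see.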

Our studies have largely involved changes in the distribution of marks, so we've always used input controls for peak calling. Perhaps users of spike-in controls will weigh in on their experiences.

score 2 · Answer 3 · 2016-07-27

I agree with Harold that spike-ins and inputs serve different purposes, and I don't know of any definitive answers on this. But here's some interesting reading from the authors of DESeq2, csaw, and DiffBind that might give you some ideas:

The argument is that normalizing to input for the purposes of differential binding has its own set of problems that may be worse than just assuming that the input doesn't change across treatments.

Maybe you could compare the effects of normalizing for trended biases vs. composition biases to see if the magnitude of the effects corresponds to the spike-in normalization factors? In any case, it seems like csaw would be the best framework for playing around with spike-ins for normalization (based on the quality of its documentation and the sophistication of its tools).

score 0 · Answer 4 · 2017-11-10

0

6.4 years ago

valentina.boeva ▴ 40

You can try HMCan-diff, which now accepts spike-in information. HMCan-diff also removes the GC-content bias and copy number bias. The latter can be important if your two conditions are normal and cancer cells. Link to the HMCan paper in Nucleic Acids Research

ADD COMMENT • link 6.4 years ago by valentina.boeva ▴ 40