Question

Chip-Seq normalization between conditions

3.9 years ago

srhic ▴ 60

Hello,

I have ChIP-seq data for three different conditions, for which I have plotted RPKM-normalized counts around features of interest using deepTools. The plot clearly shows differences in ChIP signal between the conditions, but I am concerned about different background levels between conditions. The blue sample in the plot seems to have lower signal than the other two samples no matter which regions I plot.

Any ideas on how I can normalize the samples so they have the same basal signal? I assume some sort of z-score normalization may work, but I am not sure how to do it with my bigWig files.
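To make the z-score idea concrete, here is a minimal Python sketch. It assumes the per-bin signal has already been extracted from each bigWig file into a plain list of numbers; the values and the `zscore` helper below are hypothetical and only illustrate the arithmetic. Note that z-scoring forces every sample to the same mean and spread, so it would also hide a genuine global change in signal.

```python
from statistics import mean, pstdev

def zscore(values):
    """Center one sample's binned signal on 0 and scale it to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("signal is constant; z-score is undefined")
    return [(v - mu) / sigma for v in values]

# Hypothetical per-bin RPKM values (assumption, not real data).
# sample_b is sample_a shifted up by a constant background of 2.
sample_a = [1.0, 1.2, 0.8, 5.0, 1.0]
sample_b = [3.0, 3.2, 2.8, 7.0, 3.0]

z_a = zscore(sample_a)
z_b = zscore(sample_b)
```

In this toy case the two z-scored profiles coincide, because the only difference between the samples was a constant background offset.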

Thanks

[Image: profile plot of ChIP signal around the features of interest for the three conditions]

deeptools ChIP-Seq • 1.4k views

ADD COMMENT • link 3.9 years ago by srhic ▴ 60

What are these samples? I personally like to explore normalization efficiency with MA-plots. A properly-normalized sample should have the majority of data points centered somewhat at y = 0, or at least there should be a somewhat symmetric distribution of the data points around y = 0 depending on how dramatic the changes are between samples. Given you have a count table of normalized counts (not log2 transformed), use for each pairwise comparison:

FoldChange = log2(sample1 / sample2)
AverageCounts = 0.5 * log2(sample1 * sample2)

smoothScatter(AverageCounts, FoldChange)
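
For readers not working in R, the same two quantities can be sketched in plain Python. This is only an illustration: the `ma_values` helper, the pseudocount (added to avoid taking the log of zero) and the toy counts are assumptions, not part of the recipe above. The median of the fold changes gives a quick numeric check of whether the cloud is centered near y = 0.

```python
from math import log2
from statistics import median

def ma_values(sample1, sample2, pseudocount=1.0):
    """Return (A, M) per region: A = mean log2 signal, M = log2 fold change."""
    a_vals, m_vals = [], []
    for x, y in zip(sample1, sample2):
        # Pseudocount is an assumption; it avoids log2(0) for empty regions.
        x, y = x + pseudocount, y + pseudocount
        m_vals.append(log2(x) - log2(y))          # same as log2(x / y)
        a_vals.append(0.5 * (log2(x) + log2(y)))  # same as 0.5 * log2(x * y)
    return a_vals, m_vals

# Hypothetical normalized counts per region, chosen so that after the
# pseudocount sample 2 is exactly twice sample 1 in every region.
s1 = [7.0, 15.0, 31.0, 63.0]
s2 = [15.0, 31.0, 63.0, 127.0]

a, m = ma_values(s1, s2)
offset = median(m)  # a value far from 0 suggests a remaining global scaling difference
```

Here every region has the same fold change, so the points would form a horizontal line below y = 0, which is the pattern expected when one sample is off by a constant scaling factor.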

Without knowing the details, I can already predict that naive per-million scaling messes things up and that you need a more elaborate normalization strategy, but let's see how the plots look. Is one of these samples an input sample?

By the way, you have to paste the full link of the image into the image field. For the above image that would be https://i.ibb.co/6Zyhb1b/chip.png, so including the suffix.

ADD REPLY • link 3.9 years ago by ATpoint 81k

Thanks, the samples are histone marks under three different treatment conditions. I just have the bigWig files output by deepTools. I will try to import them into R and make a count table, then report back.

ADD REPLY • link 3.9 years ago by srhic ▴ 60

Try to make a count matrix based on the merged peaks directly from the BAM files, e.g. using featureCounts. For normalization, also see: A: ATAC-seq sample normalization (quantil normalization). It applies to ChIP-seq as well.
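
As a rough illustration of what quantile normalization does to such a count matrix, here is a generic textbook sketch in plain Python. It is not the exact procedure from the linked answer, and the `quantile_normalize` helper and the toy matrix are hypothetical. Each sample's values are replaced by the across-sample mean at the same rank, so all samples end up with an identical distribution.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a regions x samples matrix given as a list of rows.

    Simplification: ties are broken by row order instead of being averaged.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each sample (column), the row indices sorted by that sample's value.
    order = [sorted(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(matrix[order[c][rank]][c] for c in range(n_cols)) / n_cols
        for rank in range(n_rows)
    ]
    # Put the reference value for each rank back at the original row position.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for rank, r in enumerate(order[c]):
            normalized[r][c] = reference[rank]
    return normalized

# Hypothetical counts: 4 regions (rows) x 3 samples (columns).
counts = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
norm = quantile_normalize(counts)
```

After this step every column of `norm` contains the same set of values, only in a different order, which is why background differences between samples disappear. For real data an established implementation that handles ties properly is the safer choice.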

ADD REPLY • link 3.9 years ago by ATpoint 81k

I am trying out quantile normalization the way you described it for ATAC-seq. I was also able to get some good results using HOMER. I divided the genome into windows with bedtools and then extracted counts for those windows using HOMER, which has an option to normalize the counts with the rlog function of DESeq2. I am not sure I completely understand the rlog normalization, or whether it is the correct method to use, but it made the profiles look much more similar. I will also see how the edgeR approach you described works. Thanks!

ADD REPLY • link 3.9 years ago by srhic ▴ 60