Hey everybody,
I'm currently analysing ChIPseq data from paraffin-embedded samples. Compared to fresh tissue (using the same antibody), the reads we get from sequencing are much more dispersed throughout the genome. By dispersed I mean:
1) We get reads in regions where the fresh tissue shows no signal.
2) Where we do see enrichment in the paraffin-embedded samples, peaks reach a maximum pileup of ~20 reads, whereas the fresh tissue gives ~150 reads.
I would like to quantify this with a single number. Obviously the value depends on what you're measuring (histone marks, TFs, etc.), but as a comparison between fresh and paraffin-embedded samples it could be a useful metric for improving the ChIPseq protocol.
Does anybody know of such a metric? My first idea is simply to look at the distribution of coverage, which would show whether or not there are regions with high coverage. But maybe there is a more sophisticated solution?
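A minimal sketch of that first idea, assuming the reads have already been counted into fixed-width genomic bins (for example with deepTools multiBamSummary in `bins` mode); the input here is just a hypothetical plain list of per-bin read counts. Besides the histogram, it collapses the distribution into one number: the fraction of all reads that land in the highest-coverage bins, which should be high for a sharply enriched sample and low for a dispersed one. The function names and the choice of cutoff are illustrative, not an established metric.

```python
from collections import Counter

def coverage_histogram(bin_counts):
    """Map reads-per-bin -> number of bins that have that many reads."""
    return dict(sorted(Counter(bin_counts).items()))

def top_bin_fraction(bin_counts, top=0.01):
    """Fraction of all reads falling into the `top` fraction of highest-coverage bins."""
    total = sum(bin_counts)
    if total == 0:
        return 0.0
    # Use at least one bin; round() guards against float artefacts in len * top
    n_top = max(1, round(len(bin_counts) * top))
    ranked = sorted(bin_counts, reverse=True)
    return sum(ranked[:n_top]) / total

# Toy per-bin counts (made up, not real data): 100 bins, 5 of them "peaks"
fresh = [1] * 95 + [150] * 5
ffpe = [4] * 95 + [20] * 5

print(coverage_histogram(ffpe))  # {4: 95, 20: 5}
print(top_bin_fraction(fresh, top=0.05), top_bin_fraction(ffpe, top=0.05))
```

With real data the bin size and the `top` cutoff would need tuning to the mark being profiled: broad histone marks spread over many more bins than a sharp TF signal.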
Thanks!
I have used FRiP (fraction of reads in peaks) to assess this before, but I would like to avoid the peak-calling step, since I feel it should be possible to measure this without first filtering the signal. plotFingerprint is a great idea, thank you! I'll also have a look at the Jensen-Shannon distance, although I'm not very familiar with it yet.
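For reference, here is a simplified sketch of both ideas, again on hypothetical per-bin read counts. `fingerprint` builds the kind of curve plotFingerprint draws (bins sorted by ascending coverage, cumulative fraction of reads against fraction of bins). The area under that curve is 0.5 for perfectly uniform coverage and shrinks as reads concentrate into fewer bins. `js_distance` is the textbook Jensen-Shannon distance (base-2 logs, so it lies between 0 and 1) between two per-bin read distributions, e.g. a ChIP sample and its input over the same bins. This is not deepTools' code, and its quality metrics follow their own conventions, so these numbers will not match plotFingerprint output exactly. Neither metric needs peak calling.

```python
import math

def fingerprint(bin_counts):
    """Return (fraction of bins, cumulative fraction of reads), bins sorted by coverage."""
    ranked = sorted(bin_counts)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(ranked)
    xs, ys, running = [], [], 0
    for i, count in enumerate(ranked, start=1):
        running += count
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys

def fingerprint_auc(bin_counts):
    """Trapezoidal area under the fingerprint curve, starting from (0, 0)."""
    xs, ys = fingerprint(bin_counts)
    area, prev_x, prev_y = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        area += (x - prev_x) * (y + prev_y) / 2
        prev_x, prev_y = x, y
    return area

def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance (log base 2) between two per-bin read distributions."""
    if len(p_counts) != len(q_counts):
        raise ValueError("both samples must be counted over the same bins")
    p_tot, q_tot = sum(p_counts), sum(q_counts)
    p = [c / p_tot for c in p_counts]
    q = [c / q_tot for c in q_counts]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; a_i > 0 implies m_i > 0
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

In practice one would call `fingerprint_auc` on each sample's binned counts and `js_distance(chip_bins, input_bins)` for each ChIP/input pair. A paraffin-embedded sample that looks close to its input should show an AUC nearer 0.5 and a smaller JS distance than the matching fresh-tissue sample.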