Hello, I was wondering how DiffBind normalization works when an experiment has several conditions, and how the analysis should be organized because of that. I noticed some differences depending on how I set it up (I'll illustrate with an example).
Imagine I have the following groups of replicates from a ChIP-seq study of a certain factor in a human disease:
Samples from cell type A (e.g. 9 replicates for each condition: A1, A2, A3 and AH):
- A1 - stage1 of disease
- A2 - stage2 of disease
- A3 - stage3 of disease
- AH - healthy individuals (control condition)
Samples from cell type B (also with replicates for each condition):
- B1 - stage1 of disease
- B2 - stage2 of disease
- B3 - stage3 of disease
- BH - healthy individuals
So not only do we have disease vs. control, we also have two cell types (one more level of comparison).
I can see two ways to run this analysis:
- run my DiffBind script twice (once for the A samples and once for the B samples), or
- combine all samples into a single run (the contrasts can still be made pairwise: A1 vs AH, A2 vs AH, etc.).
I was wondering whether one approach is better than the other. Since all the data come from the same disease, I would like to compare cell types A and B, and for that the samples have to be normalized consistently. Even though I select the same normalization method in both setups (the default here, DBA_SCORE_TMM_MINUS_FULL), the joint run may take into account information that the separate runs don't. I tried both approaches on one of my datasets and noticed a slight difference in the results for one of the stages, so I'm wondering where that difference comes from and which approach is more correct.
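To make the concern concrete, here is a minimal Python sketch of TMM scaling factors, written in the spirit of edgeR's `calcNormFactors` (DiffBind's TMM scores are computed with edgeR). It is a simplification under stated assumptions: library size is taken as the summed counts, control-read subtraction and the full-library-size option of DBA_SCORE_TMM_MINUS_FULL are ignored, and rank ties are broken arbitrarily. The function names and toy counts are hypothetical. The point it illustrates is that the factors depend on which samples are in the run: the reference sample is chosen from the samples present, and the factors are rescaled to a geometric mean of 1 across exactly those samples.

```python
import math
import statistics


def upper_quartile(values):
    """75th percentile with linear interpolation (R's default quantile type 7)."""
    s = sorted(values)
    pos = 0.75 * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def tmm_factor(obs, ref, log_trim=0.3, sum_trim=0.05):
    """Simplified TMM factor of sample `obs` relative to sample `ref` (hypothetical helper)."""
    n_obs, n_ref = sum(obs), sum(ref)  # library size = summed counts (simplification)
    rows = []
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # log-ratios are undefined for zero counts
        p_o, p_r = y_o / n_obs, y_r / n_ref
        m = math.log2(p_o / p_r)        # M: log fold change between the two samples
        a = 0.5 * math.log2(p_o * p_r)  # A: average log abundance
        # approximate variance of M, used for precision weights
        v = (n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r)
        if v > 0:
            rows.append((m, a, v))
    n = len(rows)
    # drop the most extreme 30% of M and 5% of A at each end (ties broken by order)
    cut_m, cut_a = math.floor(n * log_trim), math.floor(n * sum_trim)
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        return 1.0
    # precision-weighted mean of the retained M values
    num = sum(rows[i][0] / rows[i][2] for i in keep)
    den = sum(1 / rows[i][2] for i in keep)
    return 2 ** (num / den)


def tmm_norm_factors(samples):
    """samples: dict of name -> counts over the same consensus peaks (hypothetical helper)."""
    names = list(samples)
    # reference = sample whose upper-quartile proportion is closest to the mean (edgeR's rule)
    f75 = {k: upper_quartile(samples[k]) / sum(samples[k]) for k in names}
    mean_f75 = statistics.fmean(f75.values())
    ref = min(names, key=lambda k: abs(f75[k] - mean_f75))
    raw = {k: tmm_factor(samples[k], samples[ref]) for k in names}
    # rescale so the factors have a geometric mean of 1 across THIS set of samples
    gm = math.exp(statistics.fmean(math.log(f) for f in raw.values()))
    return {k: f / gm for k, f in raw.items()}


# Hypothetical toy counts over the same four consensus peaks (not real data).
a1 = [10, 20, 30, 40]
ah = [20, 40, 60, 80]    # same composition as a1, twice the depth
b1 = [10, 20, 30, 340]   # one peak dominates in cell type B

separate = tmm_norm_factors({"A1": a1, "AH": ah})
joint = tmm_norm_factors({"A1": a1, "AH": ah, "B1": b1})
print("A only:", separate)
print("A + B :", joint)
```

In this toy case, adding B1 moves both A factors from 1.0 to roughly 1.59, while the A1/AH ratio stays at 1 because the two A samples have identical composition. With real data, a different reference sample or trimmed peak set can also shift the ratios between samples, not just their common scale.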
Does anyone with a deeper understanding of this package's inner workings know the answer? Thank you!