For a gene expression project using limma, I have two studies whose control samples were processed the same way but at different times. The treatment samples, however, come from only one of the studies. Rather than simply pooling the controls, I would like to include all of them with a batch variable. Is the model.matrix below suitable for this purpose? I know that if treatments were available in both studies this would be a standard model with a batch variable, but I am unsure about this case.
treat study
1 control_4hr study2
2 control_4hr study2
3 control_4hr study2
4 treatment1_4hr study2
5 treatment1_4hr study2
6 treatment1_4hr study2
7 control_4hr study1
8 control_4hr study1
9 control_4hr study1
model1 <- model.matrix(~comparison1$treat + comparison1$study)
(Intercept) comparison1$treattreatment1 comparison1$studystudy2
1 1 0 1
2 1 0 1
3 1 0 1
4 1 1 1
5 1 1 1
6 1 1 1
7 1 0 0
8 1 0 0
9 1 0 0
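For reference, here is a minimal sketch that rebuilds this design from scratch and checks that it is full rank. It assumes the sample table shown above and uses the `data =` argument, so the column names come out slightly differently from the printout (for example `treattreatment1_4hr`). The expression values `y` are simulated and purely hypothetical. They are only there to show that, in this layout, the treatment coefficient equals the treatment-minus-control difference within study2.

```r
# Rebuild the sample table from the post.
comparison1 <- data.frame(
  treat = factor(rep(c("control_4hr", "treatment1_4hr", "control_4hr"), each = 3)),
  study = factor(rep(c("study2", "study2", "study1"), each = 3),
                 levels = c("study1", "study2"))
)

# Additive design: intercept (study1 controls), treatment effect, study effect.
design <- model.matrix(~ treat + study, data = comparison1)

# Full rank means the rank equals the number of columns,
# so every coefficient can be estimated.
is_full_rank <- qr(design)$rank == ncol(design)
print(is_full_rank)

# Hypothetical expression values for a single gene (simulated, not real data).
set.seed(1)
y <- rnorm(9)

# Ordinary least squares fit for that one gene.
fit <- lm.fit(design, y)
trt_coef <- unname(fit$coefficients["treattreatment1_4hr"])

# Rows 4-6 are study2 treatments, rows 1-3 are study2 controls.
diff_study2 <- mean(y[4:6]) - mean(y[1:3])
print(all.equal(trt_coef, diff_study2))
```

With three groups and three coefficients the design fits each group mean exactly, so the study1 controls do not change the treatment estimate itself. They contribute extra residual degrees of freedom for the variance estimate.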
Any insight would be appreciated.
Thank you for confirming. Could you suggest some reading that explains what the additional variables in the design are doing and when a model is not full rank?
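As one illustration of a design that is not full rank, the sketch below takes the same sample table and adds a treatment-by-study interaction. Since treatment samples exist only in study2, the interaction column is an exact copy of the treatment column, and the two effects cannot be separated. This is a base R sketch under that assumption and not a complete diagnostic. I believe limma also provides `is.fullrank()` and `nonEstimable()` for the same kind of check, but they are not used here so that the example runs without Bioconductor.

```r
# Same sample table as in the post.
comparison1 <- data.frame(
  treat = factor(rep(c("control_4hr", "treatment1_4hr", "control_4hr"), each = 3)),
  study = factor(rep(c("study2", "study2", "study1"), each = 3),
                 levels = c("study1", "study2"))
)

# Design with an interaction term that the data cannot support.
design_int <- model.matrix(~ treat * study, data = comparison1)

n_cols <- ncol(design_int)       # number of coefficients requested
n_rank <- qr(design_int)$rank    # number of coefficients that can be estimated
print(c(columns = n_cols, rank = n_rank))

# The interaction column is identical to the treatment column,
# which is why the rank falls short of the column count.
same_column <- all(design_int[, "treattreatment1_4hr"] ==
                   design_int[, "treattreatment1_4hr:studystudy2"])
print(same_column)
```

Whenever the rank is smaller than the number of columns, at least one coefficient is a linear combination of the others and has to be dropped or the model re-parameterised.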