Hi all, I am trying to figure out the best way to make a model matrix for the following RNA-seq experiment using DESeq2. The experimental design is as follows:
- 28 samples from 14 patients with a certain disease.
- 2 timepoints baseline and follow up
- Two groups, one group of 9 patients has been given a treatment, the other group of 5 patients has been left without treatment.
So I would like to see what is the differnece between the treated group and the untreated group in the follow up timepoint vs their baseline.
In other words I am looking for the difference of differences between the treated and the untreated group, how much difference cannot be explained by the general differences between follow up and beseline (which are not of biological importance). So my experiment is paired only within groups and we know that there is much variance between the different patients.
My colData is as follow and the reference levels are baseline and untreated for each respective factor.
colData_paired_all
timepoint group patient
1 baseline treated CTS_110
2 baseline treated CTS_118
3 baseline treated CTS_116
4 baseline treated CTS_115
5 baseline treated CTS_114
6 baseline treated CTS_122
7 baseline treated CTS_124
8 baseline treated CTS_123
9 baseline treated CTS_126
10 baseline untreated CTS_103
11 baseline untreated CTS_117
12 baseline untreated CTS_39
14 baseline untreated CTS_33
15 follow_up treated CTS_110
16 follow_up treated CTS_118
17 follow_up treated CTS_116
18 follow_up treated CTS_115
19 follow_up treated CTS_114
20 follow_up treated CTS_122
21 follow_up treated CTS_124
22 follow_up treated CTS_123
23 follow_up treated CTS_126
24 follow_up untreated CTS_103
25 follow_up untreated CTS_117
26 follow_up untreated CTS_39
27 follow_up untreated CTS_83
28 follow_up untreated CTS_33
I can use the design "~ timepoint + timepoint:group" which gives a full rank matrix with coefficients
resultsNames(dds)
[1] "Intercept" "timepoint_follow_up_vs_baseline"
[3] "timepointbaseline.grouptreated" "timepointfollow_up.grouptreated"
So I can calculate, DE follow up vs baseline for group untreated:
res <- results(dds, name= "timepoint_follow_up_vs_baseline")
DE follow up vs baseline for group treated:
res_followup_treated <- results(dds, list(c("timepoint_follow_up_vs_baseline", "timepointfollow_up.grouptreated")))
only the differences for group treated which cannot be explained by the general follow up vs baseline differences (the interaction term)
res_treated_diff <- results(dds, name = "timepointfollow_up.grouptreated")
My problem is that if I include the patient variable in the design, like "~ patient + timepoint + timepoint:group" the matrix is not full rank. I can see that pairing is true only within groups, but I wonder is there any way to include the patient variable, in order to block patient's effect, in the formula? Do I have already included because groups are paired in the first design the gives the full rank matrix?
Thank you very much for your help!!!