I'm not sure the GLM in DESeq2 can/should do this, but I've seen posts of others doing so with less complex/robust experimental designs.
My problem is that I can't get a design to work that incorporates a patient identifier, to include between/within-patient variance. I see that vignette('DESeq2'), offers a solution for crossover design (the same patient gets each treatment): to repeat the individual label within each group. But this is a randomized placebo-controlled clinical trial without crossover.
We have 2 samples from each patient, 2 treatments (1 per patient), 2 timepoints (before,after). Total 4 samples per patient. I am blinded as to which treatment group is placebo vs active.
> samples
group time pairs ind.Repeat ind.n
grA.1002.lesion1.before 2grA 1Before a 1 1
grA.1002.lesion2.before 2grA 1Before b 1 1
grA.1002.lesion1.after 2grA 2After a 1 1
grA.1002.lesion2.after 2grA 2After b 1 1
grB.1016.lesion1.before 1grB 1Before a 2 2
grB.1016.lesion2.before 1grB 1Before b 2 2
grB.1016.lesion1.after 1grB 2After a 2 2
grB.1016.lesion2.after 1grB 2After b 2 2
grA.2002.lesion1.before 2grA 1Before a 3 3
grA.2002.lesion2.before 2grA 1Before b 3 3
grA.2002.lesion1.after 2grA 2After a 3 3
grA.2002.lesion2.after 2grA 2After b 3 3
grB.2011.lesion1.before 1grB 1Before a 1 4
grB.2011.lesion2.before 1grB 1Before b 1 4
grB.2011.lesion1.after 1grB 2After a 1 4
grB.2011.lesion2.after 1grB 2After b 1 4
grA.3052.lesion1.before 2grA 1Before a 2 5
grA.3052.lesion2.before 2grA 1Before b 2 5
grA.3052.lesion1.after 2grA 2After a 2 5
grA.3052.lesion2.after 2grA 2After b 2 5
grB.3058.lesion1.before 1grB 1Before a 3 6
grB.3058.lesion2.before 1grB 1Before b 3 6
grB.3058.lesion1.after 1grB 2After a 3 6
grB.3058.lesion2.after 1grB 2After b 3 6
Certainly the most important comparisons in the model are: design=formula(~time*group) And this works.
However, I need help including the pairedness.
~ind.n + time*group does not work:
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
Again, when making the ind.Repeat column redundant across the groups, and inputting design=formula(~ind.Repeat + time*group), it gives results, but comparisons are spuriously made with patient1 as the reference, without comparing patient2 to patient3, and there is a spurious combination across the treatment groups:
~ind.Repeat + time*group
resultsNames(dds)
[1] "Intercept" "ind.Repeat_2_vs_1" "ind.Repeat_3_vs_1"
[4] "time_2After_vs_1Before" "group_2grA_vs_1grB" "time2After.group2grA"
Of course the software is doing what is told, I'm not blaming the software, just trying to ask how to properly represent the experimental model.
I have tried everything I can think of (input as design=formula(*) into DESeqDataSetFromMatrix, prior to running DESeq). They all either fail, or seem to misrepresent the situation.
The best I have come up with that works:
~pairs:group + group*time
resultsNames(dds)
[1] "Intercept" "group_2grA_vs_1grB" "time_2After_vs_1Before"
[4] "group1grB.pairsb" "group2grA.pairsb" "group2grA.time2After"
Similarly:
~pairs:group:time + group*time
resultsNames(dds)
[1] "Intercept" "group_2grA_vs_1grB"
[3] "time_2After_vs_1Before" "group2grA.time2After"
[5] "group1grB.time1Before.pairsb" "group2grA.time1Before.pairsb"
[7] "group1grB.time2After.pairsb" "group2grA.time2After.pairsb"
Again, this runs, and I believe makes sense. I am challenged by wrapping my head around a lack of "pairsA" being directly represented - seems okay as this would be the reference to analyze "pairsB." I was hoping to have "time1Before" be the reference in this 3-way combination, but haven't figured out how to do that.
And, of course the "group2Treatment.time2After" should come last, but again haven't figured out how to make it so. Yet I can just insert this into contrast:
results(dds,contrast=list(c("group2grA.time2After")))
or
results(dds,contrast=list(c("group1grB.time2After")))
This seems like it might be okay... what do y'all think? Do I need betaPrior=FALSE?
One final idea is to combine the group variables, then set the contrasts manually:
lesion_samples$SGroup <- factor(paste0(lesion_samples$group,lesion_samples$time,lesion_samples$ind.n))
lesion_Data <- DESeqDataSetFromMatrix(countData = lesion_CountTable, colData=lesion_samples, design=formula(~SGroup))
dds <- DESeq(lesion_Data)
> resultsNames(lesion_DESeq)
[1] "Intercept" "SGroup1grB1Before2" "SGroup1grB1Before4"
[4] "SGroup1grB1Before6" "SGroup1grB2After2" "SGroup1grB2After4"
[7] "SGroup1grB2After6" "SGroup2grA1Before1" "SGroup2grA1Before3"
[10] "SGroup2grA1Before5" "SGroup2grA2After1" "SGroup2grA2After3"
[13] "SGroup2grA2After5"
results(lesion_DESeq,contrast=list(c("SGroup2grA2After1","SGroup2grA2After3","SGroup2grA2After5"),c("SGroup2grA1Before1","SGroup2grA1Before3","SGroup2grA1Before5")))
(or)
results(lesion_DESeq,contrast=list(c("SGroup1grB2After2","SGroup1grB2After4","SGroup1grB2After6"),c("SGroup1grB1Before2","SGroup1grB1Before4","SGroup1grB1Before6")))
Help!
Further details: - this happens to be taxonomic calls (~microbiome data).
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.14.1 SummarizedExperiment_1.4.0
[3] Biobase_2.34.0 GenomicRanges_1.26.2
[5] GenomeInfoDb_1.10.2 IRanges_2.8.1
[7] S4Vectors_0.12.1 BiocGenerics_0.20.0