Hi,
I received RNA-seq data to analyze without having been involved in the experimental design before the sequencing was done. I'm using the GLM framework in edgeR to compare the conditions, but for this particular setup, adapting the models presented in the edgeR user's guide does not seem trivial (at least with my limited statistical knowledge).
They have a cell line that can be transformed (Outcome) by a single point mutation (Mutation). They tested whether knocking out (KO) either of two different genes can prevent this transformation. The results showed that both knockouts prevent it. However, from the preliminary data analysis I can tell that the effect of the mutation on the expression profiles is much larger than the effect of either knockout.
The data available can be represented like this:
sample | Mutation | KO | Outcome |
1 | wt | wt | Non_Transformed |
2 | wt | wt | Non_Transformed |
3 | wt | wt | Non_Transformed |
4 | G12V | wt | Transformed |
5 | G12V | wt | Transformed |
6 | G12V | wt | Transformed |
7 | G12V | genX | Non_Transformed |
8 | G12V | genX | Non_Transformed |
9 | G12V | genX | Non_Transformed |
10 | G12V | genY | Non_Transformed |
11 | G12V | genY | Non_Transformed |
12 | G12V | genY | Non_Transformed |
Even though the effect of the mutation is overwhelming compared with that of the knockouts, the expression profiles of the knockouts are extremely close to the control (wt for Mutation + wt for KO). The idea here is to understand the slight difference that prevents the transformed outcome.
To model this, I coded:
>Mutation <- factor(c("NoMut","NoMut","NoMut","G12V","G12V","G12V","G12V","G12V","G12V","G12V","G12V","G12V"), levels=c("NoMut","G12V"))
>KO <- factor(c("ctrl","ctrl","ctrl","ctrl","ctrl","ctrl","genx","genx","genx","geny","geny","geny"), levels=c("ctrl","genx","geny"))
>design <- model.matrix(~KO+Mutation, dgeCountsClean$samples)
which gives the design:
>design
           (Intercept) KOgenyKO KOgenxKO MutationG12V
wt_1                 1        0        0            0
wt_2                 1        0        0            0
wt_3                 1        0        0            0
G12V_1               1        0        0            1
G12V_2               1        0        0            1
genxG12V_1           1        0        1            1
genxG12V_2           1        0        1            1
genxG12V_3           1        0        1            1
genyG12V_1           1        1        0            1
genyG12V_2           1        1        0            1
genyG12V_3           1        1        0            1
I tested the KOgenyKO and KOgenxKO coefficients for differentially expressed genes, but unfortunately this design doesn't seem to be sensitive enough to detect differences in the expression profiles of these samples. I thought that modelling the GLM as ~KO*Mutation and testing the interaction coefficients (KOgenyKO:MutationG12V and KOgenxKO:MutationG12V) could help. The problem is that I have no samples with a KO but without the mutation, so KOgenxKO:MutationG12V is redundant with KOgenxKO (and likewise for geny), and the design matrix is not of full rank.
So if anyone could give me some advice, a tutorial to read, or any tip to help me get out of this conundrum, I would be extremely grateful.
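To make the rank problem concrete, here is a minimal sketch in base R (no edgeR needed) that rebuilds the sample annotation from the table above and compares the number of columns of the interaction design with its rank. The second half shows one possible reformulation, a one-way layout with a single combined factor and hand-written contrast vectors. The names `mutation`, `ko`, `group`, `design_int`, `design_grp` and `contrasts_grp` are illustrative, and the one-way layout is only an assumption about what might be workable here, not a validated solution for this data set.

```r
# Sample annotation copied from the table in this post (12 samples).
mutation <- factor(rep(c("wt", "G12V"), times = c(3, 9)),
                   levels = c("wt", "G12V"))
ko <- factor(rep(c("wt", "genX", "genY"), times = c(6, 3, 3)),
             levels = c("wt", "genX", "genY"))

# Interaction model: 6 coefficients, but only 4 distinct Mutation/KO
# combinations were sequenced, so the columns cannot all be independent.
design_int <- model.matrix(~ ko * mutation)
rank_int <- qr(design_int)$rank
cat("interaction design:", ncol(design_int), "columns, rank", rank_int, "\n")

# One-way layout (assumed alternative): one column per observed combination.
group <- factor(paste(mutation, ko, sep = "."),
                levels = c("wt.wt", "G12V.wt", "G12V.genX", "G12V.genY"))
design_grp <- model.matrix(~ 0 + group)
colnames(design_grp) <- levels(group)
rank_grp <- qr(design_grp)$rank
cat("one-way design:", ncol(design_grp), "columns, rank", rank_grp, "\n")

# Hand-written contrast vectors, one per comparison of interest.
# Rows follow the column order of design_grp.
contrasts_grp <- cbind(
  genX_vs_G12V = c( 0, -1, 1, 0),  # KO of genX within the mutant background
  genY_vs_G12V = c( 0, -1, 0, 1),  # KO of genY within the mutant background
  genX_vs_wt   = c(-1,  0, 1, 0),  # KO of genX versus the untransformed control
  genY_vs_wt   = c(-1,  0, 0, 1)   # KO of genY versus the untransformed control
)
rownames(contrasts_grp) <- colnames(design_grp)
print(contrasts_grp)
```

If a layout like this is appropriate, each column of `contrasts_grp` could presumably be passed as the `contrast` argument of edgeR's `glmLRT` or `glmQLFTest` after fitting with `design_grp`.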
Cheers!
Gero.