I'm doing biological gene expression data. For the genes I analysed, each gene can produce two isoforms, one dominant (Gene isoform 1) one minus can be treated as a by-product (Gene isoform 2).
I'm actually interested in is the expression change of the by-product following time series (Gene isoform 2). However, the problem is, the expression of "Gene isoform 1" is a confounder here. i.e. I want the variance of "isoform 2" expression is driven by itself, not driven by the variance of the dominant "isoform 1".
Two graph to explain my problem:
Situation one:
"Gene isoform 1" and "Gene isoform 2" both increase their expression over time, however they vary synchronously, it is likely the variation of Gene isoform 2 is due to the caused by the variation of "Gene isoform 1". This is not what I want!
Situation 2:
"Gene isoform 1" and "Gene isoform 2" both increase their expression over time, the increase of "Gene isoform 2" is much faster than "Gene isoform 1", this means at least partially "Gene isoform 2" vary independent of "Gene isoform 1", and this independent part of variance is exactly what I'm interested in.
The question is, how do I test the change of "Gene isoform 2" without the influence of "Gene isoform 1".
Thanks a tons for any answer!
Another way to formulate this problem is, how to test the ratio changes across time points, with each time points have replicates.
For example, here is a example expression of the two isoforms of one gene. The data has 4 time points and each time points have 4 replicates.
How to test the ratio changes across time. A chi square test seams like do not support replicates.
Are these actually integer counts or have you rounded things? You might be able to use a beta-binomial test.
They are discrete data, but may have .5 for some reason. Could you give a little detail how to fit a beta-binomial model? I actually tried but did not figure out.
Sure, have a look at the "VGAM" package. You end up doing something like:
Betabinomial models are kind of a pain to deal with, but they're powerful. You might have to play around with some of the defaults to get better fits.
I just found this package seems like doing some straightforward test. I' m in the MPI of aging. And you? B-it I guess.
That package is new to me, but given the increased utility of betabinomial models in NGS I'm not surprised that there are packages being added (I'm slowly working on one myself).
Always nice to know some of the other bioinfo folks in the area. I'm at the DZNE (molecular and cellular cognition group) in Bonn.
Nice. Thanks a lot, you actually helped me a lot. :).
Hi Devon, how can I do actual significance test after fitting with the model? For example, in this case, test the significance across different time points.
If you just want to see if there's a time dependent effect in general, then you can also fit to a null model (~1) and then use a likelihood ratio test (it's convenient to just use the anova() function).
You mean fit with a anova model, or fit with betabinomial model and test the parameters by anova? Sorry, I'm not familiar with betabinomial, as you said, some playing around is definitely necessary.
The latter, fit with betabinomial models and then test for a difference in them with the anova function. The anova function is a convenient way to perform a likelihood ratio test between two model fits.
Hi Devon, I tried with different packages.
With ibb:
This seams dose not reach enough power, for example, something like this cannot be detected:
I also tried package VGAM and aod (easy to use). The later can easily use
anova()
to compare models.What I do like:
Does not seams like correct. Do I need to fit the BetaB model for each time point, and do likelihood ratio test with anova? Please point out if I did wrong, thank you very much again.
Since aod doesn't allow contrasts (that I know of), you can use the subsetting method as follows:
Note that
fitFull
has fitting issues. Beta binomial models are a pain to deal with, so you'll have to play around with the settings fed tooptim()
.You might also just fit the full model (not subset) and then contrast the wald statistics as needed.
BTW, out of curiosity, whose lab are you in in Cologne?