Biostar Beta. Not for public use.
regression model to use to show the difference
14 months ago
krushnach80 • 500

I have a wild-type and a knockout condition. After the knockout, the level of a certain metabolite goes up, and the difference is also visible in the phenotype. What kind of regression model, or other method, should I use to show this difference? Any suggestion or help would be appreciated.

R • 620 views

If you have two groups, have you considered a t-test?

WT       Amo         GS       Amo
6.92     461.333     6.12     408.000
6.9      460.000     6.98     465.333
18.8     1253.333    12.69    846.000
18.75    1250.000    10.8     720.000
33.36    2224.000    11.2     746.667
21.55    1436.667    11.82    788.000
21.95    1463.333    22.96    1530.667
11.54    769.333     28.41    1894.000
5.22     348.000     47.7     3180.000
16.1     1073.333    3.28     218.667
13.41    894.000     14.2     946.667
31       2066.667    17       1133.333
55       3666.667    25       1666.667
53.4     3560.000    40.2     2680.000
                     53       3533.333
                     41       2733.333

My data look something like this. The number of observations in WT is more than in the knockout. Would you still suggest that I go for a t-test?


There is no need for the two groups to be the same size for a t-test.
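As a minimal sketch of that point, the values posted above can be passed straight to t.test(). Two things here are assumptions: that the first column of each pair (WT, GS) holds the measurement to compare, and that the two columns are the two groups of interest. The log2 variant is an optional extra for right-skewed values, not something suggested in the thread.

```r
# Values copied from the WT and GS columns posted above
# (assumption: the first column of each pair is the measurement of interest).
wt <- c(6.92, 6.9, 18.8, 18.75, 33.36, 21.55, 21.95, 11.54, 5.22, 16.1,
        13.41, 31, 55, 53.4)
gs <- c(6.12, 6.98, 12.69, 10.8, 11.2, 11.82, 22.96, 28.41, 47.7, 3.28,
        14.2, 17, 25, 40.2, 53, 41)

# The groups have different sizes (14 vs 16); t.test() accepts that.
# By default it runs Welch's test, which also does not assume equal variances.
res <- t.test(wt, gs)
print(res)

# Optional: the same comparison on the log2 scale, which may suit
# right-skewed concentration data better (an assumption about these data).
res_log <- t.test(log2(wt), log2(gs))
print(res_log$p.value)
```

The group means, confidence interval and p-value are all in the printed output; res$estimate and res$p.value give them programmatically.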


Okay, but as far as I understand these groups are not independent, because I am taking the knockout of the same gene that I am studying. Am I correct? If so, should I go for a paired t-test?

13 months ago
Republic of Ireland

Hello friend,

Assuming that your metabolites have been normalised to the Z-scale and/or are logged (and thus follow a normal distribution), you can just run a binary logistic regression model:

First, get your data in this format:

           Group   Metab1  Metab2 Metab3 Metab4
Sample 1   WT      11.39   10.62   9.75  10.34
Sample 2   WT      10.16    8.63   8.68   9.08
Sample 3   WT       9.29   10.24   9.89  10.11
Sample 4   KO      11.53    9.22   9.35   9.13
Sample 5   KO       8.35   10.62  10.25  10.01
Sample 6   KO      11.71   10.43   8.87   9.44

Then, set your Group variable as factors and specify WT as the reference level:

MyData$Group <- factor(MyData$Group, levels=c("WT","KO"))

Then, I would check each metabolite independently in the logistic regression modelling:

glm(Group ~ Metab1, data=MyData, family="binomial")
glm(Group ~ Metab2, data=MyData, family="binomial")
et cetera

Model p-values and estimates / coefficients (the sign of the estimate indicates which way the metabolite level goes in KO vs WT) can be extracted via the summary() function applied to the model object. You can also perform a Chi-squared ANOVA via anova(MyModel, test="Chisq").
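A short worked sketch of those steps, using the first two metabolite columns of the example table above. The odds-ratio line is an extra convenience that is not part of the original answer, and with only six samples the numbers are purely illustrative.

```r
# Example table from above (first two metabolites only).
MyData <- data.frame(
  Group  = c("WT", "WT", "WT", "KO", "KO", "KO"),
  Metab1 = c(11.39, 10.16, 9.29, 11.53, 8.35, 11.71),
  Metab2 = c(10.62, 8.63, 10.24, 9.22, 10.62, 10.43)
)

# WT is the reference level, so the model estimates the log-odds of KO.
MyData$Group <- factor(MyData$Group, levels = c("WT", "KO"))

MyModel <- glm(Group ~ Metab1, data = MyData, family = "binomial")

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|).
coefs <- coef(summary(MyModel))
est  <- coefs["Metab1", "Estimate"]   # > 0: higher Metab1 raises the odds of KO
pval <- coefs["Metab1", "Pr(>|z|)"]   # Wald test p-value

# Extra, not in the original answer: the estimate as an odds ratio.
odds_ratio <- exp(est)

# Chi-squared ANOVA (likelihood ratio test against the null model).
aov_tab <- anova(MyModel, test = "Chisq")
print(coefs)
print(aov_tab)
```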

You can set this up as a loop: Question about generalized linear model fitting
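One way such a loop might look is sketched below. The data are simulated stand-ins (the real MyData would be used instead), the shift added to Metab2 only exists so that there is something to find, and the Benjamini-Hochberg adjustment is an addition that the original answer does not mention.

```r
set.seed(1)

# Simulated stand-in for MyData: 20 WT and 20 KO samples, 3 metabolites.
n <- 40
MyData <- data.frame(
  Group  = factor(rep(c("WT", "KO"), each = n / 2), levels = c("WT", "KO")),
  Metab1 = rnorm(n),
  Metab2 = rnorm(n),
  Metab3 = rnorm(n)
)
# Make Metab2 higher in KO so that one metabolite carries a signal.
ko <- MyData$Group == "KO"
MyData$Metab2[ko] <- MyData$Metab2[ko] + 1

# Fit one logistic regression per metabolite and collect the results.
metabs <- setdiff(colnames(MyData), "Group")
results <- do.call(rbind, lapply(metabs, function(m) {
  fit <- glm(reformulate(m, response = "Group"),
             data = MyData, family = "binomial")
  cf <- coef(summary(fit))[m, ]
  data.frame(Metabolite = m,
             Estimate   = cf[["Estimate"]],
             P          = cf[["Pr(>|z|)"]])
}))

# Addition: adjust for testing several metabolites (Benjamini-Hochberg).
results$P.adj <- p.adjust(results$P, method = "BH")
print(results)
```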


If your aim is to identify a panel of predictors, then, from the results of the above, select the metabolites that are statistically significant and then you will have to perform further test statistics on these to gauge their 'predictive' strength. For example, see:

You can also just do a penalised regression with all metabolites at the same time using the lasso, elastic-net, or ridge penalty: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
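A rough sketch of the lasso variant, assuming the glmnet package is installed (the code does nothing otherwise). The data are simulated, and the sample size, number of metabolites, number of folds and the choice of lambda.1se are all illustrative assumptions rather than recommendations from the thread.

```r
# Assumption: the glmnet package is available; skip quietly if it is not.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)

  # Simulated stand-in: 60 samples, 10 metabolites; the first two differ in KO.
  n <- 60
  p <- 10
  x <- matrix(rnorm(n * p), nrow = n, ncol = p,
              dimnames = list(NULL, paste0("Metab", 1:p)))
  y <- factor(rep(c("WT", "KO"), each = n / 2), levels = c("WT", "KO"))
  x[y == "KO", 1:2] <- x[y == "KO", 1:2] + 1.5

  # alpha = 1 is the lasso, alpha = 0 is ridge, values in between are elastic-net.
  cvfit <- glmnet::cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

  # Coefficients at the cross-validated lambda; metabolites shrunk to
  # exactly zero are dropped from the model.
  cf <- coef(cvfit, s = "lambda.1se")
  print(cf)
}
```

Metabolites with non-zero coefficients are the ones the penalty retains.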



kevin thank you very much i was looking for this for my other really really glad that you posted this ...


@Kevin, can I use your method for gene expression data or differentially expressed genes? Is the only thing I need to do to arrange my data in the format you have described?


Yes, you can use this same approach for the genes that are differentially expressed, so that you can further reduce the number of genes in your final model.

