Question

regression model to use to show the difference

6.1 years ago

1769mkc ★ 1.2k

I have a wild-type and a knockout condition. After knockout, the level of a certain metabolite goes up; there is a difference, as seen in the phenotype. My question is: what kind of regression model, or any other method, should I use to show the difference? Any suggestion or help would be appreciated.

R • 1.4k views

ADD COMMENT • link updated 6.1 years ago by Kevin Blighe 87k • written 6.1 years ago by 1769mkc ★ 1.2k

If you have two groups, have you considered a t-test?

ADD REPLY • link 6.1 years ago by Sean Davis 26k

WT       Amo         GS       Amo
6.92     461.333     6.12     408.000
6.9      460.000     6.98     465.333
18.8     1253.333    12.69    846.000
18.75    1250.000    10.8     720.000
33.36    2224.000    11.2     746.667
21.55    1436.667    11.82    788.000
21.95    1463.333    22.96    1530.667
11.54    769.333     28.41    1894.000
5.22     348.000     47.7     3180.000
16.1     1073.333    3.28     218.667
13.41    894.000     14.2     946.667
31       2066.667    17       1133.333
55       3666.667    25       1666.667
53.4     3560.000    40.2     2680.000
                     53       3533.333
                     41       2733.333

My data looks something like this. I mean, my number of observations in WT is more than in the knockout. Would you suggest that I go for a t-test?

ADD REPLY • link 6.1 years ago by 1769mkc ★ 1.2k

There is no need for the two groups to be the same size for a t-test.

ADD REPLY • link 6.1 years ago by Sean Davis 26k
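
To illustrate the point about unequal group sizes, here is a minimal sketch in R using the first column of each group from the table posted above. It assumes the GS column is the knockout group, and the variable names wt and gs are made up for this example. By default t.test() runs Welch's version of the test, which needs neither equal group sizes nor equal variances.

```r
# Values copied from the first column of each group in the posted table.
# Assumption: "GS" is the knockout group.
wt <- c(6.92, 6.9, 18.8, 18.75, 33.36, 21.55, 21.95, 11.54,
        5.22, 16.1, 13.41, 31, 55, 53.4)
gs <- c(6.12, 6.98, 12.69, 10.8, 11.2, 11.82, 22.96, 28.41,
        47.7, 3.28, 14.2, 17, 25, 40.2, 53, 41)

# The two groups have different numbers of observations
length(wt)
length(gs)

# Welch two-sample t-test (var.equal = FALSE is the default)
res <- t.test(wt, gs)
print(res)

# Optional: the same test on log2-transformed values, a common choice
# when concentration-like measurements are right-skewed
res_log <- t.test(log2(wt), log2(gs))
res_log$p.value
```

If the normality assumption looks doubtful even after transformation, wilcox.test(wt, gs) is a rank-based alternative with the same calling pattern.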

Okay, but to my understanding these are not independent, because I am taking the knockout of the same gene that I am studying. Am I correct? If yes, then I should go for a paired t-test, shouldn't I?

ADD REPLY • link 6.1 years ago by 1769mkc ★ 1.2k

score 7 · Accepted Answer · 2018-03-04

Hello friend,

Assuming that your metabolites have been normalised to the Z-scale and/or log-transformed (and thus approximately follow a normal distribution), you can just run a binary logistic regression model.

First, get your data in this format:

MyData
           Group   Metab1  Metab2  Metab3  Metab4
Sample 1   WT      11.39   10.62    9.75   10.34
Sample 2   WT      10.16    8.63    8.68    9.08
Sample 3   WT       9.29   10.24    9.89   10.11
Sample 4   KO      11.53    9.22    9.35    9.13
Sample 5   KO       8.35   10.62   10.25   10.01
Sample 6   KO      11.71   10.43    8.87    9.44
...

Then, set your Group variable as a factor and specify WT as the reference level:

MyData$Group <- factor(MyData$Group, levels=c("WT","KO"))

Then, I would check each metabolite independently in the logistic regression modelling:

glm(Group ~ Metab1, data=MyData, family="binomial")
glm(Group ~ Metab2, data=MyData, family="binomial")
et cetera

Model p-values and estimates / coefficients (the sign of the estimate indicates which way the metabolite level goes in KO vs WT) can be extracted via the summary() function applied to the model object. You can also perform a Chi-squared ANOVA via anova(MyModel, test="Chisq").
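
As a rough sketch of what that extraction can look like, the block below fits one model and pulls out the estimate, p-value, and odds ratio. The data are simulated purely for illustration (they are not real measurements), and the object name coefs is made up for this example.

```r
set.seed(1)

# Simulated, illustrative data only: 20 WT and 20 KO samples, one metabolite
MyData <- data.frame(
  Group  = rep(c("WT", "KO"), each = 20),
  Metab1 = c(rnorm(20, mean = 10, sd = 1), rnorm(20, mean = 11, sd = 1))
)
MyData$Group <- factor(MyData$Group, levels = c("WT", "KO"))

# KO is the second factor level, so the model estimates the log-odds of KO
MyModel <- glm(Group ~ Metab1, data = MyData, family = "binomial")

# Coefficient table: estimate, standard error, z value, Wald p-value
coefs <- summary(MyModel)$coefficients
coefs["Metab1", c("Estimate", "Pr(>|z|)")]

# Odds ratio of being KO per unit increase in Metab1
exp(coef(MyModel)["Metab1"])

# Chi-squared (likelihood-ratio) test for adding Metab1 to the null model
anova(MyModel, test = "Chisq")
```

A positive estimate here means that higher metabolite values are associated with the KO group, because WT was set as the reference level.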

You can set this up as a loop: Question about generalized linear model fitting
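
One possible way to write such a loop is sketched below; it is not necessarily the approach in the linked thread. The data are simulated for illustration only, and the names metabolites and results are made up for this example. Each metabolite gets its own model, and the p-values are then adjusted for multiple testing.

```r
set.seed(42)

# Simulated, illustrative data only: 20 WT and 20 KO samples, 3 metabolites
n <- 20
MyData <- data.frame(
  Group  = factor(rep(c("WT", "KO"), each = n), levels = c("WT", "KO")),
  Metab1 = c(rnorm(n, mean = 10), rnorm(n, mean = 11)),
  Metab2 = rnorm(2 * n, mean = 10),
  Metab3 = c(rnorm(n, mean = 10), rnorm(n, mean = 9))
)

# Every column except Group is treated as a metabolite
metabolites <- setdiff(colnames(MyData), "Group")

# Fit one logistic regression per metabolite and collect estimate and p-value
results <- do.call(rbind, lapply(metabolites, function(m) {
  fit <- glm(reformulate(m, response = "Group"),
             data = MyData, family = "binomial")
  co <- summary(fit)$coefficients[m, ]
  data.frame(Metabolite = m,
             Estimate   = unname(co["Estimate"]),
             P          = unname(co["Pr(>|z|)"]))
}))

# Benjamini-Hochberg adjustment, since several metabolites are tested
results$P.adj <- p.adjust(results$P, method = "BH")
results
```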

---------------------------------

If your aim is to identify a panel of predictors, then, from the results of the above, select the metabolites that are statistically significant; you will then have to run further tests on these to gauge their 'predictive' strength. For example, see:

Lecture 3 - RTrainingLect3.pptx on my GitHub page
A: Resources for gene signature creation

You can also just do a penalised regression with all metabolites at the same time using the lasso, elastic-net, or ridge penalty: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
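
A minimal sketch of the penalised approach follows, assuming the glmnet package is installed (the post does not name a package, so this is one common choice, not necessarily the one used in the linked answer). The data are simulated for illustration only, and x, group, and cvfit are made-up names.

```r
library(glmnet)  # assumption: the glmnet package is installed

set.seed(123)

# Simulated, illustrative data only: 60 samples, 10 metabolites
n <- 60
p <- 10
x <- matrix(rnorm(n * p, mean = 10), nrow = n,
            dimnames = list(NULL, paste0("Metab", 1:p)))
group <- factor(rep(c("WT", "KO"), each = n / 2), levels = c("WT", "KO"))

# Make the first two metabolites informative by shifting them in KO
x[group == "KO", 1:2] <- x[group == "KO", 1:2] + 1.5

# alpha = 1 is the lasso, alpha = 0 is ridge, values in between are elastic-net
cvfit <- cv.glmnet(x, group, family = "binomial", alpha = 1, nfolds = 5)

# Coefficients at the lambda with the lowest cross-validated deviance;
# metabolites shrunk to exactly zero have been dropped by the lasso
coef(cvfit, s = "lambda.min")
```

Unlike the one-metabolite-at-a-time models, this fits all metabolites jointly, so correlated metabolites compete with each other for a non-zero coefficient.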

Kevin