Hi
I have 60 individuals randomized to treatment with a statin or diet. Statins are drugs that inhibit the liver enzyme HMG-CoA-reductase, with inhibits the synthesis of cholesterol (a major driver of atherosclerosis) leading to lower risk of stroke and myocardial infarction.
Thus 60 individuals are randomized to either a statin or diet. 10 genes were quantified at baseline and after 1 year treatment. For each gene we have a baseline value, 1 year value and the fold (calculated as the 1 year value divided by the baseline value).
The data set looks like (presenting two genes with baseline, 1 year and fold):
genes <- read.table(header=TRUE, sep=";", text =
"treatment;IL10_BL;IL10_1Y;IL10_fold;IL6_BL;IL6_1Y;IL6_fold;
diet;1.1;1.5;1.4;1.4;1.4;1.1;
statin;2.5;3.3;1.3;2.7;3.1;1.1;
statin;3.2;4.0;1.3;1.5;1.6;1.1;
diet;3.8;4.4;1.2;3.0;2.9;0.9;
statin;1.1;3.1;2.8;1.0;1.0;1.0;
diet;3.0;6.0;2.0;2.0;1.0;0.5;")
Nothing has been done to the data (e.g log-transform).
I have the following questions:
- How would you calculate if there are differences in gene expression after 1 year in the diet vs statin group?
- Does change in gene expression predict other variables (e.g blood cholesterol); this would need multiple regression analyses, in which I wonder what gene value to use (the baseline, the 1 year of the fold). Should I in that case log-transform the fold/baseline/1year before using it as a predictor?
Any help is much appreciated.
I have read the limma userguide package but my data is not set up in the way limma package uses in the vignette.
Thanks in advanced.
I have still not found out how to solve the issue... unfortunately.
Anyone who can provide some assistance?
Thanks in advance
Where are you stuck now? BTW, you should really consider just collaborating with a local bioinformatician/statistician, since even once you're able to get everything to work I expect you won't really know how to QC everything.
I have made an effort to read the user guide accompanying limma package, before posting here again.
My data set is set up the same way, as the "eset" object in my reply below. Therefore:
I have tried a range of model matrices and subsequently analyze if there are differential gene expressions between the treatment groups. None of them have worked. I do not argue that my efforts are qualitative, since I'm not used to analyze gene data. I have no biostatician in my vicinity to ask.
I'd like to ask for some coding support here? I have the above data set and wish to examine if gene expression differs between the groups.
I would appreciate it immensely.
(1) Delete the first row. Put you column:group assignments in a different file (put them in the same order, of course). (2) What model matrices have you tried and what exactly do you mean by "none of them have worked"?
(1) done that now.
(2) A somewhat comparable example in the vignette where African males and females are compared uses the following matrix:
I did not figure out how to apply this to my data.
I also tried:
and several other similar designs.
The problem is that
(a) I don't know what matrix to design, and
(b) how to use the expression data set, the list created in (1) and the design matrix. Because that gives me three objects to work with, while the vignette always includes a design matrix and data.
Regards
The list of sample associations is used by model.matrix(). In short, you make a dataframe whose rows are samples and columns are parameters (diet, individual, etc.). Unless you know what you're doing, which would require having taken linear algebra at some point in your education, it's probably not a good idea to make random model matrices by hand (i.e., use model.matrix()). Your design is just ~diet or ~statin, depending on what you want to call things. If you look at section 9.7 of the users guide, you'll see an example of an experiment similar to (but more complicated than) yours. In this example, the dataframe or vector created in step (1) would be
targets
. You'd use that to create the design matrix.