Hi everyone, I want to identify differentially expressed genes in a dataset I downloaded from NCBI GEO. The data consist of 14 .CEL samples: 8 are from diseased patients and the remaining 6 are controls.
I processed the raw data in R and saved the result to a .csv file.
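If the table is read back from that .csv, apply() needs a purely numeric matrix; a character column such as probe IDs would turn every row into a character vector that t.test() cannot use. A minimal sketch, assuming the first column of the file holds probe IDs (the file name and the tiny example table here are hypothetical stand-ins):

```r
# Hypothetical file; replace with the path of your own .csv
csv_path <- tempfile(fileext = ".csv")

# Write a tiny example table so this sketch runs on its own:
# first column = probe IDs, remaining columns = samples
example <- data.frame(probe = c("p1", "p2"), s1 = c(5.1, 7.2), s2 = c(5.3, 6.9))
write.csv(example, csv_path, row.names = FALSE)

# row.names = 1 moves the ID column into the row names, so every
# remaining column is numeric; as.matrix() then gives a numeric matrix
processedData <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

str(processedData)  # probes in rows, samples in columns
```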
Now I want to run a t-test for a difference in means between the control and diseased groups for every gene, using the apply() function.
My processed data table is a bit messed up: the control samples are in columns 1, 8, 9, 10, 12 and 14, while the diseased samples are in columns 2, 3, 4, 5, 6, 7, 11 and 13.
> colnames(processedData)
[1] "GSM596529_820B.CEL" "GSM596530_820C.CEL" "GSM596531_820D.CEL" "GSM596532_820E.CEL" "GSM596533_820F.CEL" "GSM596534_874A.CEL"
[7] "GSM596535_874B.CEL" "GSM596536_874C.CEL" "GSM596537_874D.CEL" "GSM596538_874E.CEL" "GSM596539_874F.CEL" "GSM596540_874G.CEL"
[13] "GSM596541_874H.CEL" "GSM596542_894B.CEL"
(What I mean is that the control and diseased columns are not next to each other, which makes it hard for me to select them inside the function applied to x.)
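Non-adjacent columns are not a problem in R: each group can be stored as one integer index vector built with c(). A minimal sketch using the column positions given in the question, with a few sanity checks and an optional group factor (the names control, diseased and group are illustrative):

```r
# Column positions of each group, taken from the question
control  <- c(1, 8:10, 12, 14)
diseased <- c(2:7, 11, 13)

# Sanity checks: 6 controls, 8 diseased, no overlap, all 14 columns used
stopifnot(length(control) == 6,
          length(diseased) == 8,
          length(intersect(control, diseased)) == 0,
          setequal(c(control, diseased), 1:14))

# Optional: one label per column, in column order
group <- factor(ifelse(seq_len(14) %in% control, "control", "diseased"),
                levels = c("control", "diseased"))
table(group)

# If you prefer the groups side by side, processedData[, c(control, diseased)]
# reorders the columns accordingly
```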
I entered the code below to run a t-test between the two groups (columns 1, 8:10, 12, 14 for control and 2:7, 11, 13 for diseased). But I think I am writing the indexing inside the curly brackets wrong.
> p_value_all_genes = apply(processedData, 1, function(x){ t.test(x[1,8:10,12,14], x[2:7,11,13]) $p.value } )
Error in x[1, 8:10, 12, 14] : incorrect number of dimensions
Any suggestions on what I should write inside the curly brackets to select the control and diseased samples? Or am I using apply() completely wrong?
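The error comes from the indexing: with apply(processedData, 1, ...), each x is a single row passed as a plain vector, so it takes one index. x[1,8:10,12,14] is read as indexing four dimensions, hence "incorrect number of dimensions". Combining the positions with c() fixes it. A minimal sketch, using a small simulated matrix as a stand-in for processedData (assumed numeric, genes in rows, the 14 samples in columns):

```r
set.seed(1)
# Stand-in for processedData: 5 genes (rows) x 14 samples (columns)
processedData <- matrix(rnorm(5 * 14), nrow = 5,
                        dimnames = list(paste0("gene", 1:5), NULL))

control  <- c(1, 8:10, 12, 14)
diseased <- c(2:7, 11, 13)

# Each x is one row (a numeric vector), so pass ONE index vector built with c()
p_value_all_genes <- apply(processedData, 1, function(x) {
  t.test(x[control], x[diseased])$p.value
})

head(p_value_all_genes)  # one p-value per gene, named by row name
```

Note that t.test() runs Welch's test by default (var.equal = FALSE), and that with thousands of genes the raw p-values still need a multiple-testing correction before calling anything differentially expressed.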
Even though you can run the test yourself, it is better to use the tools already implemented for this; check the limma and affy packages.
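A minimal limma sketch of the same two-group comparison, assuming the Bioconductor package limma is installed and that the expression values are already normalized on a log2 scale (for example by affy's rma()). The simulated matrix is only a stand-in for processedData; limma replaces the per-gene t-test with moderated t-statistics and reports Benjamini-Hochberg adjusted p-values:

```r
library(limma)  # Bioconductor package

set.seed(1)
# Stand-in for the normalized matrix: 100 probes in rows, 14 samples in columns
processedData <- matrix(rnorm(100 * 14, mean = 8), nrow = 100,
                        dimnames = list(paste0("probe", 1:100), NULL))

control <- c(1, 8:10, 12, 14)
group <- factor(ifelse(seq_len(14) %in% control, "control", "diseased"),
                levels = c("control", "diseased"))

# The "groupdiseased" column of the design is the diseased-vs-control effect
design <- model.matrix(~ group)

fit <- lmFit(processedData, design)  # one linear model per probe
fit <- eBayes(fit)                   # moderated t-statistics

# All probes ranked by evidence, with BH-adjusted p-values
results <- topTable(fit, coef = "groupdiseased", number = Inf, adjust.method = "BH")
head(results)
```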