I am curious if anyone has good ideas for this problem I have.
I have a starting data set of ~4000 genes and 113 experimental conditions (a data frame called 'data'). I also have four predicted regulons made up of genes from the starting data set (26 genes ('a'), 16 genes ('b'), 16 genes ('c'), and 6 genes ('d')). My idea was that if the genes in a predicted regulon show well-correlated expression with each other across all the experiments, that regulon is more likely to be real than one whose genes are only weakly correlated.
What I have done is compute the correlation in R:
data.cor <- cor(data)
a.cor <- cor(a)
b.cor <- cor(b)
c.cor <- cor(c)
d.cor <- cor(d)
and then used a t-test to compare the correlations of the predicted regulons to those of the overall data, with the idea that if a predicted regulon is significantly different from the overall data, it is more likely to be real:
t.test(data.cor, a.cor)
This gives p-values far below 0.05. However, I am concerned that this is likely due to comparing two groups of very different sizes (~4000 rows in 'data' vs 26 rows in 'a' or 6 in 'd').
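One detail worth checking in the calls above: `cor()` correlates the columns of its input, so if genes are in rows (an assumption based on the row counts mentioned), the gene-by-gene matrix comes from `cor(t(x))`. The sketch below uses simulated stand-in data, not the real data set, and the helper `pairwise_cor` is a hypothetical name. It keeps only the unique off-diagonal values, so the diagonal of 1s and the mirrored duplicates do not enter any test.

```r
set.seed(1)
# Simulated stand-in: 200 genes (rows) x 113 conditions (columns).
data <- matrix(rnorm(200 * 113), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))
a <- data[1:26, ]   # hypothetical regulon 'a': 26 of those genes

# Unique gene-gene correlations: transpose so genes become columns,
# then keep the upper triangle (no diagonal, no duplicates).
pairwise_cor <- function(x) {
  m <- cor(t(x))
  m[upper.tri(m)]
}

a.vals    <- pairwise_cor(a)
data.vals <- pairwise_cor(data)

length(a.vals)     # choose(26, 2) = 325 gene pairs
length(data.vals)  # choose(200, 2) = 19900 gene pairs
```

Vectors such as `a.vals` and `data.vals` could then be passed to `t.test()` or `wilcox.test()` in place of the full matrices.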
Can anyone recommend a better way to compare these groups in R? Are the t.test results reliable? I've done Wilcoxon tests too and gotten the same results. Any help or advice would be greatly appreciated!
Thanks!
Thanks.
You are interpreting my data correctly, yes. Is there a better way to compare two matrices?
At the moment you're comparing the average correlation of the two groups, which is simple but may not be entirely satisfying, because with large sample sizes even small differences become significant. So unless there is a large difference, the p-value is, in my view, meaningless in terms of biological relevance. I think you can use Mantel's test by comparing a matrix of regulon assignment (i.e. using a 0/1 coding) to the correlation matrix. The question this answers is: are genes in the same regulon also similar in terms of their expression pattern? Another approach could be feature selection, to find out which experimental conditions are good indicators/predictors of regulon membership.
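The Mantel idea above can be sketched in base R. This is a minimal illustration on simulated data, not the real data set: `regulon` holds hypothetical labels, `membership` is a 0/1 matrix that is 1 when two genes are assigned to the same regulon, and `mantel_perm` is an illustrative helper, not a function from any package. For real use, the vegan package's `mantel()` provides a tested implementation.

```r
set.seed(42)
n_genes <- 60; n_cond <- 113
# Hypothetical regulon labels: 0 = unassigned, 1..3 = regulons.
regulon <- c(rep(1, 10), rep(2, 8), rep(3, 6), rep(0, 36))

# Simulated expression: genes in a regulon share a common profile plus noise.
expr <- matrix(rnorm(n_genes * n_cond), nrow = n_genes)
for (r in 1:3) {
  idx <- which(regulon == r)
  shared <- rnorm(n_cond)
  expr[idx, ] <- expr[idx, ] + matrix(shared, length(idx), n_cond, byrow = TRUE)
}

# Gene-by-gene correlation (cor() works on columns, hence t()).
gene.cor <- cor(t(expr))

# 0/1 membership matrix: 1 if both genes sit in the same regulon.
membership <- outer(regulon, regulon,
                    function(x, y) as.numeric(x == y & x > 0))

# Mantel-style permutation test: correlate the two matrices over unique
# gene pairs, then shuffle the gene labels of one matrix to build a null.
mantel_perm <- function(m1, m2, n_perm = 999) {
  ut <- upper.tri(m1)
  obs <- cor(m1[ut], m2[ut])
  null <- replicate(n_perm, {
    p <- sample(nrow(m2))
    cor(m1[ut], m2[p, p][ut])
  })
  list(statistic = obs,
       p.value = (sum(null >= obs) + 1) / (n_perm + 1))
}

res <- mantel_perm(gene.cor, membership)
res$statistic  # Mantel r between co-membership and expression correlation
res$p.value    # one-sided permutation p-value
```

Rows and columns are permuted together so that each shuffled matrix is still a valid gene-by-gene matrix; only the link between gene identity and regulon assignment is broken.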
Thanks! I hadn't heard of Mantel's test. I've tried it but need to read more to understand the results. I compared the matrix of all data and the matrix of the individual regulon and got a significance of 0.001 and a statistic r of 0.9113. I'll read more, but thanks for your help!