Will different proportions of control/patient samples affect a gene's Pearson correlation?
6.2 years ago
hellocita

I have RNA-seq data from samples of different ages (10, 20, 30, 40 and 50 years old): 50 controls and 14 patients. Based on differential expression analysis, I found some genes that change across age. I want to divide these genes into several clusters by hierarchical clustering on their Pearson correlation r, so that genes in each cluster share a similar pattern across age. For instance, genes in one cluster may be highest at young ages in controls but highest at old ages in patients.
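A minimal sketch of the clustering described above, in plain Python: the distance between two genes is 1 - r, and clusters are merged by average linkage. The gene names and values are hypothetical, and the sketch assumes no gene has constant expression (r is undefined then). In practice you would use a library routine such as R's `hclust(as.dist(1 - cor(t(x))), method = "average")` or SciPy's hierarchical clustering instead.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_genes(profiles, n_clusters):
    """Average-linkage agglomerative clustering using distance = 1 - r.

    profiles: dict of gene name -> expression values, all in the same
    sample (or age) order. Returns a list of sets of gene names.
    """
    genes = list(profiles)
    dist = {}
    for g, h in combinations(genes, 2):
        dist[(g, h)] = dist[(h, g)] = 1.0 - pearson(profiles[g], profiles[h])
    clusters = [{g} for g in genes]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    return clusters

# Hypothetical age-ordered profiles (ages 10, 20, 30, 40, 50).
profiles = {
    "geneA": [5, 4, 3, 2, 1],
    "geneB": [10, 8, 6, 4, 2],
    "geneC": [1, 2, 3, 4, 5],
    "geneD": [2, 4, 6, 8, 11],
}
print(cluster_genes(profiles, 2))
```

Using 1 - r (rather than 1 - |r|) keeps genes with opposite age trends far apart (distance close to 2), which matches the goal of grouping genes that share the same pattern.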

However, there are only a few samples at young ages, and the patient group is much smaller than the control group. I find that if I first calculate the mean expression at each age, separately for controls and patients, and then cluster genes by their correlation, the Pearson r differs from the r computed using all samples. Will the different sizes of the control and patient groups, and the different numbers of samples per age, affect the correctness of the Pearson correlation?
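The discrepancy described above can be reproduced with a toy example. When the within-age variation of two genes is unrelated to their age trend, averaging per age removes that variation, so the correlation of the age means can differ sharply from the correlation over all samples. The pooled correlation also weights each age by its number of samples, whereas the correlation of means weights every age equally. The data below are invented for illustration.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def age_means(ages, values):
    """Average the values within each age group, in ascending age order."""
    groups = defaultdict(list)
    for age, v in zip(ages, values):
        groups[age].append(v)
    return [sum(groups[a]) / len(groups[a]) for a in sorted(groups)]

# Hypothetical toy data: 1 sample at age 10, 2 at age 20, 4 at age 30.
ages   = [10, 20, 20, 30, 30, 30, 30]
gene_x = [1.0, 1.5, 2.5, 2.0, 4.0, 2.0, 4.0]
gene_y = [1.0, 2.5, 1.5, 4.0, 2.0, 4.0, 2.0]

r_all = pearson(gene_x, gene_y)
r_means = pearson(age_means(ages, gene_x), age_means(ages, gene_y))
print(f"all samples: r = {r_all:.3f}")    # all samples: r = -0.096
print(f"age means:   r = {r_means:.3f}")  # age means:   r = 1.000
```

Neither number is "wrong": they answer different questions (co-variation across individual samples versus co-variation of the age trends), and the means-based r rests on only as many points as there are ages.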

6.2 years ago

Hello Lucy, I do not completely understand your final paragraph. However, differences in sample numbers will definitely affect the correlation statistic: the fewer the samples, the less precisely r is estimated.
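One way to see why sample numbers matter: the same observed r is far less certain with 14 samples than with 50. The sketch below uses the standard Fisher z-transform approximation for a confidence interval of Pearson's r; it assumes roughly bivariate-normal data, and r = 0.6 is an arbitrary illustrative value.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r from n samples,
    via the Fisher z-transform (assumes roughly bivariate-normal data, n > 3)."""
    z = math.atanh(r)                          # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)                # standard error on the z scale
    q = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - q * se), math.tanh(z + q * se)

# Same observed r, group sizes as in the question (14 patients vs 50 controls).
for n in (14, 50):
    lo, hi = pearson_ci(0.6, n)
    print(f"n = {n}: 95% CI for r = 0.6 is ({lo:.2f}, {hi:.2f})")
```

The patient interval is roughly twice as wide, so a gene-gene correlation that looks different between the two groups may simply reflect the smaller patient sample.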

If you aim to look for 'patterns' in the age groups based on correlation, tools for this already exist. They construct a square correlation matrix, which then serves as the foundation for network analysis. In a square correlation matrix, each sample is correlated with every other sample.
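A minimal sketch of a square correlation matrix in plain Python. The rows are hypothetical and can be samples or genes, depending on which you want to correlate. The soft-threshold adjacency |r|^β is the unsigned form WGCNA uses to turn correlations into network edge weights; β = 6 here is only an illustrative value, not a recommendation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(rows):
    """Square, symmetric matrix whose [i][j] entry is the Pearson r of rows i and j."""
    n = len(rows)
    m = [[1.0] * n for _ in range(n)]  # diagonal: each row correlates perfectly with itself
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = pearson(rows[i], rows[j])
    return m

def soft_threshold_adjacency(corr, beta=6):
    """Unsigned soft-threshold adjacency |r| ** beta (illustrative beta)."""
    return [[abs(r) ** beta for r in row] for row in corr]

# Hypothetical expression rows (e.g. three genes measured in five samples).
rows = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
corr = correlation_matrix(rows)
adj = soft_threshold_adjacency(corr)
for row in corr:
    print(" ".join(f"{r:6.2f}" for r in row))
```

Raising |r| to a power shrinks weak correlations much more than strong ones, so noisy pairs contribute little to the resulting network.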


Thank you Kevin! However, I am not sure whether I can use WGCNA, because it may first calculate gene modules from the correlations in the control samples. I think those modules may not reflect what really happens in the disease samples, since the disease modules should be different from the control modules.


Okay, why not generate one network for controls and another for disease? Network analysis, generally, has major flaws. I believe that it still has to prove its value as a robust method that can help us disentangle disease mechanisms.
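A minimal sketch of the "one network per group" idea: compute each gene-gene correlation separately in controls and in patients, then test each pair for a difference with the Fisher z test for two independent correlations. Gene names and values are hypothetical, and this is a plain differential-correlation check written for illustration, not a WGCNA function.

```python
import math
from itertools import combinations
from statistics import NormalDist

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z test that two independent Pearson correlations are equal.
    Returns (z statistic, p-value)."""
    clip = 1 - 1e-9  # atanh(+/-1) is infinite, so keep r strictly inside (-1, 1)
    z1 = math.atanh(max(-clip, min(clip, r1)))
    z2 = math.atanh(max(-clip, min(clip, r2)))
    z = (z1 - z2) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def differential_pairs(control, disease):
    """control, disease: dict gene -> values over that group's samples.
    Returns a list of (gene_a, gene_b, r_control, r_disease, p) tuples."""
    n1 = len(next(iter(control.values())))
    n2 = len(next(iter(disease.values())))
    results = []
    for a, b in combinations(sorted(control), 2):
        r1 = pearson(control[a], control[b])
        r2 = pearson(disease[a], disease[b])
        _, p = compare_correlations(r1, n1, r2, n2)
        results.append((a, b, r1, r2, p))
    return results

# Hypothetical data: the geneA-geneB correlation flips sign in patients.
control = {"geneA": [1, 2, 3, 4, 5], "geneB": [1, 3, 2, 5, 4]}
disease = {"geneA": [1, 2, 3, 4, 5], "geneB": [5, 3, 4, 1, 2]}
for a, b, r1, r2, p in differential_pairs(control, disease):
    print(f"{a}-{b}: r_control = {r1:.2f}, r_disease = {r2:.2f}, p = {p:.3f}")
```

With only 14 patients this test has limited power, and testing every gene pair calls for multiple-testing correction.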
