Unsupervised selection of inter-cluster highly variable genes
5.5 years ago
elb ▴ 250

Hi guys,

I have a big data.frame of RNA-Seq counts in which rows are genes while columns are samples.

I clustered this large matrix and identified 6 major clusters. The clusters share some common genes, i.e. genes that do not vary much across the samples (around 100 patients), while other genes characterize individual clusters because their expression differs between clusters. For example, in one cluster 10 genes are highly expressed, while in all the other clusters the same genes are poorly expressed and do not change substantially among them. Is there a way to select the highly "significant" or variable genes that characterize each cluster with respect to the others, so that I end up with a list of cluster-specific genes whose expression is peculiar to that cluster? I know that one way is to compute a log2 fold change, but I would like to perform this analysis in an unsupervised way, without having to choose the comparisons for the fold-change calculation. Can anyone help me with some ideas or references so that I can select the relevant cluster-specific genes?

Thank you in advance

e.

R RNA-Seq variance • 1.3k views
5.5 years ago

You are looking for genes that characterise each of your sample clusters. I recently published work in which I did this with metabolomics data. Clusters were first identified via Partitioning Around Medoids (PAM), and the high/low metabolites in each cluster were identified via the medoid values returned for each metabolite in each cluster. I later showed via ANOVA that the most interesting metabolites did indeed differ [statistically] between the clusters.
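As a rough illustration of the ANOVA step, here is a minimal sketch that computes a one-way ANOVA F statistic for a single feature (a gene or a metabolite) across cluster labels, using only the Python standard library. The function name `anova_f` and the toy values are made up for this example and are not taken from the published work. The sketch returns only the F statistic; a p-value would additionally need the F distribution (for instance via `scipy.stats.f_oneway`).

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across cluster labels.

    values: expression values, one per sample (hypothetical toy input).
    labels: cluster label for each sample, same order as values.
    Assumes at least 2 clusters and more samples than clusters.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    grand_mean = mean(values)
    k, n = len(groups), len(values)

    # Variation of the cluster means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own cluster mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
f_shifted = anova_f([1, 2, 3, 2, 3, 4, 5, 6, 7], labels)  # cluster means differ
f_flat = anova_f([1, 2, 3, 1, 2, 3, 1, 2, 3], labels)     # cluster means identical
```

Ranking features by this statistic gives an ordering from most to least cluster-dependent without choosing any pairwise comparison. Because the clusters were derived from the same data, the resulting statistics are best read as a ranking rather than as a formal test.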

In your case, you have elaborated neither on your clustering method nor on how you identified 6 major clusters. One thing that you may consider is transforming your data to Z-scores and then taking Z > +2 as high expression in a particular cluster and Z < -2 as low expression in a particular cluster. In this way, you can define a set of genes that characterise each cluster.
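The Z-score idea can be sketched as follows. This is only one possible reading of it: each gene is scaled across all samples, the Z-scores are averaged within each cluster, and the cluster mean is compared against cutoffs. The averaging step, the function names, the default cutoffs of +2 and -2 (both adjustable parameters) and the toy data are assumptions made for illustration; the input is assumed to be normalised (ideally log-transformed) expression rather than raw counts.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Scale each gene to mean 0 and SD 1 across all samples."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no cluster information
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def cluster_markers(expr, labels, hi=2.0, lo=-2.0):
    """Per cluster, list genes whose mean Z-score is above hi or below lo."""
    z = zscore_rows(expr)
    markers = {}
    for cluster in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cluster]
        high, low = [], []
        for gene, zs in z.items():
            cluster_mean = mean(zs[i] for i in idx)
            if cluster_mean > hi:
                high.append(gene)
            elif cluster_mean < lo:
                low.append(gene)
        markers[cluster] = {"high": high, "low": low}
    return markers


# Toy data: 12 samples in 3 clusters (hypothetical values)
labels = ["A"] * 2 + ["B"] * 5 + ["C"] * 5
expr = {
    "geneX": [12, 12] + [0] * 10,   # high only in cluster A
    "geneY": [0, 0] + [12] * 10,    # low only in cluster A
    "geneZ": [5, 6] * 6,            # no cluster structure
}
markers = cluster_markers(expr, labels)
```

With these toy values, `geneX` is reported as high and `geneY` as low in cluster A, and nothing is reported for B or C. Note that a cluster mean Z beyond 2 is easier to reach for small clusters, so the cutoffs may need loosening for clusters that make up a large share of the samples.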

Kevin


Thank you, Kevin, for your answer. I simply normalized my data and then performed unsupervised hierarchical cluster analysis (HCA) with Pearson correlation as the distance measure. I have no reference samples. The clusters then emerged. I really appreciate your work, and I think I can draw inspiration from it. Thanks a lot.


Grazie - prego / You're welcome.
