Validating Results from NMF Clustering & Consensus clustering
4.8 years ago
David_emir ▴ 490

Hi,

I am running NMF / Consensus Clustering on my cancer samples to group them into subgroups. My question is: how do I assess the resulting clusters? Can I get a p-value or a similar measure, so that I can say my clustered samples are sound and validated?

Regards,

Dave

RNA-Seq validation • 3.3k views
4.8 years ago

Hey David, I was hoping for Jean-Karim Heriche or chris86 to answer, as they have more experience in clustering.

Although they call it 'Consensus Clustering', one should still obtain a consensus on the cluster solution from other programs / metrics. Others with which I'm familiar include:

  • Jaccard index
  • M3C
  • Gap Statistic
  • Elbow method
  • Silhouette method
  • Tree cut height (simplistic but difficult to completely dismiss it as a metric)
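To make one of these metrics concrete, here is a minimal sketch of the silhouette method in plain Python. The toy points, the labels and the function name `mean_silhouette` are my own illustration, not part of any package mentioned in this thread, and Euclidean distance is an assumption; in practice you would use an existing implementation on your real expression matrix.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width over all points (ranges from -1 to 1)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups of three samples each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the obvious structure
bad_labels = [0, 1, 0, 1, 0, 1]    # ignores the obvious structure

good_score = mean_silhouette(points, good_labels)
bad_score = mean_silhouette(points, bad_labels)
print(good_score, bad_score)
```

A value close to 1 suggests compact, well-separated clusters; values near 0 or below suggest the partition does not match the structure in the data. Comparing the mean silhouette across different numbers of clusters is one common way of choosing between solutions.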

I have also recently been utilising Seurat's functionality for finding clusters in data. By default, it uses a KNN (k-nearest neighbours) graph together with the Jaccard index.
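For intuition only, the KNN-plus-Jaccard idea can be sketched like this: find each sample's k nearest neighbours, then score a pair of samples by how much their neighbourhoods overlap. This is a simplified illustration and not Seurat's actual implementation; the function names, the toy points and the choice to include each sample in its own neighbourhood are all assumptions of mine.

```python
import math

def knn_sets(points, k):
    """For each point, the set of its k nearest neighbours plus the point itself."""
    neighbourhoods = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbourhoods.append(set(others[:k]) | {i})
    return neighbourhoods

def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy data: two well-separated groups of three samples each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
nbrs = knn_sets(points, k=2)

same_group = jaccard(nbrs[0], nbrs[1])   # samples 0 and 1 share all neighbours
diff_group = jaccard(nbrs[0], nbrs[3])   # samples 0 and 3 share none
print(same_group, diff_group)
```

Pairs with a high overlap are strongly connected in the resulting graph, and a graph-clustering step then groups strongly connected samples together.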

Regarding p-values, I believe Consensus Clustering has already applied some statistical validation of the clusters that it derives (?).

Kevin


As mentioned by Kevin, there are many ways of scoring the quality of a clustering result, and none is perfect, as they generally make assumptions about the structure of the data and/or about what a good clustering should be. In many cases, it is not easy to say what represents a good value for the score. However, the scores can be useful in deciding between different clusterings. Ultimately, what matters is how relevant and interpretable the outcome is. For example, you may get a very good clustering by some measure and still find that its granularity is too fine, splitting what you consider should be one group into two. Ideally, you want your clustering to give you some insight into the biological question you're interested in, and maybe to generate some hypothesis that you can then test independently (either by looking at the data differently or by doing an experiment).

4.8 years ago
anbarasu.la ▴ 20

Hi David,

You can use the sigclust package (https://cran.r-project.org/web/packages/sigclust/index.html) to assess the statistical significance of your clustering. You get simulated p-values based both on empirical quantiles and on Gaussian quantiles.

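If you want to compare the results from NMF and CC, you can use the Rand index (or the adjusted Rand index). The Rand index takes a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of samples and 1 indicating that they are exactly the same. The adjusted Rand index corrects for chance agreement, so it is close to 0 for unrelated clusterings and can be slightly negative. You can use the fossil package (https://cran.r-project.org/web/packages/fossil/index.html) to compute both.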
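If it helps to see what these indices measure, here is a minimal sketch in plain Python that counts pairs of samples on which two clusterings agree. The function name `rand_indices` and the example labels are hypothetical, made up for illustration; for real work, the fossil functions are the simpler route.

```python
from collections import Counter
from math import comb

def rand_indices(labels_a, labels_b):
    """Return (Rand index, adjusted Rand index) for two labelings of the same samples."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    if n < 2:
        raise ValueError("need at least two samples")

    total = comb(n, 2)  # number of sample pairs
    # Pairs placed together in both clusterings (from the contingency table).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each clustering on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Rand index: pairs together in both, plus pairs apart in both, over all pairs.
    rand = (total + 2 * together_both - together_a - together_b) / total

    # Adjusted Rand index: subtract the agreement expected by chance.
    expected = together_a * together_b / total
    maximum = (together_a + together_b) / 2
    ari = 1.0 if maximum == expected else (together_both - expected) / (maximum - expected)
    return rand, ari

# Hypothetical labels: the cluster names differ, but the partitions are identical.
nmf_labels = [1, 1, 1, 2, 2, 2]
cc_labels = ["a", "a", "a", "b", "b", "b"]
same_rand, same_ari = rand_indices(nmf_labels, cc_labels)

# Hypothetical labels that cut across each other.
cross_rand, cross_ari = rand_indices([0, 0, 1, 1], [0, 1, 0, 1])
print(same_rand, same_ari, cross_rand, cross_ari)
```

Note that the cluster names themselves do not matter, only which samples end up together. In the second example the adjusted index comes out negative, which is why it is better not to think of it as bounded by 0.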

Hope this helps!

Anbarasu
