Question

ConsensusClusterPlus for small sample

0

5.3 years ago

Omics data mining ▴ 260

Hello everyone

I want to identify subgroups in a cancer dataset. Before using ConsensusClusterPlus, I have a few questions:

1) What is the minimum sample size required to run ConsensusClusterPlus? I have data for 19 samples. Should I use it for clustering, or should I go for another method (e.g. PCA)?

2) While going through the manual, I found that it takes expression values as input (normalised or unnormalised?). I also have z-scores (derived from the expression data) from another analysis of the same 19 samples. Can I use the z-scores directly for clustering with ConsensusClusterPlus?

I would appreciate any suggestions.

Thanks

ConsensusClusterPlus • 2.4k views

updated 5.1 years ago by chris86 ▴ 400 • written 5.3 years ago by Omics data mining ▴ 260

score 0 · Answer 1 · 2019-01-03

0

5.3 years ago

Ahill ★ 1.9k

You can use consensus clustering on 19 samples; there is no intrinsic minimum sample size. As with other clustering methods, the results will typically depend heavily on how you select informative genes. See the ConsensusClusterPlus manual for one gene-selection approach: picking the most variable genes by median absolute deviation (MAD). The data should be normalized. Z-scores might be OK, but that depends heavily on how they were computed (sample-wise or gene-wise?). If you have a typical expression readout (such as normalized read counts from RNA-Seq or normalized intensities from microarrays), then a good place to start looking for sample groupings is probably to use those normalized expression levels (not the z-scores) for an informative subset of genes, with a Pearson correlation distance measure.
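To make the two steps above concrete (MAD-based gene selection, then a Pearson correlation distance between samples), here is a minimal Python sketch. The real analysis would normally be run in R with ConsensusClusterPlus; this standard-library version only illustrates the arithmetic. The gene names, the expression values, and the helper names `mad`, `top_mad_genes`, and `pearson_distance` are all hypothetical.

```python
import statistics
from math import sqrt

def mad(values):
    """Unscaled median absolute deviation of one gene across samples.
    R's mad() additionally multiplies by 1.4826 by default; that constant
    does not change the ranking of genes."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def top_mad_genes(expr, n_top):
    """expr maps gene name -> list of normalized values (one per sample).
    Returns the n_top gene names with the largest MAD, most variable first."""
    return sorted(expr, key=lambda g: mad(expr[g]), reverse=True)[:n_top]

def pearson_distance(x, y):
    """1 - Pearson correlation between two sample profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)

# Hypothetical toy data: 5 genes x 4 samples of normalized expression.
expr = {
    "geneA": [1.0, 1.1, 0.9, 1.0],   # nearly flat, uninformative
    "geneB": [2.0, 8.0, 2.5, 7.5],
    "geneC": [5.0, 1.0, 6.0, 0.5],
    "geneD": [3.0, 3.1, 2.9, 3.0],   # nearly flat, uninformative
    "geneE": [4.0, 9.0, 3.0, 8.5],
}

selected = top_mad_genes(expr, n_top=3)

# One profile per sample, restricted to the selected genes.
n_samples = len(next(iter(expr.values())))
profiles = [[expr[g][i] for g in selected] for i in range(n_samples)]

d_0_2 = pearson_distance(profiles[0], profiles[2])
d_0_1 = pearson_distance(profiles[0], profiles[1])
print(selected, round(d_0_2, 3), round(d_0_1, 3))
```

In this toy example the flat genes are dropped, and samples 0 and 2 end up much closer to each other than samples 0 and 1. With real data you would keep far more genes (often a few thousand), and that cutoff is a judgment call.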

0

Hi Ahill

Thanks for your valuable answer.

I will go with the MAD approach for variable selection. The z-scores were computed sample-wise. I will start with the normalised intensities and try to get results first, then try subgrouping with the z-scores, and later check how much the results from the two approaches have in common. I am having some trouble understanding the plots and results. I just ran the sample data through ConsensusClusterPlus and got the following results.

k  cluster  clusterConsensus
2  1        0.90794831578128
2  2        0.758432628514517
3  1        0.624620046443652
3  2        0.911135863955618
3  3        0.986412256470072
4  1        0.890835574988102
4  2        0.886960582630877
4  3        0.666394932640416
4  4        0.98295225849986
5  1        0.86123474251129
5  2        0.884872156152216
5  3        0.556828374192177
5  4        0.839098318290865
5  5        1
6  1        0.825649752799388
6  2        0.937773728911312
6  3        0.649644539921365
6  4        0.726792776419238
6  5        0.698201730147844
6  6        1

How do I decide on k and on sample membership based on the clusterConsensus values? Do I need to fix a threshold and then choose a specific k?

Similarly, how do I decide item membership based on these results?

    k  cluster  item   itemConsensus
1   2  1        28031  0.5002183
2   2  1        28003  0.4185504
3   2  1        28042  0.4727976
4   2  1        43012  0.5462791
5   2  1        LAL5   0.4682668
6   2  1        08018  0.5090733
7   2  1        57001  0.5897417
8   2  1        22010  0.5834408
9   2  1        01007  0.2090324
10  2  1        01003  0.2036311

Thanks in advance

written 5.3 years ago by Omics data mining ▴ 260
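One possible way to read the clusterConsensus table in the reply above is to summarise it per k, for example by the mean and the minimum cluster consensus. The Python sketch below does only that, using the values exactly as posted. The helper name `summarize_by_k` is hypothetical, and neither summary is an official decision rule of ConsensusClusterPlus; the consensus CDF and delta-area plots the package produces are usually inspected alongside numbers like these.

```python
from statistics import fmean

# (k, cluster, clusterConsensus) rows copied from the table in the reply above.
rows = [
    (2, 1, 0.90794831578128), (2, 2, 0.758432628514517),
    (3, 1, 0.624620046443652), (3, 2, 0.911135863955618),
    (3, 3, 0.986412256470072),
    (4, 1, 0.890835574988102), (4, 2, 0.886960582630877),
    (4, 3, 0.666394932640416), (4, 4, 0.98295225849986),
    (5, 1, 0.86123474251129), (5, 2, 0.884872156152216),
    (5, 3, 0.556828374192177), (5, 4, 0.839098318290865), (5, 5, 1.0),
    (6, 1, 0.825649752799388), (6, 2, 0.937773728911312),
    (6, 3, 0.649644539921365), (6, 4, 0.726792776419238),
    (6, 5, 0.698201730147844), (6, 6, 1.0),
]

def summarize_by_k(rows):
    """Group cluster-consensus values by k and report their mean and minimum."""
    by_k = {}
    for k, _cluster, value in rows:
        by_k.setdefault(k, []).append(value)
    return {k: {"mean": fmean(v), "min": min(v)} for k, v in by_k.items()}

summary = summarize_by_k(rows)
for k in sorted(summary):
    print(f"k={k}: mean={summary[k]['mean']:.3f}  min={summary[k]['min']:.3f}")

# The two heuristics do not have to agree on the same k.
best_by_mean = max(summary, key=lambda k: summary[k]["mean"])
best_by_min = max(summary, key=lambda k: summary[k]["min"])
```

On these posted values the mean is highest at k=4 while the weakest cluster is strongest at k=2, which suggests that no single threshold on clusterConsensus settles the choice of k. A cluster consensus of exactly 1 (as at k=5 and k=6) may simply reflect a very small cluster, so it is worth checking cluster sizes as well.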

score 0 · Answer 2 · 2019-03-21

I wouldn't bother with consensus clustering for sample sizes this small, because your statistical power is poor. IMO it is better to use standard hierarchical clustering with aheatmap or ComplexHeatmap.

I usually use consensus clustering with larger sample sizes, at least 60 or so. I think the sample-resampling approach used by the Monti algorithm won't behave well until you have a larger number of samples.
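For readers unfamiliar with the resampling step mentioned here, the following is a rough Python sketch of how a Monti-style consensus value for a pair of samples can be computed: the fraction of subsampling runs in which both samples were drawn and ended up in the same cluster. The function name `consensus_matrix`, the sample names, and the toy runs are hypothetical, and the per-run clustering itself is assumed to have been done already.

```python
from itertools import combinations

def consensus_matrix(runs, items):
    """runs: list of dicts, one per subsample, mapping each sampled item to
    the cluster label it received in that run. Returns, for each pair of
    items, (# runs where both were sampled and co-clustered) divided by
    (# runs where both were sampled)."""
    pairs = list(combinations(items, 2))
    together = {p: 0 for p in pairs}
    sampled = {p: 0 for p in pairs}
    for labels in runs:
        for a, b in pairs:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    # Pairs that were never drawn together have no defined consensus.
    return {p: (together[p] / sampled[p] if sampled[p] else float("nan"))
            for p in pairs}

# Hypothetical toy input: 4 samples, 4 subsampling runs of 3 samples each.
items = ["s1", "s2", "s3", "s4"]
runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s2": 0, "s4": 1},
    {"s2": 1, "s3": 0, "s4": 0},
    {"s1": 0, "s3": 0, "s4": 1},
]

M = consensus_matrix(runs, items)
for pair, value in M.items():
    print(pair, value)
```

With only a handful of samples, each subsample leaves out just a few items, so every run clusters almost the same data; that is one plausible reading of why the resampling is less informative at small sample sizes.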