Hello, I am looking at this heatmap and I do not understand why some of the tumours are grouped with the controls; it looks as if the heatmap is 'shifted to the left'.

Top of the heatmap:

(It is a long heatmap, so I have uploaded only a section.)

Bottom of the heatmap:

I checked the phenotypes of the samples, but found no correlation between phenotype and this grouping.

There seem to be two tumour subgroups: a large main group, and a small group of tumours that cluster with the controls. I ran a t-test comparing the mean beta values across all CpGs between these two groups, and the difference in means between the two tumour groups is significant. I am afraid that if we publish a heatmap like this one, we will have trouble explaining this phenomenon. Any ideas or opinions on this? Thank you!

Thanks a lot, Kevin!

For scaling, I tried several approaches and got the same result with all of them:

- converting to M values and then using the parameter scale='row';
- using scale='row' on β values;
- using scale='none', but scaling the data before plotting the heatmap.
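For reference, the two transformations being compared above are simple. The sketch below is in Python rather than R, and assumes that the heatmap function's scale='row' option z-scores each row (CpG) using its mean and sample standard deviation, as base R's heatmap() does. The clamping constant eps and the function names are hypothetical choices for illustration.

```python
import math
import statistics


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value to an M value: log2(beta / (1 - beta)).

    Beta is clamped to [eps, 1 - eps] (an assumed safeguard) so that values
    of exactly 0 or 1 do not produce infinities.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def scale_rows(matrix):
    """Z-score each row (CpG): subtract the row mean, divide by the row SD.

    Uses the sample standard deviation (n - 1), as R's sd() does.
    Rows with zero variance become all zeros instead of dividing by 0.
    """
    scaled = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical toy data: rows are CpGs, columns are samples.
betas = [
    [0.10, 0.50, 0.90],
    [0.80, 0.85, 0.90],
]
m_values = [[beta_to_m(b) for b in row] for row in betas]
print(scale_rows(betas))
print(scale_rows(m_values))
```

Because row scaling removes each CpG's own mean and spread, the rescaled β and M matrices tend to look alike, which is consistent with all three approaches giving a similar picture.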

All of them look similar, so I think I did the scaling correctly. I will check the distance and linkage metrics, as I used the defaults. And thank you for the input on the test; I now realize I should have used a non-parametric test.

When I previously did this, the Wilcoxon signed-rank test p-value, combined with an extra cut-off on the difference in mean β, was enough to adequately separate my groups of interest. The heatmap/clustering was then performed on unscaled β values:

I tried with:

And here is the heatmap I got. It seems much better; should I stick with this one?

I was more interested in just learning about which metrics you were currently using. I would not use a metric that I did not understand, and obviously it is not good practice to just choose the metric that makes the data look better.

My usual default (for most data types) is either of:

- Euclidean distance with Ward's linkage (ward.D2)
- 1 minus Pearson/Spearman correlation distance with Ward's linkage (ward.D2)

Yes, I agree. I was using the default parameters, which were "complete" linkage and "euclidean" distance. I tried Euclidean distance with ward.D2 and the separation between tumours and controls is still not clear: there is a blur between the two, and some tumours have the same colour as the controls, as in my original heatmap.

...but this may not necessarily be a problem, i.e., it may be the genuine result. Biology is much more complex than we can currently comprehend with our analytical methods. Every time that we take a sample and put it through our instruments, we are only looking at a 'snapshot' / moment in the evolution of the tissue/cell that is being studied, and much information is automatically eliminated because our very analytical methods are limited in what they can show.

So, if you cannot identify any issues with your coding, then it is the genuine result given the data that has been obtained.

I should add that you need to filter by both the p-value and the difference in mean between tumour and normal.
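A minimal Python sketch of this two-criterion filter, assuming tumour and normal samples are independent groups, so it uses the Mann-Whitney U (Wilcoxon rank-sum) test; the signed-rank test applies when samples are paired. In practice R's wilcox.test() or scipy.stats.mannwhitneyu would compute the p-value. Here it is done by hand with the normal approximation (no tie or continuity correction, so it is crude for very small groups). The function names, the data layout, and the cut-offs (p < 0.05, |Δβ| ≥ 0.2) are hypothetical choices for illustration.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def select_cpgs(data, p_cutoff=0.05, delta_cutoff=0.2):
    """Keep CpGs passing BOTH the p-value and |delta mean beta| cut-offs.

    data maps a CpG id to a (tumour_betas, normal_betas) pair.
    """
    selected = []
    for cpg, (tumour, normal) in data.items():
        delta = sum(tumour) / len(tumour) - sum(normal) / len(normal)
        p = mann_whitney_p(tumour, normal)
        if p < p_cutoff and abs(delta) >= delta_cutoff:
            selected.append(cpg)
    return selected


# Hypothetical beta values for five tumours and five normals per CpG.
data = {
    "cg_A": ([0.80, 0.85, 0.90, 0.82, 0.88], [0.20, 0.25, 0.30, 0.22, 0.28]),
    "cg_B": ([0.50, 0.52, 0.54, 0.56, 0.58], [0.40, 0.42, 0.44, 0.46, 0.48]),
    "cg_C": ([0.30, 0.70, 0.50, 0.20, 0.80], [0.60, 0.25, 0.75, 0.35, 0.55]),
}
print(select_cpgs(data))  # cg_B is significant but its delta is only ~0.1
```

cg_B shows why the extra Δβ cut-off matters: its groups separate perfectly, so the p-value passes, yet the methylation difference is too small to be interesting. Note that this sketch does not adjust the p-values for multiple testing.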