How to do scaling in ComplexHeatmap
12 months ago

Hello everyone,

I am using edgeR and ComplexHeatmap to plot a heatmap of a few ATAC-seq datasets, but I have run into an issue that I can't solve, so I would really appreciate any suggestions.

When I use log-transformed CPM values to plot the heatmap, the clustering is not very clear: the heatmap is either all blue or all red.

When I scale my data with "mat_scaled = t(scale(t(data)))" before plotting, some information is lost in the heatmap. I expected that values which are the same in all samples would be shown in the same color across the samples. Unfortunately, after scaling, the small differences between similar values are stretched into large ones, which show up as different colors in the heatmap.

                          sample1      sample2     sample3
chr4_185974589_185974741  1.483681     1.472528    1.4296474

after scaling:

                          sample1      sample2     sample3
chr4_185974589_185974741  0.761687321  0.37073755  -1.132424873

Thanks.

It might be easier for us to assist you if you posted the example images and the corresponding code. For example, it is not clear if you want to scale columns or rows or both.

Here is the code for the scaled heatmap:

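library(ComplexHeatmap)
library(circlize)   # provides colorRamp2()

base_mean = rowMeans(data)                  # per-row means (not used further below)
mat_scaled = t(scale(t(data)))              # scale() works on columns, so transpose, z-score, transpose back
type = gsub("s\\d+_", "", colnames(data))   # drop every match of s<digits>_ (e.g. a "s1_" prefix) to get the sample group
ha = HeatmapAnnotation(df = data.frame(type = type))
Heatmap(mat_scaled, name = "Z-score", km = 5,   # km = 5: split the rows into 5 k-means groups
        col = colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)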

[Figure: scaled (z-score) heatmap]

And the unscaled version:

Heatmap(data, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 5), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

[Figure: unscaled heatmap]

Thanks.

And what exactly is the detail you don't like about the z-score? Which values do you think should be "the same"?

From what I can tell, your code does what you instructed it to do: you're z-score-transforming the rows of your matrix. Instead of displaying the actual values of data, you're coloring the heatmap based on each entry's distance from its row's mean, measured in units of that row's standard deviation.
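A minimal base-R sketch of what row-wise z-scoring does, using a small made-up matrix `m` (hypothetical values, not OP's data): after `t(scale(t(m)))` every row has mean 0 and standard deviation 1, so only each entry's position relative to its own row survives.

```r
# Hypothetical toy matrix: 2 peaks (rows) x 3 samples (columns)
m <- matrix(c(1.48, 1.47, 1.43,
              2.00, 6.00, 9.00),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("peakA", "peakB"), c("s1", "s2", "s3")))

# scale() standardizes columns, so transpose, scale, and transpose back
z <- t(scale(t(m)))

# The same thing written out: subtract each row's mean, divide by its row SD
z_manual <- (m - rowMeans(m)) / apply(m, 1, sd)

round(rowMeans(z), 10)  # every row now has mean 0
apply(z, 1, sd)         # ... and standard deviation 1
all.equal(z, z_manual, check.attributes = FALSE)
```

The absolute level of each row (about 1.5 for `peakA`, about 5.7 for `peakB`) is gone after the transformation; that is the information the scaled heatmap no longer shows.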

Btw, I would strongly recommend not creating an object named "type" because that's also the name of a base R function.

Some values change after scaling, for example:

                          sample1      sample2     sample3
chr4_185974589_185974741  1.483681     1.472528    1.4296474

after scaling:

                          sample1      sample2     sample3
chr4_185974589_185974741  0.761687321  0.37073755  -1.132424873

Peaks like this one do not change across the samples, but after scaling they show differences.

I expected some common regions (with similar or identical values across my samples) to show up in the same color within a cluster.

Check out the formula below. The z-score drastically reduces the influence of differences in dynamic range between individual rows. Small differences in a row with a narrow range can therefore get z-scores similar to those of differences that look larger to you, simply because the numbers being compared live on a different scale. Measured relative to each row's own spread, the larger-looking deviations from the mean may be no more dramatic than the small ones in the low-value ranges.
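To make that concrete, here is a small base-R sketch. The `flat` row is OP's example peak; `wide` and `rescaled` are made-up rows for illustration, and `zscore()` is a hypothetical helper that does for one row what `t(scale(t(data)))` does for every row.

```r
# OP's example peak: nearly constant across the three samples
flat <- c(1.483681, 1.472528, 1.4296474)
# Hypothetical peak with a much larger spread (made-up values)
wide <- c(2, 6, 9)
# Same pattern as 'flat', multiplied by 10 and shifted by 5 (made-up)
rescaled <- flat * 10 + 5

# Hypothetical helper: z-score of a single row (sd() uses n - 1)
zscore <- function(x) (x - mean(x)) / sd(x)

diff(range(flat))   # the row spans only about 0.054 ...
zscore(flat)        # ... yet becomes roughly 0.76, 0.37, -1.13
zscore(wide)        # a spread of 7 units lands on a comparable scale
all.equal(zscore(flat), zscore(rescaled))  # shifting/stretching a row changes nothing
```

Because every row is forced onto the same scale, a nearly flat peak is stretched just as far as a strongly varying one.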

Hello sophialovechan,

You have added multiple images improperly, hence they show up as links and not as embedded images. Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here).

I will make the necessary changes for now.

12 months ago
United States

Ah, I see it now. I had ignored the numbers before; I have now formatted them in your original post so that it's a bit more obvious what you're actually asking. Your code does what it should; you may not like the consequences, but that's a different issue.

Using the pheatmap:::scale_rows function may illuminate what's going on:

## the %>% pipe used below comes from magrittr (dplyr re-exports it)
> library(magrittr)

## this is what the function does
> pheatmap:::scale_rows
function (x) 
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## and this is the result
> matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% pheatmap:::scale_rows()
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

## which is the same as your code
>  matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% t %>% scale %>%  t
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

As explained, e.g., on Wikipedia, the z-score is calculated by subtracting the (column or row) mean from the given value and dividing the result by the (column or row) standard deviation.
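Written out for the row-wise case used here (the standard definition; $x_{ij}$ is the value of peak $i$ in sample $j$, $n$ is the number of samples, and $s_i$ uses the $n-1$ denominator, as R's `sd()` and `scale()` do):

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}, \qquad
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad
s_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij} - \bar{x}_i\right)^2}
```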

The mean of your three example values is about 1.462 and the standard deviation about 0.029, so you can do the math yourself and see that the code is doing what it's supposed to be doing.
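Plugging the example peak into the z-score definition (intermediate values rounded to a few digits):

```latex
\begin{aligned}
\bar{x} &= \frac{1.483681 + 1.472528 + 1.4296474}{3} \approx 1.46195, \qquad s \approx 0.02853,\\
z_1 &= \frac{1.483681 - 1.46195}{0.02853} \approx 0.762,\\
z_2 &= \frac{1.472528 - 1.46195}{0.02853} \approx 0.371,\\
z_3 &= \frac{1.4296474 - 1.46195}{0.02853} \approx -1.132.
\end{aligned}
```

This agrees with the `scale_rows()` output above up to rounding: a total spread of about 0.054 becomes a spread of almost 1.9 standard deviations.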

Thanks for your detailed explanation. I understand the code is doing what it is supposed to do. So I guess my real question is whether there is a way to present the data closer to the original values. I suppose I should plot the original data in the heatmap, but I can't get the colors to show the differences, as you can see in the unscaled heatmap in my previous reply.

I cannot follow. Which differences do you find worthy of being "shown"? There's a clear difference in the second cluster, for example (one blue, two red).

Just a couple of thoughts:

  • are your values log-transformed? If not, that might help.
  • note that your current legend label "z-score" is wrong for the unscaled heatmap
  • maybe you're looking to adjust the color scheme? Kamil's pheatmap tutorial has a nice section about coloring according to quantiles; the principles should work with ComplexHeatmap, too, I would think
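A rough sketch of the quantile idea with circlize, assuming `data` is the log-CPM matrix (a random stand-in is generated here) and that the 5th, 50th and 95th percentiles are sensible anchor points (an arbitrary choice). Values beyond the outer breaks are drawn in the boundary colors, so a few extreme peaks no longer compress the color range for everything else.

```r
library(circlize)  # colorRamp2()

set.seed(1)
# Stand-in for the log-CPM matrix (random numbers, illustration only)
data <- matrix(rnorm(300, mean = 3, sd = 1.5), ncol = 3)

# Anchor the blue/white/red ramp at chosen quantiles instead of fixed numbers
breaks <- unname(quantile(data, probs = c(0.05, 0.5, 0.95), na.rm = TRUE))
col_fun <- colorRamp2(breaks, c("blue", "white", "red"))

# col_fun maps values to colors; it could then be passed as Heatmap(data, col = col_fun)
col_fun(breaks)
```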

Yes, the values are log-transformed. I don't have problems with the z-score scaling; I think it shows the differences very well. But I want to show the similarities as well, i.e. genes/values that are similar or equal in all the samples.

Sorry, but I'm not sure what you're asking for now. Maybe you can manually draw a version the way you envision it?

Something like this:

[Figure: desired heatmap style]

You can see a cluster (cluster2) that has the same or similar values across the samples.

That's what I want to make. Thank you very much.

Your unscaled version has that, no?

I have a feeling OP wants column clustering (as opposed to the row clustering shown in their cluster2 example)

Clustering is happening at both levels in the example heatmaps shown in the original post. But maybe sophialovechan is asking for the clustering being based on the unscaled data while the colors should correspond to the z-score-transformed values?
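If that is the goal, one possible sketch (not tested on OP's data; `data` below is a random stand-in) is to compute the dendrograms from the unscaled matrix and hand them to `Heatmap()` through `cluster_rows` / `cluster_columns`, which accept precomputed `hclust` objects, while the colors come from the z-scores.

```r
library(ComplexHeatmap)
library(circlize)

set.seed(1)
# Stand-in for the unscaled log-CPM matrix (random values, illustration only)
data <- matrix(rnorm(60, mean = 5, sd = 2), ncol = 3,
               dimnames = list(paste0("peak", 1:20), paste0("s", 1:3)))

# Colors come from the row-wise z-scores ...
mat_scaled <- t(scale(t(data)))

# ... but the dendrograms are computed on the unscaled values
row_tree <- hclust(dist(data))
col_tree <- hclust(dist(t(data)))

ht <- Heatmap(mat_scaled, name = "Z-score",
              col = colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
              cluster_rows = row_tree, cluster_columns = col_tree,
              show_row_names = FALSE)
draw(ht)
```

Rows of `mat_scaled` are in the same order as `data`, which is what lets the precomputed trees line up with the plotted matrix.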

Yes, it looks odd that OP picks the color limits by hand (c(-2, 0, 2) / c(-2, 0, 5) for low, mid, high). Ideally, they should be computed with min(), mean() and max() (sort of like scaling without the actual scaling).

Yes. That's exactly what I want to do. I am not very good at coding, so I'm not sure how to set the colors using min(), mean() and max(). Also, I mislabeled the unscaled heatmap as "Z-score". Sorry about the confusion.

12 weeks ago
RamRS 21k
Houston, TX

Change

colorRamp2(c(-2,0,5), ...

in the unscaled version to

colorRamp2(c(min(data, na.rm = TRUE), mean(data, na.rm = TRUE), max(data, na.rm = TRUE)), ...

That replaces the hard-coded values with values computed on the fly.

Thank you very much!

> Heatmap(data, name="log(CPM)", km=5, col=colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T),c("blue", "white", "red")), bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE))

Error in colorRamp2(c(min(data, na.rm = T), mean(data, na.rm = T), max(data,  : 
unused arguments (bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE)

I got an error though.

Check your parentheses.
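In the failing call, the `c(` that collects the three breaks is never closed before `c("blue", "white", "red")`, so the colors are swallowed into the breaks vector and the remaining arguments are passed to `colorRamp2()` instead of `Heatmap()`, hence "unused arguments". A sketch with the parentheses balanced, building the color function first so the call is easier to check. `data` is a random stand-in, the annotation arguments are left out because `ha` depends on OP's data, and `row_km` is used as the newer name for `km`.

```r
library(ComplexHeatmap)
library(circlize)

set.seed(1)
# Stand-in for OP's unscaled log-CPM matrix (random values, illustration only)
data <- matrix(rnorm(60, mean = 5, sd = 2), ncol = 3)

# Breaks computed from the data: low = min, mid = mean, high = max
col_fun <- colorRamp2(c(min(data, na.rm = TRUE), mean(data, na.rm = TRUE), max(data, na.rm = TRUE)),
                      c("blue", "white", "red"))

ht <- Heatmap(data, name = "log(CPM)", row_km = 5, col = col_fun,
              show_row_names = FALSE, show_column_names = FALSE)
draw(ht)
```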

Problem solved. Thanks.

Please remember to accept the answer that helped solve your problem.
