Question

Heatmap Using R, With Special Conditions !!!

2

11.5 years ago

RDS ▴ 20

This dataset contains results for a specific disease-gene test. The dataset goes like this.

Test and Gene are the two parameters on the x and y axes. Values holds the combined result for each parameter pair. My problem is that I have one more parameter, Relevance, which represents the relevance of the test-gene pair and is boolean (only two values, YES/NO).

Relevance should be distinguished in the map by colour (for example red and green).
The gradient of that colour should represent the numerical value for that interaction.

The end result I am aiming for is Test on the x-axis and Gene on the y-axis, with the map of these interactions drawn in only two colours (representing the Relevance values) and the gradient of each colour representing Values. Is it possible to produce this kind of heatmap, and if so, how? If not, is there another way to display this kind of data (similar to a heatmap)?

Help appreciated !!

Thanks,

RDS

something like this -

r heatmap • 18k views

ADD COMMENT • link updated 5.2 years ago by Biostar 20 • written 11.5 years ago by RDS ▴ 20

5

In my lab, 3 people out of 60 are color-blind. This figure would be unintelligible for them.

ADD REPLY • link 11.5 years ago by Giovanni M Dall'Olio 28k

4

ColorBrewer (http://colorbrewer2.org) is an excellent resource for choosing color schemes for scientific data, and addresses issues like color blindness.
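As a rough illustration of that suggestion, ggplot2 can pull ColorBrewer palettes directly through scale_fill_distiller. The sketch below uses made-up data (the `dat` data frame and its column names are assumptions, not the poster's dataset) and the "PuOr" diverging palette, which, to my knowledge, ColorBrewer lists as colorblind-friendly.

```r
library(ggplot2)

# Hypothetical toy data, not taken from the thread.
dat <- data.frame(
  test  = rep(c("Test_1", "Test_2", "Test_3"), each = 3),
  gene  = rep(c("Gene_A", "Gene_B", "Gene_C"), times = 3),
  value = c(-16, 3, 8, 12, -13, 25, -4, 7, -26)
)

# Symmetric limits keep the neutral middle of the palette at zero.
lim <- max(abs(dat$value))

p_cb <- ggplot(dat, aes(x = test, y = gene, fill = value)) +
  geom_tile() +
  scale_fill_distiller(palette = "PuOr", limits = c(-lim, lim)) +
  theme_bw()
```

Printing `p_cb` (or passing it to ggsave) draws the heatmap; swapping the palette name is enough to try other ColorBrewer schemes.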

ADD REPLY • link 11.5 years ago by mhowison ▴ 40

3

I'm red/green blind and I can perfectly interpret the figure.

ADD REPLY • link 9.7 years ago by llukas.kkohl ▴ 30

1

11.5 years ago

Alex Reynolds 35k

I tried using lattice in R to do roughly the same thing and I got close using latticeplot routines. It is probably better to use ggplot2 for this task.

ADD COMMENT • link 11.5 years ago by Alex Reynolds 35k

score 3 · Accepted Answer · 2012-11-05

Here is some R code that may help. I must admit I do not understand exactly how your Relevance and Value variables are related (or expected to interact). I have made some guesses. If I have guessed wrong, perhaps you can post some sample data to help clear up my confusion?

I have used scale_fill_gradient2 in these examples. You can specify three different colors: high, mid (default is white), low. You can specify the midpoint (value that maps to mid color, default=0), and upper and lower value limits. This may provide enough flexibility to show your data the way you want it.

In Example 1, negative values range from red to white, and positive values range from white to blue. In Example 2, the value is NA wherever relevance is FALSE, so those tiles get no gradient color (ggplot2 draws them in the scale's na.value color, grey by default), and where relevance is TRUE, the color gradient spans the range red-white-blue.

library(ggplot2)

# Create test data.
dat1 = data.frame(x=factor(rep(c("A", "B", "C"), 3)), 
                  y=factor(rep(c(37, 8.7, -17.7), c(3, 3, 3))), 
                  z=c(34, 18, 31, 9, -2, 4, -21, -33, -13))

p1 = ggplot(dat1, aes(x=x, y=y, fill=z)) +
     theme_bw() +
     geom_tile() +
     geom_text(aes(label=paste(z))) +
     scale_fill_gradient2(midpoint=0, low="#B2182B", high="#2166AC") +
     ggtitle("Example 1")  # opts() has been removed from ggplot2; ggtitle() replaces it

ggsave(plot=p1, filename="plot_1.png", height=4.5, width=5)

# Create a slightly different test dataset.
dat2 = data.frame(
         gene=factor(rep(c("Gene_A", "Gene_B", "Gene_C"), 3)), 
         test=factor(rep(c("Test_1", "Test_2", "Test_3"), c(3, 3, 3))),
         relevance=c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE), 
         value=c(-16, 3, NA, NA, -13, 25, -4, NA, -26))

p2 = ggplot(dat2, aes(x=gene, y=test, fill=value)) +
     theme_bw() +
     geom_tile() +
     geom_text(aes(label=paste(value))) +
     scale_fill_gradient2(midpoint=0, low="#B2182B", high="#2166AC") +
     ggtitle("Example 2")  # opts() has been removed from ggplot2; ggtitle() replaces it

ggsave(plot=p2, filename="plot_2.png", height=4.5, width=5)

score 2 · Accepted Answer · 2012-11-05

2

11.5 years ago

Sukhi Singh 11k

I am sure you can do it in R, with a little extra work, using the ggplot2 library and having your data in the form of a data frame. Check these two posts on how to achieve it: ggplot2-quick-heatmap-plotting and Constructing an Heatmap of "Distance of binding region relative to TSS".

Cheers

ADD COMMENT • link 11.5 years ago by Sukhi Singh 11k

0

Thanks for the response, but that's not what I was looking for. Let me explain: in the links you provided, the colour differentiation is based on data relative to just one axis. That is, in this image http://i.stack.imgur.com/dPAE2.png the colours (red, green and blue) are relative to the data on the y-axis (these colours are part of the data that represents the x-axis).

But my dataset is different. Its columns are | Test | Gene | Relevance | Values |. Relevance describes the test/gene pair as a whole: under certain conditions, the relevance of a test/gene pair may be Yes or No (it is not part of either axis). I hope that makes the problem clear. Much appreciated, RDS

ADD REPLY • link 11.5 years ago by RDS ▴ 20

0

So, you want this heatmap, but the colour should represent the boolean in Relevance. If this is correct, then just point the fill variable of geom_tile to Relevance after melting, instead of to rescale, and if you want numbers on top, you will have to use geom_text in addition. :)
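A minimal sketch of that idea, assuming the data is already in long form with the four columns described above (the `dat` data frame and its values are made up for illustration, and the two colours are an arbitrary choice):

```r
library(ggplot2)

# Hypothetical toy data with the four columns named in the thread.
dat <- data.frame(
  Test      = rep(c("Test_1", "Test_2", "Test_3"), each = 3),
  Gene      = rep(c("Gene_A", "Gene_B", "Gene_C"), times = 3),
  Relevance = c("YES", "YES", "NO", "NO", "YES", "YES", "YES", "NO", "YES"),
  Values    = c(16, 3, 8, 12, 13, 25, 4, 7, 26)
)

# Tile colour comes from Relevance; the numbers are printed on top.
p_rel <- ggplot(dat, aes(x = Test, y = Gene)) +
  geom_tile(aes(fill = Relevance), colour = "white") +
  geom_text(aes(label = Values)) +
  scale_fill_manual(values = c(YES = "#2166AC", NO = "#B2182B")) +
  theme_bw()
```

This gives exactly two flat colours, so Values appears only as text rather than as a gradient.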

ADD REPLY • link 11.5 years ago by Sukhi Singh 11k

0

Yeah, I can get the colours as you said ("fill variable of geom_tile to Relevance"). I am fine up to here, but how can I use the fourth column (of | Test | Gene | Relevance | Values |), VALUES, other than as text (using geom_text)? How can I make the gradient of those colours from the VALUES data points?

RDS

ADD REPLY • link 11.5 years ago by RDS ▴ 20

0

Aahh, then you might have to do some tweaking. Assign negative values to the elements with Relevance=No and positive values to those with Relevance=Yes, and then fill using the values: the more negative, the less relevant. So, take the subset where Relevance=No, add a minus sign to the value column, and then plot. After that, generate the gradient as described here: How to do gradient
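The sign trick could look roughly like the sketch below. It assumes the original Values are all non-negative (otherwise the sign would no longer encode Relevance alone); the `dat` data frame, the `Signed` column and the colour choices are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data with the four columns named in the thread.
dat <- data.frame(
  Test      = rep(c("Test_1", "Test_2", "Test_3"), each = 3),
  Gene      = rep(c("Gene_A", "Gene_B", "Gene_C"), times = 3),
  Relevance = c("YES", "YES", "NO", "NO", "YES", "YES", "YES", "NO", "YES"),
  Values    = c(16, 3, 8, 12, 13, 25, 4, 7, 26)
)

# Negative where Relevance is NO, positive where it is YES.
dat$Signed <- ifelse(dat$Relevance == "NO", -abs(dat$Values), abs(dat$Values))

# Symmetric limits so both colours use the same intensity scale.
lim <- max(abs(dat$Signed))

p_signed <- ggplot(dat, aes(x = Test, y = Gene, fill = Signed)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = Values)) +  # label with the original, unsigned value
  scale_fill_gradient2(low = "#B2182B", mid = "white", high = "#2166AC",
                       midpoint = 0, limits = c(-lim, lim),
                       name = "Value (sign = relevance)") +
  theme_bw()
```

Tiles with Relevance=No then shade from white to red and tiles with Relevance=Yes from white to blue, with intensity following Values. Another option that may be worth trying is mapping fill to Relevance and alpha to Values in geom_tile.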

ADD REPLY • link 11.5 years ago by Sukhi Singh 11k