Question

How to interpret heatmap ?

0

Entering edit mode

6.2 years ago

Björn ▴ 110

The heatmap command

library(gplots)
heatmap.2(lcpm, col=redgreen(10))

gave following graph. How to interpret color key histogram ? Why there are two bright green bands on top and bottom, what it signifies ?

pheatmap lcpm edgeR • 6.1k views

ADD COMMENT • link updated 6.2 years ago by Nicolas Rosewick 10k • written 6.2 years ago by Björn ▴ 110

1

Entering edit mode

You can also use the scale="row" heatmap.2 parameter.

ADD REPLY • link 6.2 years ago by h.mon 35k

0

Entering edit mode

Additional question on the heatmap. how to get the real gene names (row names) instead of numbers ?

ADD REPLY • link 6.2 years ago by Björn ▴ 110

score 3 · Answer 1 · 2018-02-27

3

Entering edit mode

6.2 years ago

Kevin Blighe 87k

This does not look right at all. The count distribution via the key and histogram is biased toward high counts (green), for whatever reason. Scaling this distribution to be more evenly distributed will assist visualisation, and also setting colour break-points accordingly.

Can you please follow my simple approach here, and see what you get: A: Heatmap based with FPKM values

Kevin

ADD COMMENT • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@kevin. The data are mirna received as single ends. I fed raw counts of mirna in edgeR for DE analysis. Do you have suggestions for such data ?

ADD REPLY • link 6.2 years ago by Björn ▴ 110

0

Entering edit mode

Hi Björn,

Yes, you definitely need to scale them, as per the lines of code in the link that I gave (above), i.e., literally, execute this:

require("RColorBrewer")
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-3, 3, length.out=101)

#Scale the values to the Z scale
heat <- t(scale(t(lcpm)))


#Generate heatmaps with Euclidean distance (first) and '1 - Pearson correlation' distance (second) (both use Ward's linkage)
require("gplots")

#Euclidean distance
heatmap.2(heat, col=myCol, breaks=myBreaks, main="Title", key=T, keysize=1.0, scale="none", density.info="none", reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean), trace="none", cexRow=0.2, cexCol=0.8, distfun=function(x) dist(x, method="euclidean"), hclustfun=function(x) hclust(x, method="ward.D2"))

#1-cor distance
heatmap.2(heat, col=myCol, breaks=myBreaks, main="Title", key=T, keysize=1.0, scale="none", density.info="none", reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean), trace="none", cexRow=0.2, cexCol=0.8, distfun=function(x) as.dist(1-cor(t(x))), hclustfun=function(x) hclust(x, method="ward.D2"))

-------------

Regarding rownames, you can usually set these with the labRow parameter of heatmap.2(). For exampl:

heatmap2(..., labRow=rownames(lcpm),...)

This obviously requires that the rownames of your lcpm object are indeed gene names. You can check by just running rownames(lcpm)

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

score 2 · Answer 2 · 2018-02-27

2

Entering edit mode

6.2 years ago

seidel 11k

The heatmap.2() function does not perform scaling of the data by default (unlike the base heatmap function). Thus your data has bright green bands because those rows have values in the upper range of your distribution and you've set high values to be green, while the black rows have values close to zero. The histogram in the key is telling you the frequency of values overall in the data set. You can see that the distribution has a right shoulder towards green. The trace down each column is telling you the size of the measurements by column position.

Why did you draw this heat map? What did you expect to see? Depending on your needs, you might choose to scale the data (see ?heatmap.2 ), or remove rows with little variation.

ADD COMMENT • link 6.2 years ago by seidel 11k

0

Entering edit mode

@Seidel, thanks . The heatmap is generated by lcpm<-cpm(y) to find DE genes. In color key histogram what does x-scale -20 to 20 means ? To shorlist DE genes, should I limit only those in bright green bands, then how ?

ADD REPLY • link 6.2 years ago by Björn ▴ 110

0

Entering edit mode

There's still something strange about your data, as you describe it:

lcpm<-cpm(y) to find DE genes

This is not the command to find DE genes using edgeR. It is the command to transform feature counts from integers into counts per million.

what does x-scale -20 to 20 means?

the key and scale are telling you the range of values you are plotting. Thus, according to your heat map, you have values ranging from -20 to 20, and you can see from the histogram that you have more positive than negative values.

I'm guessing, that since your heat map scale is -20 to 20, more or less centered at 0, you are actually plotting log ratios, and perhaps don't realize it.

To shorlist DE genes, should I limit only those in bright green bands, then how ?

To shortlist genes, you would not use the heat map, you would use the p-values (hopefully adjusted p-values) from edgeR. However, you can use the heatmap to isolate certain rows if you like. I would suggest you play with some toy examples to get a sense of what's going on.

# create a toy matrix of count data
x <- matrix(round(runif(100,1,200)), nrow=20, ncol=10)
# name the rows like they are genes
rownames(x) <- paste0("g", 1:nrow(x))

# you could plot this just to see what it looks like
library(gplots)
heatmap.2(x, col=redgreen(10), Rowv=FALSE, Colv=FALSE)
# you should see the key telling you the value range
# you should see row labels

# you said you used the cpm function and plotted the results
library(edgeR)
x.cpm <- cpm(x)
heatmap.2(x.cpm, col=redgreen(10))

# most likely you're plotting log ratios
# make some data resembling this
x <- matrix(rnorm(200), nrow=20, ncol=10)
rownames(x) <- paste0("g", 1:nrow(x))
heatmap.2(x, col=redgreen(10))
# once again, notice the color key

# Examine the effect of scaling
# to do this, let's add postive values to some rows
x[1:3,] <- x[1:3,] + 2
heatmap.2(x, col=redgreen(10))

# now we see that those rows are clearly
# different than the rest

# but if we scaled the data plot
heatmap.2(x, scale="row", col=redgreen(10))
# visually the difference goes away, but the
# dendrogram still tells us those rows are different
# Notice also the new values in the legend!!! (Z-scores)

# we can pick out certain rows by saving information
# from drawing the heatmap:
hm <- heatmap.2(x, col=redgreen(10))

# Examine the hm object:
hm

# see that it contains row indicies. we can examine our
# data table re-ordered like the heatmap
x[hm$rowInd,]
# but it's actually drawn upside down, so reverse the index
x[rev(hm$rowInd),]

# This might give you a clue for selecting certain rows
# so select genes with multiple positive values
iv <- apply(x, 1, function(x) sum(x > 1) >= 4)
rownames(x[iv,])

ADD REPLY • link 6.2 years ago by seidel 11k

score 0 · Answer 3 · 2018-02-28

DESeq2 provides read count transformation in order to visualize them as a heatmap.

Look at : - https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization for transformation and https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#heatmap-of-the-count-matrix for heatmap