Heatmap with categorical variables and with phylogenetic tree in R or Python
2.3 years ago
tlorin • 250
Switzerland

Hi everyone! :)

I have a question that I could not answer by searching on my own. I would like to make a heatmap with categorical variables (a bit like this one: "heatmap-like plot, but for categorical variables"), and I would like to add a phylogenetic tree on the left side (like this one: "how to create a heatmap with a fixed external hierarchical cluster"). Ideally I would adapt the second one, since it looks much prettier! ;)

Here is my data:

• a newick-formatted phylogenetic tree, with 3 species, let's say:
((1,2),3);

• a data frame:
x <- c("species 1", "species 2", "species 3")
y <- c("A", "A", "C")
z <- c("A", "B", "A")
df <- data.frame(x, y, z)


(A, B and C are the categories of each variable; in my case, for instance, gene presence, absence, or duplication.)

Would you know how to do it?


What about this answer by Obi Griffith? I use this solution whenever I need to plot a heatmap and a tree. Or are you looking for something else?


Thanks for your answer! It seems really useful indeed. What I do not know is how to choose the color for each category (let's say A=green, B=yellow, C=red) with the heatmap function... But it might be easy and I just have not figured it out ^.^

2.3 years ago
tlorin • 250
Switzerland

I figured out how to do it! Here is my script for those who are interested:

#load packages
library("ape")
library(gplots)

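#retrieve tree in newick format with 3 species
mytree <- read.tree(text="((1,2),3);") #the example tree from the question
mytree_brlen <- compute.brlen(mytree, method="Grafen") #assign Grafen branch lengths so that the tree is ultrametric (as.hclust needs an ultrametric tree)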

#turn the phylo tree to a dendrogram object
hc <- as.hclust(mytree_brlen) #Compulsory step as as.dendrogram doesn't have a method for phylo objects.
dend <- as.dendrogram(hc)
plot(dend, horiz=TRUE) #check what the dendrogram looks like

#create a matrix with values of each category for each species
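a <- mytree_brlen$tip.label #species names, in the order of the tree tips
b <- c("gene1","gene2")
dimnames_list <- list(a, b) #row names (species) and column names (genes)
values <- c(1,2,1,1,3,2)  #some values for the categories (1=A, 2=B, 3=C), filled column by column
mat <- matrix(values, nrow=3, dimnames=dimnames_list) #Some random data to plot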

#plot the heatmap
heatmap.2(mat, Rowv=dend, Colv=NA, dendrogram='row',
          col=colorRampPalette(c("red","green","yellow"))(3),
          sepwidth=c(0.01,0.02), sepcolor="black", colsep=1:ncol(mat), rowsep=1:nrow(mat),
          key=FALSE, trace="none",
          cexRow=2, cexCol=2, srtCol=45,
          margins=c(10,10),
          main="Gene presence, absence and duplication in three species")

#legend of heatmap
par(lend=2)           # square line ends for the color legend
legend("topright",      # location of the legend on the heatmap plot
       legend = c("gene absence", "1 copy of the gene", "2 copies"), # category labels
       col = c("red", "green", "yellow"),  # color key
       lty = 1,            # line style
       lwd = 15            # line width
)
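
A note on the colours: as far as I can tell, when no breaks are given, heatmap.2 splits the range from the smallest to the largest value into equal-width bins, so the script above gives one colour per category only when all three codes are present. If a category is missing from the matrix, the colours can shift. Below is a minimal sketch of one way around this, starting from the data frame in the question. The names categories, cat_colors and breaks are only illustrative, and the A=green, B=yellow, C=red mapping is the one asked for in the comments, not the one used in the legend above.

```r
# Categorical data from the question (one row per species)
df <- data.frame(
  x = c("species 1", "species 2", "species 3"),
  y = c("A", "A", "C"),
  z = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Fix the category order once so the codes never shift: A=1, B=2, C=3
categories <- c("A", "B", "C")
# Illustrative colour choice, one colour per category
cat_colors <- c(A = "green", B = "yellow", C = "red")

# Encode every gene column with the same factor levels
mat <- sapply(df[, c("y", "z")],
              function(col) as.integer(factor(col, levels = categories)))
rownames(mat) <- df$x

# One bin per category: breaks at 0.5, 1.5, 2.5, 3.5
breaks <- seq(0.5, length(categories) + 0.5, by = 1)

# Plot only if gplots is installed; no tree here, to keep the sketch short
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat, Rowv = FALSE, Colv = NA, dendrogram = "none",
                    col = unname(cat_colors[categories]), breaks = breaks,
                    key = FALSE, trace = "none", margins = c(10, 10))
}
```

To combine this with the tree, I would expect Rowv=dend and dendrogram='row' to work as in the script above, provided the rows of the matrix are in the same order as the tip labels of the tree.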


I don't know how to show the result here, but it does work ;)
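
One way to get an image that can be shared is to write the plot to a file. This is a minimal sketch using the base R png() device; the file name and the image size are assumptions, and the image() call is only a stand-in for the heatmap.2(...) and legend(...) calls from the script above.

```r
# Hypothetical output path; a temporary directory keeps the sketch self-contained
out_file <- file.path(tempdir(), "heatmap_tree.png")

# Open a PNG graphics device (size in pixels)
png(out_file, width = 800, height = 600)

# Stand-in plot; replace with the heatmap.2(...) and legend(...) calls
image(matrix(c(1, 2, 1, 1, 3, 2), nrow = 3),
      col = c("red", "green", "yellow"))

# Close the device so that the file is actually written
invisible(dev.off())
```

Everything drawn between png() and dev.off() ends up in the file, so the legend has to be added before the device is closed.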