Question

Clustering using GO analysis (DAVID): heatmap between different groups

4.9 years ago

khoang3 • 0

Hello,

I have an Excel/txt file with the genes listed in column 1, the fold change of these genes in one group in column 2, and the fold change in another group in column 3. I performed DAVID analysis to get the GO terms.

What I am trying to do is group the genes by GO term, together with their associated fold changes from columns 2 and 3.

I've been looking online and haven't been able to find a solution that I can really comprehend.

I tried using Galaxy to input the list myself, but was wondering if there is a way to group the genes into GO terms before I generate the heatmap.

Sorry, I am new to this. Thanks for any suggestions.

RNA-Seq Heatmap DAVID gene • 2.3k views

Kevin Blighe · Answer 1 · 2019-06-03

Perhaps this will help, if you can follow my code: Clustering of DAVID gene enrichment results from gene expression studies

Kevin

Hello Kevin,

Thanks for the reply.

I was wondering how I would go about changing this part of your code:

#Create heatmap annotations
dfMinusLog10FDRGenes <- data.frame(-log10(topTable[which(topTable[,1] %in% rownames(annGSEA)),"padj"]))
dfMinusLog10FDRGenes[dfMinusLog10FDRGenes=="Inf"] <- 0
dfFoldChangeGenes <- data.frame(topTable[which(topTable[,1] %in% rownames(annGSEA)),"log2FoldChange"])
dfGeneAnno <- data.frame(dfMinusLog10FDRGenes, dfFoldChangeGenes)
colnames(dfGeneAnno) <- c("DEG\nsignificance\nscore", "Regulation")
dfGeneAnno[,2] <- ifelse(dfGeneAnno[,2]>0, "Up-regulated", "Down-regulated")
colours <- list("Regulation"=c("Up-regulated"="royalblue", "Down-regulated"="yellow"))
haGenes <- rowAnnotation(df=dfGeneAnno, col=colours, width=unit(1,"cm"))

davidTable <- read.table(DAVIDfile, sep="\t", header=TRUE)
dfMinusLog10BenjaminiTerms <- data.frame(-log10(davidTable[which(davidTable$Term %in% colnames(annGSEA)),"Benjamini"]))
colnames(dfMinusLog10BenjaminiTerms) <- "GO Term\nsignificance\nscore"
haTerms <- HeatmapAnnotation(df=dfMinusLog10BenjaminiTerms,
                             colname=anno_text(colnames(annGSEA), rot=40, just="right", offset=unit(1,"npc")-unit(2,"mm"), gp=gpar(fontsize=termLab)),
                             annotation_height=unit.c(unit(1, "cm"), unit(8, "cm")))

pdf("GO.pdf", width=7, height=12)
hmapGSEA <- Heatmap(annGSEA,

                name="My enrichment",

                split=dfGeneAnno[,2],

                col=c("0"="white", "1"="forestgreen"),

                rect_gp=gpar(col="grey85"),

                cluster_rows=T,
                show_row_dend=T,
                row_title="Statistically-significant genes",
                row_title_side="left",
                row_title_gp=gpar(fontsize=12, fontface="bold"),
                row_title_rot=0,
                show_row_names=TRUE,
                row_names_gp=gpar(fontsize=geneLab, fontface="bold"),
                row_names_side="left",
                row_names_max_width=unit(15, "cm"),
                row_dend_width=unit(10,"mm"),

                cluster_columns=T,
                show_column_dend=T,
                column_title="Enriched terms",
                column_title_side="top",
                column_title_gp=gpar(fontsize=12, fontface="bold"),
                column_title_rot=0,
                show_column_names=FALSE,
                #column_names_gp=gpar(fontsize=termLab, fontface="bold"),
                #column_names_max_height=unit(15, "cm"),

                show_heatmap_legend=FALSE,

                #width=unit(12.5, "cm"),

                clustering_distance_columns="euclidean",
                clustering_method_columns="ward.D2",
                clustering_distance_rows="euclidean",
                clustering_method_rows="ward.D2",

                bottom_annotation=haTerms)

draw(hmapGSEA + haGenes, heatmap_legend_side="right", annotation_legend_side="right")
dev.off()

to just generate a heatmap based on the fold change between two different experimental conditions. I've just started to learn R. My topTable has 3 columns (column 1 is the list of genes, column 2 is the fold change in one condition, and column 3 is the fold change in another experimental condition). I have about 51 GO terms with a lot of overlapping genes that I want to visualize on a heatmap between these two conditions.

Thanks for the help and any suggestions!

Reply • 4.9 years ago by khoang3

I see. This function is not really for that type of data; it is more for data coming straight from the DAVID website. Essentially, the input to the Heatmap() function is a matrix of 1s and 0s (1 = gene present in GO term; 0 = gene not present in GO term).
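To make the 1/0 input concrete, here is a minimal base-R sketch of how such a gene-by-term matrix could be built. It assumes a DAVID chart export with a `Term` column and a comma-separated `Genes` column; the toy terms and genes are made up, and the name `annGSEA` is simply reused from the code quoted above. This is an illustration, not necessarily how the linked tutorial constructs it.

```r
# Toy stand-in for a DAVID chart export (hypothetical terms and genes).
# A real file would be read with read.table(DAVIDfile, sep="\t", header=TRUE).
david <- data.frame(
  Term  = c("GO:0006955~immune response", "GO:0006954~inflammatory response"),
  Genes = c("IL6, TNF, CXCL8", "TNF, CXCL8, PTGS2"),
  stringsAsFactors = FALSE)

# Split each comma-separated gene string into one character vector per term
genesPerTerm <- lapply(strsplit(david$Genes, ","), trimws)
names(genesPerTerm) <- david$Term

# Rows = every gene seen in any term, columns = terms;
# a cell is 1 if the gene is annotated to the term, otherwise 0
allGenes <- sort(unique(unlist(genesPerTerm)))
annGSEA <- sapply(genesPerTerm, function(g) as.integer(allGenes %in% g))
rownames(annGSEA) <- allGenes

print(annGSEA)
```

Genes that belong to several terms simply carry a 1 in more than one column, which is what lets the heatmap cluster overlapping terms together.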

Reply • 4.9 years ago by Kevin Blighe
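For the layout described in the question (genes in column 1, one fold-change column per condition), one possible approach is to attach the fold changes to each gene/term pair and plot the rows grouped by term. The sketch below uses only base R and toy data; the column names `gene`, `cond1`, `cond2`, the `membership` table, and the output path are all assumptions for illustration. In practice the gene-to-term pairs would come from the DAVID output. It is not the method from the answer above.

```r
# Toy fold-change table (hypothetical values): gene, then one column per condition
topTable <- data.frame(
  gene  = c("IL6", "TNF", "CXCL8", "PTGS2"),
  cond1 = c(2.1, 1.4, -0.8, 0.3),
  cond2 = c(1.7, -0.5, -1.2, 2.0),
  stringsAsFactors = FALSE)

# Toy gene-to-term membership in long format (one row per gene/term pair)
membership <- data.frame(
  term = rep(c("immune response", "inflammatory response"), each = 3),
  gene = c("IL6", "TNF", "CXCL8", "TNF", "CXCL8", "PTGS2"),
  stringsAsFactors = FALSE)

# Attach fold changes to every pair; genes in several terms appear repeatedly
long <- merge(membership, topTable, by = "gene")
long <- long[order(long$term, long$gene), ]

# Matrix for plotting: rows = gene/term pairs grouped by term, columns = conditions
fc <- as.matrix(long[, c("cond1", "cond2")])
rownames(fc) <- paste(long$term, long$gene, sep = " | ")

# One side colour per term, and a colour scale centred on zero
terms <- unique(long$term)
termColours <- setNames(rainbow(length(terms)), terms)
lim <- max(abs(fc))

# Written to a temporary directory here; change to any path you like
outFile <- file.path(tempdir(), "GO_foldchange.pdf")
pdf(outFile, width = 7, height = 7)
heatmap(fc, Rowv = NA, Colv = NA, scale = "none",
        RowSideColors = unname(termColours[long$term]),
        col = colorRampPalette(c("royalblue", "white", "firebrick"))(51),
        breaks = seq(-lim, lim, length.out = 52),
        margins = c(6, 14))
dev.off()
```

Setting `Rowv = NA` keeps the rows in term order instead of reclustering them, so each GO term stays as a contiguous block. With around 51 terms and many shared genes, the figure height and label size would likely need adjusting, or the terms could be plotted in batches.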