Question

Making a heatmap: sample vs. fold change

2

5.7 years ago

Kristin Muench ▴ 620

Hello,

I often see in RNA-Seq papers a figure that plots samples on the Y axis, genes on the X axis, and a fill color representing fold change. Here is an example.

The DESeq package explains how to make a heatmap of normalized expression, but not of fold change. I notice that DESeq sometimes steers its users towards best practices by building one approach into the package while leaving out less desirable ones (e.g. using dist() rather than cor() in its sample-to-sample clustering heatmap). Does this imply that there is something less than ideal about plotting fold changes in a heatmap to visualize patterns of gene expression across samples?

If it is totally fine to plot fold changes, how does one calculate these FCs, since DESeq itself wouldn't provide them? For example, you could:

1. For gene X in sample Y, divide that expression value by the mean expression value of gene X across ALL samples;
2. For gene X in sample Y, divide that expression value by the mean expression value of gene X across CONTROL samples only;

...and so forth.

I'm guessing #1 is best practice, but I'm not sure if there is a convention or best practice associated with this kind of visualization.
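The two options above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `norm_counts` is a made-up matrix of normalized counts (genes in rows, samples in columns), `condition` is a hypothetical sample annotation, and the pseudocount of 1 is an arbitrary choice to avoid taking the log of zero.

```r
# Toy normalized counts: 3 genes x 4 samples (made-up values for illustration)
norm_counts <- matrix(
  c(10, 12, 40, 44,
    100, 90, 95, 105,
    7, 9, 1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)
condition <- factor(c("control", "control", "treatment", "treatment"))

pseudo <- 1  # assumed pseudocount so zero counts do not produce -Inf

# Option 1: each sample relative to the gene's mean across ALL samples
ref_all <- rowMeans(norm_counts)
lfc_vs_all <- log2((norm_counts + pseudo) / (ref_all + pseudo))

# Option 2: each sample relative to the gene's mean across CONTROL samples only
ref_ctrl <- rowMeans(norm_counts[, condition == "control", drop = FALSE])
lfc_vs_ctrl <- log2((norm_counts + pseudo) / (ref_ctrl + pseudo))
```

Both results are gene-by-sample matrices that could go straight into a heatmap. With option 2 the control samples scatter around 0, whereas option 1 centers each gene on its overall mean.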

Thank you, Kristin

RNA-Seq R heatmap DESeq2 • 12k views

ADD COMMENT • link updated 2.6 years ago by margott.j ▴ 10 • written 5.7 years ago by Kristin Muench ▴ 620

0

As I understand it, the LFC in DESeq2 is log2(normalized_counts_treated / normalized_counts_control), with the normalized counts taken across each group. @Kristin Muench
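As a rough illustration of that formula, the sketch below computes a per-gene log2 ratio of group means from a made-up normalized count matrix. Treat it as a simplification: the log2FoldChange that DESeq2 reports is estimated from its fitted model, so its values will generally differ somewhat from this plain ratio. The names `norm_counts` and `is_treated` are hypothetical.

```r
# Made-up normalized counts: 2 genes x 4 samples
norm_counts <- matrix(
  c(10, 30, 40, 120,
    50, 50, 25, 25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)
is_treated <- c(FALSE, FALSE, TRUE, TRUE)

# Average the normalized counts within each group
mean_treated <- rowMeans(norm_counts[, is_treated, drop = FALSE])
mean_control <- rowMeans(norm_counts[, !is_treated, drop = FALSE])

# One log2 fold change per gene (geneA: 2, geneB: -1)
lfc <- log2(mean_treated / mean_control)
```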

ADD REPLY • link 5.7 years ago by cpad0112 21k

0

Hi Kristin!

I am aware that this is quite an old post, but were you able to find out how to plot the LFC? I have tried to subset the log2FoldChange column that comes out of the DESeq2 package and plot that as a heatmap, but have been unsuccessful.

Thanks,

Margot

ADD REPLY • link 2.6 years ago by margott.j ▴ 10

score 2 · Answer 1 · 2018-08-16

2

5.7 years ago

dr_bantz ▴ 110

There is nothing wrong with showing log2 fold change, and often this is the clearest way to show the results. You could calculate it independently of DESeq2 (first dividing the counts in each library by the corresponding number of total mapped reads), or alternatively you can access the log2 fold change from the "log2FoldChange" column of the DESeq2 results object:

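library(DESeq2)

# Build the dataset from raw counts and sample annotations
dds_count_table <- DESeqDataSetFromMatrix(countData = rawcounts,
                                          colData = sample_info,
                                          design = ~condition)
dds <- DESeq(dds_count_table)
# Treatment vs. control: one log2 fold change per gene
res <- results(dds, contrast = c("condition", "treatment", "control"))
log2_changes <- res[, "log2FoldChange"]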
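Since a single contrast gives only one log2 fold change per gene, a heatmap needs several such columns. One possible way to get them, assuming the experiment has more than one contrast, is to bind the columns from each results table into a gene-by-contrast matrix. The sketch below uses plain data frames as stand-ins for DESeq2 results objects; `res_drugA`, `res_drugB`, and the contrast names are hypothetical.

```r
# Stand-ins for results tables from two hypothetical contrasts;
# with DESeq2 these would come from results(dds, contrast = ...)
res_drugA <- data.frame(log2FoldChange = c(2.1, -0.4, 0.0),
                        row.names = c("geneA", "geneB", "geneC"))
res_drugB <- data.frame(log2FoldChange = c(1.5, -1.2, 0.3),
                        row.names = c("geneA", "geneB", "geneC"))

# Align on gene names, then bind one column per contrast
genes <- intersect(rownames(res_drugA), rownames(res_drugB))
lfc_mat <- cbind(
  drugA_vs_control = res_drugA[genes, "log2FoldChange"],
  drugB_vs_control = res_drugB[genes, "log2FoldChange"]
)
rownames(lfc_mat) <- genes
```

A numeric matrix like `lfc_mat` can then be passed to a heatmap function such as base R's `heatmap()`.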

ADD COMMENT • link 5.7 years ago by dr_bantz ▴ 110

0

Ah good good! Regarding pulling the L2FCs out of DESeq: correct me if I'm wrong, but wouldn't that method provide one L2FC per gene across all samples in a condition, whereas a sample-by-gene heatmap would require a different L2FC for each sample/gene combination?

I'm guessing that hand-calculating L2FCs would be the way to go. Do you know if we ought to apply some kind of shrinkage algorithm (a la lfcShrink()) to L2FCs produced for a heatmap? Or is using regular L2FCs fine?

ADD REPLY • link 5.7 years ago by Kristin Muench ▴ 620

0

> Ah good good! Regarding pulling the L2FCs out of DESeq: correct me if I'm wrong, but wouldn't that method provide one L2FC per gene across all samples in a condition, whereas a sample-by-gene heatmap would require a different L2FC for each sample/gene combination?

I'm guessing you're talking about replicates here. The L2FC pulled from the DESeq output is calculated from the average of all replicates. Showing L2FCs for separate replicates wouldn't make sense - how would you decide which control replicate to pair up with a given treatment replicate?

> I'm guessing that hand-calculating L2FCs would be the way to go. Do you know if we ought to apply some kind of shrinkage algorithm (a la lfcShrink()) to L2FCs produced for a heatmap? Or is using regular L2FCs fine?

Using regular L2FCs is fine, as long as this is clear in the text/figure.

ADD REPLY • link 5.7 years ago by dr_bantz ▴ 110

0

From what I understand, shrunken L2FCs are better for visualization and ranking. If you're only showing a heatmap of genes with significant differential expression, though, the shrunken L2FCs will generally be similar to the unshrunken ones. Still, some of the larger "off the chart" L2FCs might come down to a more reasonable range, making your heatmap easier to read. Check out:

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#log-fold-change-shrinkage-for-visualization-and-ranking

ADD REPLY • link 3.8 years ago by MaxF ▴ 120