Question

DEseq2 Shade PCA plot by gene counts

0

Entering edit mode

4.5 years ago

paulranum11 ▴ 80

Hi All,

I am working with DEseq2 to do some RNA-Seq analysis and would like to find a solution for displaying the expression level of individual genes on the PCA plot. This functionality exists built into other packages like Seurat (using "FeaturePlot"). Does anyone know of a way to implement this with DEseq2's plotPCA.

Here is the simple command i am using to generate the PCA plot.

plotPCA(vsd, intgroup=c("Treatment"))

The following is a link to a DEseq2 walkthrough for reference: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

DEseq2 PCA Heatmap ggplot2 • 3.8k views

ADD COMMENT • link updated 4.5 years ago by Kevin Blighe 88k • written 4.5 years ago by paulranum11 ▴ 80

score 3 · Answer 1 · 2019-11-09

Hey,

The DESeq2 PCA function will [I believe] convert a continuous variable to categorical, which is likely not what you want. This is fine, as DESeq2 is primarily about normalising RNA-seq data and conducting differential expression.

In any case, you can do this with my own package, PCAtools, which should fit nicely with your DESeq2 variance-stabilised expression values and metadata. Here is a reproducible example, taken from my vignette: PCAtools: everything Principal Component Analysis:

1, load breast cancer data from GEO

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO('GSE2990', GSEMatrix = TRUE, getGPL = FALSE)
x <- exprs(gset[[1]])

# remove Affymetrix control probes
x <- x[-grep('^AFFX', rownames(x)),]

# extract information of interest from the phenotype data (pdata)
idx <- which(colnames(pData(gset[[1]])) %in%
  c('age:ch1', 'distant rfs:ch1', 'er:ch1',
    'ggi:ch1', 'grade:ch1', 'size:ch1',
    'time rfs:ch1'))

metadata <- data.frame(pData(gset[[1]])[,idx],
  row.names = rownames(pData(gset[[1]])))

# tidy column names
colnames(metadata) <- c('Age', 'Distant.RFS', 'ER', 'GGI', 'Grade',
  'Size', 'Time.RFS')

# prepare certain phenotypes
metadata$Age <- as.numeric(gsub('^KJ', NA, metadata$Age))
metadata$Distant.RFS <- factor(metadata$Distant.RFS, levels=c(0,1))
metadata$ER <- factor(gsub('\\?', NA, metadata$ER), levels=c(0,1))
metadata$ER <- factor(ifelse(metadata$ER == 1, 'ER+', 'ER-'), levels = c('ER-', 'ER+'))
metadata$GGI <- as.numeric(metadata$GGI)
metadata$Grade <- factor(gsub('\\?', NA, metadata$Grade), levels=c(1,2,3))
metadata$Grade <- gsub(1, 'Grade 1', gsub(2, 'Grade 2', gsub(3, 'Grade 3', metadata$Grade)))
metadata$Grade <- factor(metadata$Grade, levels = c('Grade 1', 'Grade 2', 'Grade 3'))
metadata$Size <- as.numeric(metadata$Size)
metadata$Time.RFS <- as.numeric(gsub('^KJX|^KJ', NA, metadata$Time.RFS))

# remove samples from the pdata that have any NA value
discard <- apply(metadata, 1, function(x) anyis.na(x)))
metadata <- metadata[!discard,]

# filter the expression data to match the samples in our pdata
x <- x[,which(colnames(x) %in% rownames(metadata))]

# check that sample names match exactly between pdata and expression data 
all(colnames(x) == rownames(metadata))

2, add ESR1 gene expression to the metadata

metadata$ESR1 <- x["205225_at",]

head(metadata)
         Age Distant.RFS  ER       GGI   Grade Size Time.RFS      ESR1
GSM65752  40           0 ER-  2.480050 Grade 3  1.2     2280  5.893152
GSM65753  46           0 ER+ -0.633592 Grade 1  1.3     2675 11.419733
GSM65755  41           1 ER+  1.043950 Grade 3  3.3      182 10.922990
GSM65757  34           0 ER+  1.059190 Grade 2  1.6     3952  9.629512
GSM65758  46           1 ER+ -1.233060 Grade 2  2.1     1824 11.367321
GSM65760  57           1 ER+  0.679034 Grade 3  2.2      699 10.161146

3, run PCA

library(PCAtools)
p <- pca(x, metadata = metadata, removeVar = 0.1)

Here, your x object would be assay(vsd)

4, generate bi-plot with continuous scale

biplot(p,
    lab = NULL,
    colby = 'ESR1',
    shape = 'ER',
    hline = 0, vline = 0,
    legendPosition = 'right') +

scale_colour_gradient(low = "gold", high = "red2")

Good validation in its own right, as those with higher ESR1 gene expression are also ER-positive by IHC / histology...

Kevin