DEseq2 Shade PCA plot by gene counts
1
0
Entering edit mode
4.5 years ago
paulranum11 ▴ 80

Hi All,

I am working with DEseq2 to do some RNA-Seq analysis and would like to find a solution for displaying the expression level of individual genes on the PCA plot. This functionality exists built into other packages like Seurat (using "FeaturePlot"). Does anyone know of a way to implement this with DEseq2's plotPCA.

Here is the simple command i am using to generate the PCA plot.

plotPCA(vsd, intgroup=c("Treatment"))

The following is a link to a DEseq2 walkthrough for reference: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

DEseq2 PCA Heatmap ggplot2 • 3.8k views
ADD COMMENT
3
Entering edit mode
4.5 years ago

Hey,

The DESeq2 PCA function will [I believe] convert a continuous variable to categorical, which is likely not what you want. This is fine, as DESeq2 is primarily about normalising RNA-seq data and conducting differential expression.

In any case, you can do this with my own package, PCAtools, which should fit nicely with your DESeq2 variance-stabilised expression values and metadata. Here is a reproducible example, taken from my vignette: PCAtools: everything Principal Component Analysis:

1, load breast cancer data from GEO

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO('GSE2990', GSEMatrix = TRUE, getGPL = FALSE)
x <- exprs(gset[[1]])

# remove Affymetrix control probes
x <- x[-grep('^AFFX', rownames(x)),]

# extract information of interest from the phenotype data (pdata)
idx <- which(colnames(pData(gset[[1]])) %in%
  c('age:ch1', 'distant rfs:ch1', 'er:ch1',
    'ggi:ch1', 'grade:ch1', 'size:ch1',
    'time rfs:ch1'))

metadata <- data.frame(pData(gset[[1]])[,idx],
  row.names = rownames(pData(gset[[1]])))

# tidy column names
colnames(metadata) <- c('Age', 'Distant.RFS', 'ER', 'GGI', 'Grade',
  'Size', 'Time.RFS')

# prepare certain phenotypes
metadata$Age <- as.numeric(gsub('^KJ', NA, metadata$Age))
metadata$Distant.RFS <- factor(metadata$Distant.RFS, levels=c(0,1))
metadata$ER <- factor(gsub('\\?', NA, metadata$ER), levels=c(0,1))
metadata$ER <- factor(ifelse(metadata$ER == 1, 'ER+', 'ER-'), levels = c('ER-', 'ER+'))
metadata$GGI <- as.numeric(metadata$GGI)
metadata$Grade <- factor(gsub('\\?', NA, metadata$Grade), levels=c(1,2,3))
metadata$Grade <- gsub(1, 'Grade 1', gsub(2, 'Grade 2', gsub(3, 'Grade 3', metadata$Grade)))
metadata$Grade <- factor(metadata$Grade, levels = c('Grade 1', 'Grade 2', 'Grade 3'))
metadata$Size <- as.numeric(metadata$Size)
metadata$Time.RFS <- as.numeric(gsub('^KJX|^KJ', NA, metadata$Time.RFS))

# remove samples from the pdata that have any NA value
discard <- apply(metadata, 1, function(x) anyis.na(x)))
metadata <- metadata[!discard,]

# filter the expression data to match the samples in our pdata
x <- x[,which(colnames(x) %in% rownames(metadata))]

# check that sample names match exactly between pdata and expression data 
all(colnames(x) == rownames(metadata))

2, add ESR1 gene expression to the metadata

metadata$ESR1 <- x["205225_at",]

head(metadata)
         Age Distant.RFS  ER       GGI   Grade Size Time.RFS      ESR1
GSM65752  40           0 ER-  2.480050 Grade 3  1.2     2280  5.893152
GSM65753  46           0 ER+ -0.633592 Grade 1  1.3     2675 11.419733
GSM65755  41           1 ER+  1.043950 Grade 3  3.3      182 10.922990
GSM65757  34           0 ER+  1.059190 Grade 2  1.6     3952  9.629512
GSM65758  46           1 ER+ -1.233060 Grade 2  2.1     1824 11.367321
GSM65760  57           1 ER+  0.679034 Grade 3  2.2      699 10.161146

3, run PCA

library(PCAtools)
p <- pca(x, metadata = metadata, removeVar = 0.1)

Here, your x object would be assay(vsd)

4, generate bi-plot with continuous scale

biplot(p,
    lab = NULL,
    colby = 'ESR1',
    shape = 'ER',
    hline = 0, vline = 0,
    legendPosition = 'right') +

scale_colour_gradient(low = "gold", high = "red2")

aaa

Good validation in its own right, as those with higher ESR1 gene expression are also ER-positive by IHC / histology...

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin, in your github vignette for PCAtools you also show the pairsplot function to look at multiple PC's instead of only PC1 and 2 that you show above. And also there you can plot a continuous variable like you did here, via colby =. However, it seems to me the + scale_colour_gradient(low = "gold", high = "red2") doesn't work for pairsplot. Is this correct? And was this meant to be like this?

ADD REPLY
0
Entering edit mode

That may very well be the case, as the pairs plot is ultimately many bi-plots pieced together, but the internal coding is therefore different. The pairs plot is definitely the 'weak point' of the package, and I believe ggplot2 now has specific functionality for generating such plots.

ADD REPLY

Login before adding your answer.

Traffic: 1003 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6