Question

Single-cell RNA-seq analysis: how to look at the distribution of gene expression in a t-SNE plot?

6.3 years ago

a.rex ▴ 350

I am using the Rtsne package to perform cell clustering of single-cell RNA-seq data. I first take my raw counts, normalise them by library size, and identify the top 1000 highly variable genes. I save these genes and perform clustering by t-SNE. The resulting plot forms 10 distinct clusters. I have been successful in obtaining this plot, but I need to visualise the distribution of individual gene expression within the clusters obtained by t-SNE. Can anyone suggest a website/online tutorial on how to do this?

Essentially, I would like to create a figure like the one presented in Figure 1b of this paper: https://www.nature.com/articles/nature20105

At the minute I have the following code:

# t-SNE of the top 1000 variable genes in my dataset
library("Rtsne")
tsne <- Rtsne(t(genes1000)) # genes1000 are the top 1000 genes (TPM) in my dataset

# colour points by group (10 clusters); branch indexes the cluster of each cell
cluster_cols <- c("purple","orange","blue","forestgreen","darkgrey","yellow","red","maroon","skyblue","brown")
plot(tsne$Y, col=cluster_cols[branch], bg=cluster_cols[branch], pch=21, main="", xlab="t-sne[,1]", ylab="t-sne[,2]")
par(cex=0.8)
legend("bottomleft", legend=paste("Group", 1:10), fill=cluster_cols, border=FALSE)

# log transformation of TPM values (stored as log_tpm rather than log, to avoid confusion with base::log)
log_tpm <- log2(genes1000 + 0.001)

RNA-Seq • 4.1k views

There was a recent question here, which you may find of use: Rtsne plot labelling

Kevin

6.3 years ago by Kevin Blighe 87k

Thank you Kevin -

But I still can't label my plot by gene expression :(

6.3 years ago by a.rex ▴ 350

I'm currently working remotely with no access rights to journals; however, Google found the figure for me despite the access restrictions. Can you confirm that it's this figure: https://media.nature.com/full/nature-assets/nature/journal/v539/n7627/images/nature20105-sf5.jpg

Figure 1b is just a violin plot? It looks like they have taken the sample-to-cluster (tSNE cluster) assignment, and then just plotted the normalised expression values. If you want to generate a violin plot, then take a look at A: Hierarchical Clustering in single-channel agilent microarray experiment

You may have to supply your own names to each cluster based on what you believe they represent. The tSNE algorithm will just regard them as cluster 1, cluster 2, etc.

6.3 years ago by Kevin Blighe 87k

Hello Kevin - thank you for your help. That is not figure 1b. This is: https://media.nature.com/lw926/nature-assets/nature/journal/v539/n7627/images/nature20105-f1.jpg

6.3 years ago by a.rex ▴ 350

I see - thanks. You would have to get the expression values for your gene of interest in each cell, and then colour these expression values with something like:

# build a blue-to-red gradient and colour each point by its value
library("RColorBrewer")
numbers <- 1:100
colours <- colorRampPalette(rev(brewer.pal(9, "RdYlBu")))(length(unique(numbers)))[numbers]
plot(numbers, col=colours, pch=20)

In this example, 'numbers' would contain your expression values for your gene of interest.
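
One caveat: the example indexes the palette with the integers 1:100, whereas real expression values are continuous and cannot be used as palette indices directly. A minimal sketch of one way to bin them first, using only base R; the helper name expr_to_colours, the simulated values and the three-colour palette are illustrative assumptions, not part of the original answer:

```r
# Hypothetical helper (not from the thread): map continuous values to colours
expr_to_colours <- function(expr, n_bins = 100) {
  # blue (low) to red (high) gradient, built with base R only
  palette <- colorRampPalette(c("blue", "lightyellow", "red"))(n_bins)
  # cut() splits the observed range into n_bins equal-width intervals
  # and returns the integer bin (1..n_bins) of each value
  bins <- cut(expr, breaks = n_bins, labels = FALSE)
  palette[bins]
}

# Simulated log2 expression of one gene across 200 cells (assumption)
set.seed(1)
expr <- log2(rexp(200, rate = 0.1) + 0.001)

colours <- expr_to_colours(expr)
plot(expr, col = colours, pch = 20)
```

Equal-width bins keep the colour scale proportional to expression; ranking the values instead would spread the colours more evenly when a few cells dominate the range.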

6.3 years ago by Kevin Blighe 87k

I have added the code I have so far. Any advice would be great, thanks.

6.3 years ago by a.rex ▴ 350

Thanks for adding. So, you need to extract the expression values for your gene of interest from genes1000, colour these expression values in a gradient using the code that I posted above, and then supply this colour vector to plot(). The sample ordering in genes1000 would have to match that of tsne$Y, though.
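
A rough sketch of those steps, under stated assumptions: the data are simulated stand-ins, tsne_Y takes the place of tsne$Y from Rtsne(t(genes1000)), and the gene name "gene3", the grey-to-red palette and the bin count are all hypothetical choices. The rows of tsne$Y follow the row order of the matrix passed to Rtsne, i.e. the columns of genes1000, so the expression vector is reordered by cell name before colouring:

```r
# Stand-in data (assumptions, not the asker's real objects)
set.seed(42)
n_cells <- 300
genes1000 <- matrix(rexp(5 * n_cells, rate = 0.05), nrow = 5,
                    dimnames = list(paste0("gene", 1:5),
                                    paste0("cell", 1:n_cells)))

# In the real analysis this would be tsne$Y; here it is random coordinates
tsne_Y <- matrix(rnorm(2 * n_cells), ncol = 2)
# Label each row with the cell it was computed from (same order as the input)
rownames(tsne_Y) <- colnames(genes1000)

# Extract one gene and put the cells in the same order as the t-SNE rows
gene <- "gene3"  # hypothetical gene of interest
expr <- log2(genes1000[gene, rownames(tsne_Y)] + 0.001)

# Bin the continuous values and look up one colour per cell
n_bins <- 100
palette <- colorRampPalette(c("grey85", "red"))(n_bins)
colours <- palette[cut(expr, breaks = n_bins, labels = FALSE)]

# Same t-SNE layout as before, now coloured by expression of the gene
plot(tsne_Y, col = colours, pch = 20, main = gene,
     xlab = "t-sne[,1]", ylab = "t-sne[,2]")
```

Indexing by cell name rather than by position means the colours stay attached to the right cells even if genes1000 has been subset or reordered since the t-SNE was run.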