How do I reproduce such a plot?
4.9 years ago
zizigolu ★ 4.3k

Hi,

I have a list of differentially expressed genes (DEGs) from single-cell RNA-seq between two clusters of cells. I also have a list of differentially expressed proteins (DEPs) from proteomics. I want to classify the DEGs, the DEPs, and their overlap into individual functional groups, something like the picture below, but I don't know how. I know how to classify them individually, but I need one figure that shows them all together: for example, GO terms for the DEGs, GO terms for the DEPs, and GO terms for their overlap.

[image]

Any idea?

Tags: Pathway, R, single cell RNA-seq, Proteomics

Divide it into its components:

  1. stacked bar-plot, rotated horizontally
  2. Venn diagram

Thank you. Suppose there are 100 DEGs, 200 DEPs, and 70 in the overlap, and they are classified into different terms: how do I select which terms to plot?


I'm not sure what the message behind that plot is. What do you want to show?


The relationship between the transcriptome and proteome data

4.9 years ago
AK ★ 2.2k

Hi F,

The two panels can be reproduced using ggplot2 and VennDiagram::draw.pairwise.venn:

library(tidyverse)
library(VennDiagram)
library(GO.db)

# Grab some example data from E. coli (UniProt accession plus GO IDs).
# Note: this is the legacy UniProt URL from when the answer was written;
# it may need updating to the current UniProt REST API.
gene2go <- read_tsv("https://www.uniprot.org/uniprot/?query=organism:83333&format=tab&columns=id,go-id")
colnames(gene2go) <- c("Gene", "GO")

# Two random samples stand in for the real DEC and DEP lists
DECs <- gene2go[sample(nrow(gene2go), 500),]
DEPs <- gene2go[sample(nrow(gene2go), 500),]

# Calculate sets: a1 and a2 are the input sets, a3 is their intersection
sets <- calculate.overlap(x = list("DECs" = DECs$Gene,
                                   "DEPs" = DEPs$Gene))
Overlap <- sets$a3
DECs_only <- setdiff(sets$a1, Overlap)
DEPs_only <- setdiff(sets$a2, Overlap)
df_sets <- rbind(
  data.frame(Type = rep("Overlap", length(Overlap)), Gene = Overlap),
  data.frame(Type = rep("DECs_only", length(DECs_only)), Gene = DECs_only),
  data.frame(Type = rep("DEPs_only", length(DEPs_only)), Gene = DEPs_only)
)

# Combine with GO data and flatten GO (one row per gene-GO pair)
df_sets_go <- left_join(df_sets, gene2go, by = "Gene") %>% separate_rows(., "GO", sep = "; ")
df_sets_go$Description <- Term(df_sets_go$GO)

# Relabel the groups explicitly; renaming levels by position can mismatch them
df_sets_go$Type <- factor(df_sets_go$Type,
                          levels = c("DECs_only", "DEPs_only", "Overlap"),
                          labels = c("DECs", "DEPs", "Overlap"))

# Only look at the 20 most frequent GO terms
GO_top20 <- names(tail(sort(table(df_sets_go$GO)), 20))

# Barplot (stacked, flipped to horizontal)
ggplot(dplyr::filter(df_sets_go, GO %in% GO_top20), aes(str_to_sentence(Description))) +
  geom_bar(color = "black", aes(fill = Type)) +
  coord_flip() +
  theme_bw() +
  scale_fill_manual(values = c(
    "DECs" = "black",
    "DEPs" = "white",
    "Overlap" = "grey"
  )) +
  scale_y_continuous(expand = c(0, 0)) +
  xlab("") +
  ylab("Number of DECs or DEPs") +
  theme(legend.position = "top",
        legend.title = element_blank())

# Venn diagram for the whole sets (not only the genes in the GO barplot).
# area1 and area2 are the full set sizes, i.e. they include the overlap.
grid.newpage()  # start on a clean page so the Venn is not drawn over the barplot
draw.pairwise.venn(
  area1 = length(DECs_only) + length(Overlap),
  area2 = length(DEPs_only) + length(Overlap),
  cross.area = length(Overlap),
  category = c("DECs", "DEPs")
)

[Images: the resulting barplot and Venn diagram]

Hope it helps.


Thank you, this seems amazing, but how do I provide gene2go?


Start from a table that looks like this:

> head(as.data.frame(DECs))
    Gene                                                                                             GO
1 P0A7S9                         GO:0000049; GO:0003735; GO:0005829; GO:0006412; GO:0019843; GO:0022627
2 P0AFW0 GO:0001000; GO:0001073; GO:0001124; GO:0003677; GO:0005829; GO:0008494; GO:0031564; GO:0045727
3 P76000                                                                                     GO:0019867
4 P0A953                                                 GO:0004315; GO:0005829; GO:0006633; GO:0008610
5 Q9JMT8                                                                         GO:0003677; GO:0006355
6 P0AE34                                     GO:0005886; GO:0005887; GO:0022857; GO:0055052; GO:0097638

Please check the code and see what each data frame looks like.
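As a rough illustration of that table shape, here is a minimal base R sketch that builds a small two-column table by hand and flattens it to one row per gene-GO pair, which is what separate_rows does in the answer. The object names gene2go_toy and gene2go_long are made up for this example, and the three rows are copied from the output above only as sample values.

```r
# Hypothetical toy input: one row per protein, GO IDs joined by "; "
gene2go_toy <- data.frame(
  Gene = c("P0A7S9", "P76000", "Q9JMT8"),
  GO   = c("GO:0000049; GO:0003735", "GO:0019867", "GO:0003677; GO:0006355"),
  stringsAsFactors = FALSE
)

# Split each GO string, then repeat the gene ID once per term (long format)
go_list <- strsplit(gene2go_toy$GO, "; ", fixed = TRUE)
gene2go_long <- data.frame(
  Gene = rep(gene2go_toy$Gene, lengths(go_list)),
  GO   = unlist(go_list),
  stringsAsFactors = FALSE
)
print(gene2go_long)
```

Any annotation source that yields a Gene column and a GO column in this shape should slot into the rest of the code.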


Sorry, I meant: which tool did you use to produce the source of gene2go? Which functional annotation tool?

Also, I am getting this error:

> sets <- calculate.overlap(x = list("DECs" = DECs$Gene,
+                                    "DEPs" = DEPs$Gene))
Error in calculate.overlap(x = list(DECs = DECs$Gene, DEPs = DEPs$Gene)) : 
  could not find function "calculate.overlap"

Which package gives this function?


You can use InterProScan.
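On the error above: calculate.overlap is exported by the VennDiagram package, so "could not find function" usually means that library(VennDiagram) was not run or the package is not installed. The sketch below uses hypothetical toy IDs (decs, deps) to show the same two-set split in base R, with the VennDiagram call guarded so the snippet still runs without the package.

```r
# Toy ID vectors (hypothetical values, for illustration only)
decs <- c("g1", "g2", "g3", "g4")
deps <- c("g3", "g4", "g5")

# Base R equivalent of the two-set overlap
overlap   <- intersect(decs, deps)
decs_only <- setdiff(decs, overlap)
deps_only <- setdiff(deps, overlap)
print(list(overlap = overlap, decs_only = decs_only, deps_only = deps_only))

# The same overlap via VennDiagram, if the package is installed
# (otherwise: install.packages("VennDiagram"))
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  sets <- VennDiagram::calculate.overlap(x = list(DECs = decs, DEPs = deps))
  print(sets$a3)  # intersection of the two input sets
}
```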


Thank you so much. I don't have a list of protein sequences; rather, I have a list of protein IDs that I have converted to gene symbols. I also have a list of genes from single-cell RNA-seq. The goal is to see the relationship between the proteomics and single-cell RNA-seq data, for example how many GO terms or pathways are shared by both data sets.
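One possible way to count shared GO terms, once both lists are annotated, is sketched below in base R. It assumes each data set is already a long table with one row per gene-GO pair. The gene symbols, the pairing of genes with GO IDs, and the names deg_go, dep_go, and count_genes are all hypothetical; how the annotation itself is obtained is left open.

```r
# Hypothetical long-format annotations: one row per gene-GO pair
deg_go <- data.frame(
  Gene = c("geneA", "geneA", "geneB", "geneC"),
  GO   = c("GO:0005829", "GO:0006412", "GO:0003677", "GO:0005829"),
  stringsAsFactors = FALSE
)
dep_go <- data.frame(
  Gene = c("geneA", "geneD", "geneD"),
  GO   = c("GO:0005829", "GO:0003677", "GO:0005886"),
  stringsAsFactors = FALSE
)

# Count distinct genes per GO term within one data set
count_genes <- function(df) {
  tapply(df$Gene, df$GO, function(g) length(unique(g)))
}
deg_counts <- count_genes(deg_go)
dep_counts <- count_genes(dep_go)

# Terms present in both data sets, with the gene count from each
shared_terms <- intersect(names(deg_counts), names(dep_counts))
shared <- data.frame(
  GO     = shared_terms,
  n_DEGs = as.integer(deg_counts[shared_terms]),
  n_DEPs = as.integer(dep_counts[shared_terms]),
  stringsAsFactors = FALSE
)
print(shared)
```

A table like this could also feed a stacked barplot of the kind shown in the answer, with one bar per shared term.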
