Question

How can I reproduce a plot like this?

4.9 years ago

zizigolu ★ 4.3k

Hi,

I have a list of differentially expressed genes (DEGs) from single-cell RNA-seq between two clusters of cells. I also have a list of differentially expressed proteins (DEPs) from proteomics. I want to classify the DEGs, the DEPs, and their overlap into functional groups, something like the picture below, but I don't know how. I know how to classify each list individually, but I need one picture that shows them all together: for example, GO terms for the DEGs, GO terms for the DEPs, and GO terms for their overlap.

(image of the example plot)

Any idea?

Pathway r single cell RNA-seq Protomics • 1.9k views

ADD COMMENT • link updated 4.9 years ago by AK ★ 2.2k • written 4.9 years ago by zizigolu ★ 4.3k

Divide it into its components:

stacked bar-plot, rotated horizontally
Venn diagram
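
A minimal base-R sketch of the first component, assuming a toy table with one row per gene-term pair. The data frame `df`, its column names, and the term labels are made up for illustration; real input would come from your own annotation.

```r
# Toy data (hypothetical): one row per gene-term pair, plus the set the gene belongs to
df <- data.frame(
  Type = c("DEGs_only", "DEGs_only", "DEPs_only", "Overlap", "Overlap", "DEPs_only"),
  Term = c("Translation", "Cytosol", "Translation", "Translation", "Cytosol", "DNA binding"),
  stringsAsFactors = FALSE
)

# Count genes per set (rows) and term (columns)
counts <- table(df$Type, df$Term)

# Widen the left margin so the term labels fit
par(mar = c(5, 8, 2, 2))

# Horizontal stacked bars: one bar per term, one segment per set
barplot(counts, horiz = TRUE, las = 1,
        col = c("black", "white", "grey"),
        legend.text = rownames(counts),
        xlab = "Number of DEGs or DEPs")
```

The rows of `counts` are sorted alphabetically (`DEGs_only`, `DEPs_only`, `Overlap`), so the three colours map to the sets in that order.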

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

Thank you. Suppose there are 100 DEGs, 200 DEPs, and 70 in the overlap. They are classified into many different terms, so how do I select which terms to plot?

ADD REPLY • link 4.9 years ago by zizigolu ★ 4.3k

I'm not sure what the message behind that plot is. What do you want to show?

ADD REPLY • link 4.9 years ago by WouterDeCoster 47k

The relationship between the transcriptome and proteome data

ADD REPLY • link 4.9 years ago by zizigolu ★ 4.3k

score 4 · Accepted Answer · 2019-05-27

4.9 years ago

AK ★ 2.2k

Hi F,

They can be reproduced using ggplot2 and VennDiagram::draw.pairwise.venn:

library(tidyverse)
library(VennDiagram)
library(GO.db)

# Grab some example data from E. coli
# (UniProt has since moved its API to rest.uniprot.org, so this legacy URL may need updating)
gene2go <- read_tsv("https://www.uniprot.org/uniprot/?query=organism:83333&format=tab&columns=id,go-id")
colnames(gene2go) <- c("Gene", "GO")
DECs <- gene2go[sample(nrow(gene2go), 500),]
DEPs <- gene2go[sample(nrow(gene2go), 500),]

# Calculate sets
sets <- calculate.overlap(x = list("DECs" = DECs$Gene,
                                   "DEPs" = DEPs$Gene))
Overlap <- sets$a3
DECs_only <- setdiff(sets$a1, Overlap)
DEPs_only <- setdiff(sets$a2, Overlap)
df_sets <- rbind(
  data.frame(Type = rep("Overlap", length(Overlap)), Gene = Overlap),
  data.frame(Type = rep("DECs_only", length(DECs_only)), Gene = DECs_only),
  data.frame(Type = rep("DEPs_only", length(DEPs_only)), Gene = DEPs_only)
)

# Combine with GO data and flatten GO
df_sets_go <- left_join(df_sets, gene2go, by = "Gene") %>% separate_rows(., "GO", sep = "; ")
df_sets_go$Description <- Term(df_sets_go$GO)
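# Recode Type explicitly so each label stays attached to the right set
df_sets_go$Type <- factor(df_sets_go$Type,
                          levels = c("DECs_only", "DEPs_only", "Overlap"),
                          labels = c("DECs", "DEPs", "Overlap"))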

# Only look at top 20 GO terms
GO_top20 <- names(tail(sort(table(df_sets_go$GO)), 20))

# Barplot
ggplot(filter(df_sets_go, GO %in% GO_top20), aes(str_to_sentence(Description))) +
  geom_bar(color = "black", aes(fill = Type)) +
  coord_flip() +
  theme_bw() +
  scale_fill_manual(values = c(
    "DECs" = "black",
    "DEPs" = "white",
    "Overlap" = "grey"
  )) +
  scale_y_continuous(expand = c(0, 0)) +
  xlab("") +
  ylab("Number of DECs or DEPs") +
  theme(legend.position = "top",
        legend.title = element_blank())

# Venn diagram for the whole sets (not only the genes in GO barplot)
draw.pairwise.venn(
  area1 = length(sets$a1),  # total DECs, including the overlap
  area2 = length(sets$a2),  # total DEPs, including the overlap
  cross.area = length(Overlap),
  category = c("DECs", "DEPs")
)
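
The term-selection step in the code above keeps the 20 most frequent GO IDs. The same idea in isolation, as a small base-R sketch on a made-up vector of GO IDs (`go_ids` and `top_n` are illustrative names, not part of the original code):

```r
# Toy vector of GO IDs (hypothetical), one entry per gene-term pair
go_ids <- c("GO:0005829", "GO:0005829", "GO:0005829",
            "GO:0003677", "GO:0003677", "GO:0006412")

# Count occurrences, sort from most to least frequent,
# and keep the names of the top_n most frequent terms
top_n <- 2
term_counts <- sort(table(go_ids), decreasing = TRUE)
go_top <- names(head(term_counts, top_n))
```

Ranking by raw count is only one possible criterion. It tends to favour broad terms, so a different ranking may suit your data better.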

barplot

venn-diagram

Hope it helps.

ADD COMMENT • link 4.9 years ago by AK ★ 2.2k

Thank you, this seems amazing, but how do I provide gene2go?

ADD REPLY • link 4.9 years ago by zizigolu ★ 4.3k

Starting from the table which looks like this:

> head(as.data.frame(DECs))
    Gene                                                                                             GO
1 P0A7S9                         GO:0000049; GO:0003735; GO:0005829; GO:0006412; GO:0019843; GO:0022627
2 P0AFW0 GO:0001000; GO:0001073; GO:0001124; GO:0003677; GO:0005829; GO:0008494; GO:0031564; GO:0045727
3 P76000                                                                                     GO:0019867
4 P0A953                                                 GO:0004315; GO:0005829; GO:0006633; GO:0008610
5 Q9JMT8                                                                         GO:0003677; GO:0006355
6 P0AE34                                     GO:0005886; GO:0005887; GO:0022857; GO:0055052; GO:0097638

Please check the code and see what each data frame looks like.
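
If your own annotation has the same shape (one row per gene, GO IDs separated by "; "), it can be flattened to one row per gene-term pair without any extra packages. This is a sketch on two rows copied from the table above. The name `gene2go_long` is made up for illustration.

```r
# Two-column annotation table in the same shape as the example above
gene2go <- data.frame(
  Gene = c("P76000", "Q9JMT8"),
  GO   = c("GO:0019867", "GO:0003677; GO:0006355"),
  stringsAsFactors = FALSE
)

# Split each GO string on "; " and repeat the gene ID once per term
go_list <- strsplit(gene2go$GO, "; ", fixed = TRUE)
gene2go_long <- data.frame(
  Gene = rep(gene2go$Gene, lengths(go_list)),
  GO   = unlist(go_list),
  stringsAsFactors = FALSE
)
```

This does the same job as the `separate_rows()` call in the answer, just in base R.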

ADD REPLY • link 4.9 years ago by AK ★ 2.2k

Sorry, I mean which tool did you use to produce the source of gene2go? Which functional annotation tool?

Also, I am getting this error:

> sets <- calculate.overlap(x = list("DECs" = DECs$Gene,
+                                    "DEPs" = DEPs$Gene))
Error in calculate.overlap(x = list(DECs = DECs$Gene, DEPs = DEPs$Gene)) : 
  could not find function "calculate.overlap"

Which package gives this function?

ADD REPLY • link 4.9 years ago by zizigolu ★ 4.3k

You can use InterProScan.

ADD REPLY • link 4.9 years ago by AK ★ 2.2k

Did you install and load VennDiagram? https://www.rdocumentation.org/packages/VennDiagram/versions/1.6.20/topics/calculate.overlap
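
`calculate.overlap` comes from the VennDiagram package, so `install.packages("VennDiagram")` followed by `library(VennDiagram)` should make it available. If installing is not an option, the two-set case used in the answer can be approximated in base R. This is a sketch on made-up gene IDs, not the package's own implementation.

```r
# Hypothetical gene identifiers for the two lists
degs <- c("geneA", "geneB", "geneC")
deps <- c("geneB", "geneC", "geneD")

# Genes found in both lists, and genes unique to each list
overlap   <- intersect(degs, deps)
degs_only <- setdiff(degs, deps)
deps_only <- setdiff(deps, degs)
```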

ADD REPLY • link 4.9 years ago by AK ★ 2.2k

Thank you so much. I don't have a list of protein sequences; rather, I have a list of protein IDs that I have converted to gene symbols. I also have a list of genes from single-cell RNA-seq. The goal is to see the relationship between the proteomics and single-cell RNA-seq data, for example how many GO terms or pathways are shared by both data sets.
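
One rough way to put a number on "shared by both data sets" is to compare the sets of GO terms directly. This sketch assumes each data set already has a long-format annotation (one row per gene-term pair). The tables `rna_go` and `prot_go` and their contents are made up for illustration.

```r
# Hypothetical long-format annotations for each data set
rna_go  <- data.frame(Gene = c("g1", "g1", "g2"),
                      GO   = c("GO:0005829", "GO:0006412", "GO:0003677"),
                      stringsAsFactors = FALSE)
prot_go <- data.frame(Gene = c("g2", "g3"),
                      GO   = c("GO:0003677", "GO:0005829"),
                      stringsAsFactors = FALSE)

# Distinct GO terms seen in each data set
rna_terms  <- unique(rna_go$GO)
prot_terms <- unique(prot_go$GO)

# Terms present in both, and the Jaccard index (shared / all distinct terms)
shared  <- intersect(rna_terms, prot_terms)
jaccard <- length(shared) / length(union(rna_terms, prot_terms))
```

A Jaccard index near 1 means the two data sets hit almost the same terms, and a value near 0 means there is little overlap at the term level.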

ADD REPLY • link 4.9 years ago by zizigolu ★ 4.3k