How to make a scatter plot from two different result sets?
5.8 years ago
Biologist ▴ 290

Differential expression analysis of A vs B gave me differentially expressed genes like the following:

Genes   logFC        unshrunk.logFC  logCPM       PValue    FDR
Gene1   4.704568888  4.705411203     6.335398698  2.87E-45  5.68E-41
Gene2   5.046769012  5.048410913     5.713396847  6.54E-45  6.46E-41
Gene3   7.582765878  7.824749841     1.200751502  2.03E-43  1.33E-39
Gene4   4.326319082  4.328231859     4.779115389  1.28E-42  5.32E-39
Gene5   4.835955903  4.840334013     4.091127747  1.35E-42  5.32E-39
Gene6   5.312117056  5.314688138     5.331640406  1.98E-42  6.52E-39
Gene7   -5.76775     -5.7776037      4.209391248  5.25E-42  1.48E-38
Gene8   4.562730239  4.564646863     5.011354411  8.05E-42  1.99E-38
Gene9   4.571347719  4.573313339     4.982581445  1.61E-41  3.54E-38
Gene10  4.830679248  4.8340846       4.44951126   2.18E-40  4.31E-37
Gene11  8.263335175  8.279036992     5.679556644  8.20E-40  1.47E-36
Gene12  4.498284621  4.504950052     3.156431424  1.33E-39  2.19E-36
Gene13  10.08604587  10.31318144     3.689108577  1.79E-39  2.71E-36
Gene14  4.802704588  4.805611342     4.648881244  2.07E-39  2.92E-36
Gene15  4.299553259  4.303901316     3.574867745  3.12E-38  4.11E-35
Gene16  -9.0340898   -9.2400894      3.20071166   4.09E-38  5.05E-35

And differential expression analysis of C vs D gave:

Genes   logFC         unshrunk.logFC  logCPM       PValue    FDR
Gene1   9.830763229   9.857045013     2.461083784  7.33E-15  1.57E-10
Gene18  8.314196291   8.315523973     5.262615028  2.86E-13  3.06E-09
Gene3   11.80447191   11.82477085     4.815575477  1.07E-12  7.60E-09
Gene2   -11.28982093  -11.30537113    8.063029358  3.48E-12  1.86E-08
Gene21  10.45942991   10.46806536     4.673637229  9.87E-12  4.22E-08
Gene22  8.444533948   8.446387311     4.908046624  9.49E-11  3.38E-07
Gene4   -8.36656005   -8.398162862    4.11782659   2.44E-10  6.20E-07
Gene24  -9.953626757  -9.986020817    5.625058944  2.61E-10  6.20E-07
Gene25  9.452210058   9.4636667       3.291139413  2.61E-10  6.20E-07
Gene6   9.609186353   9.615829541     4.213984617  3.39E-10  7.24E-07
Gene9   -8.337494271  -8.349468295    5.483300689  4.36E-10  8.47E-07
Gene28  -13.03662321  -144269487.5    3.249843933  4.97E-10  8.85E-07
Gene29  -10.26198778  -10.33006096    4.8776179    6.02E-10  9.89E-07

I want to make a scatter plot to look at the association between these two sets of results. The scatter plot should look something like Figure 2a in this research paper.

RNA-Seq r scatterplot plotting • 2.1k views

Do you want to compare A vs B and C vs D using a scatter plot?

I want to check their association. The genes in the tables include both protein-coding genes and lncRNAs. I want to make a plot like Figure 2a in the paper.

It is not clear: there is no overlap of genes between the two datasets. If that is intended, what do you expect the x-axis and y-axis to be?

I have updated the tables. There are now a few genes that overlap between the two datasets. The x-axis should be logFC for A vs B and the y-axis logFC for C vs D.

Merge on gene names then plot, something like:

plot(merge(AB[, 1:2], CD[, 1:2], by = "Genes")[, c(2, 3)])
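
To see what the merge step does before plotting, here is a minimal sketch using three rows from each table in the question. merge() performs an inner join by default, so only genes present in both tables are kept, and the two logFC columns are renamed logFC.x and logFC.y. The suffixes .AvsB and .CvsD and the axis labels are illustrative choices, not something from the original posts.

```r
# Three rows from each table in the question (Genes and logFC columns only)
AB <- data.frame(Genes = c("Gene1", "Gene2", "Gene5"),
                 logFC = c(4.704568888, 5.046769012, 4.835955903),
                 stringsAsFactors = FALSE)
CD <- data.frame(Genes = c("Gene1", "Gene18", "Gene2"),
                 logFC = c(9.830763229, 8.314196291, -11.28982093),
                 stringsAsFactors = FALSE)

# Inner join (the default): only genes found in both tables survive
merged <- merge(AB, CD, by = "Genes")
names(merged)  # "Genes" "logFC.x" "logFC.y"

# Optional: suffixes that say which comparison each column came from
# (the suffix names are an illustrative choice)
merged_named <- merge(AB, CD, by = "Genes", suffixes = c(".AvsB", ".CvsD"))

# Base-R scatter plot with gene names drawn above the points
plot(merged_named$logFC.AvsB, merged_named$logFC.CvsD,
     xlab = "logFC (A vs B)", ylab = "logFC (C vs D)")
text(merged_named$logFC.AvsB, merged_named$logFC.CvsD,
     labels = merged_named$Genes, pos = 3)
```

Gene5 and Gene18 drop out here because each appears in only one table; passing all = TRUE to merge() would keep them with NA in the missing column.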

OK, I did it like this:

library(ggplot2)
df <- merge(AvsB, CvsD, by = 'Genes')
pdf("fg.pdf")
ggplot(df) +
  geom_point(aes(logFC.x, logFC.y), color = '#0087E9', size = 5) +
  theme_minimal() +
  theme(axis.text = element_text(color = 'black', size = 16),
        axis.line = element_line(color = 'black'))
dev.off()

But this shows only the points on the plot, not the gene names.

Try:

# aes() goes in ggplot() so that geom_text() also inherits x, y and label
ggplot(df, aes(logFC.x, logFC.y, label = Genes)) +
  geom_point(color = '#0087E9', size = 5) +
  geom_text() +
...

Biologist: Please do not delete posts that have received comments or answers. If you have managed to solve your problem, post your solution as an answer here so it can be useful to someone in the future.

Sorry, I wanted to repost the question with an update, so I deleted it. Anyway, I have made some changes now. Thanks.

5.8 years ago
zx8754 11k

We can merge the two datasets on the Genes column, then draw an ordinary scatter plot of the two logFC columns. See the example:

# Example data
AB <- read.table(text = "
Genes      logFC    unshrunk.logFC  logCPM  PValue       FDR
Gene1   4.704568888 4.705411203 6.335398698 2.87E-45    5.68E-41
Gene2   5.046769012 5.048410913 5.713396847 6.54E-45    6.46E-41
Gene3   7.582765878 7.824749841 1.200751502 2.03E-43    1.33E-39
Gene4   4.326319082 4.328231859 4.779115389 1.28E-42    5.32E-39
Gene5   4.835955903 4.840334013 4.091127747 1.35E-42    5.32E-39
Gene6   5.312117056 5.314688138 5.331640406 1.98E-42    6.52E-39
Gene7   -5.76775    -5.7776037  4.209391248 5.25E-42    1.48E-38
Gene8   4.562730239 4.564646863 5.011354411 8.05E-42    1.99E-38
Gene9   4.571347719 4.573313339 4.982581445 1.61E-41    3.54E-38
Gene10  4.830679248 4.8340846   4.44951126  2.18E-40    4.31E-37
Gene11  8.263335175 8.279036992 5.679556644 8.20E-40    1.47E-36
Gene12  4.498284621 4.504950052 3.156431424 1.33E-39    2.19E-36
Gene13  10.08604587 10.31318144 3.689108577 1.79E-39    2.71E-36
Gene14  4.802704588 4.805611342 4.648881244 2.07E-39    2.92E-36
Gene15  4.299553259 4.303901316 3.574867745 3.12E-38    4.11E-35
Gene16  -9.0340898  -9.2400894  3.20071166  4.09E-38    5.05E-35", header = TRUE, stringsAsFactors = FALSE)

CD <- read.table(text = "
Genes    logFC    unshrunk.logFC    logCPM   PValue      FDR
Gene1   9.830763229 9.857045013 2.461083784 7.33E-15    1.57E-10
Gene18  8.314196291 8.315523973 5.262615028 2.86E-13    3.06E-09
Gene3   11.80447191 11.82477085 4.815575477 1.07E-12    7.60E-09
Gene2   -11.28982093    -11.30537113    8.063029358 3.48E-12    1.86E-08
Gene21  10.45942991 10.46806536 4.673637229 9.87E-12    4.22E-08
Gene22  8.444533948 8.446387311 4.908046624 9.49E-11    3.38E-07
Gene4   -8.36656005 -8.398162862    4.11782659  2.44E-10    6.20E-07
Gene24  -9.953626757    -9.986020817    5.625058944 2.61E-10    6.20E-07
Gene25  9.452210058 9.4636667   3.291139413 2.61E-10    6.20E-07
Gene6   9.609186353 9.615829541 4.213984617 3.39E-10    7.24E-07
Gene9   -8.337494271    -8.349468295    5.483300689 4.36E-10    8.47E-07
Gene28  -13.03662321    -144269487.5    3.249843933 4.97E-10    8.85E-07
Gene29  -10.26198778    -10.33006096    4.8776179   6.02E-10    9.89E-07", header = TRUE, stringsAsFactors = FALSE)


# merge on Genes column
plotDat <- merge(AB[, 1:2], CD[, 1:2], by = "Genes")

library(ggplot2)

ggplot(plotDat, aes(x = logFC.x, y = logFC.y, label = Genes)) +
  geom_point() +
  geom_text()

Or use the ggrepel package to keep the gene names from overlapping each other and the points:

library(ggrepel)

ggplot(plotDat, aes(x = logFC.x, y = logFC.y, label = Genes)) +
  geom_point() +
  geom_text_repel()
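
Since the stated goal is to look at the association between the two comparisons, a rank correlation on the merged table can complement the plot. This is only a sketch: it uses the six genes shared by the two example tables (logFC values copied from them), and the choice of Spearman correlation is an assumption on my part, not something the question specifies. With so few genes the number is illustrative at best.

```r
# The six genes present in both example tables; logFC.x is A vs B, logFC.y is C vs D
plotDat <- data.frame(
  Genes   = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene6", "Gene9"),
  logFC.x = c(4.704568888, 5.046769012, 7.582765878,
              4.326319082, 5.312117056, 4.571347719),
  logFC.y = c(9.830763229, -11.28982093, 11.80447191,
              -8.36656005, 9.609186353, -8.337494271),
  stringsAsFactors = FALSE
)

# Spearman rank correlation between the two sets of fold changes
rho <- cor(plotDat$logFC.x, plotDat$logFC.y, method = "spearman")
print(rho)

# How many shared genes change in the same direction in both comparisons?
same_sign <- sign(plotDat$logFC.x) == sign(plotDat$logFC.y)
print(table(same_sign))
```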

Thank you. But the gene names sit exactly on the points. Is there a way to move the names a little away from the points? How can I get dashed lines like those in the plot from the research paper? And how can I give a different color to genes that pass a fold-change cutoff, as in the paper?

Please see the ggrepel solution too; that is why I gave it as another, better option. Regarding colour, use something like color = as.factor(logFC.x > 4) inside aes (after the merge the columns are named logFC.x and logFC.y).
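
The three requests could be combined roughly as follows. This is a sketch under assumptions: the cutoff of 5, the group labels, and the colours are hypothetical, and since the paper's figure is not shown here I am assuming its dashed lines are reference lines at zero (geom_abline() would give a diagonal instead). box.padding and point.padding are geom_text_repel() arguments that control how far labels are pushed from each other and from the points.

```r
library(ggplot2)
library(ggrepel)

# The six genes present in both example tables; logFC.x is A vs B, logFC.y is C vs D
plotDat <- data.frame(
  Genes   = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene6", "Gene9"),
  logFC.x = c(4.704568888, 5.046769012, 7.582765878,
              4.326319082, 5.312117056, 4.571347719),
  logFC.y = c(9.830763229, -11.28982093, 11.80447191,
              -8.36656005, 9.609186353, -8.337494271),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: flag genes whose |logFC| exceeds it in both comparisons
cutoff <- 5
plotDat$status <- ifelse(abs(plotDat$logFC.x) > cutoff & abs(plotDat$logFC.y) > cutoff,
                         "both above cutoff", "other")

p <- ggplot(plotDat, aes(x = logFC.x, y = logFC.y, label = Genes, color = status)) +
  # dashed reference lines at zero (assumed placement)
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey50") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
  geom_point(size = 3) +
  # larger padding values push the labels further from the points
  geom_text_repel(box.padding = 0.8, point.padding = 0.5, show.legend = FALSE) +
  scale_color_manual(values = c("both above cutoff" = "red", "other" = "grey40")) +
  labs(x = "logFC (A vs B)", y = "logFC (C vs D)", color = NULL)

print(p)
```

Computing the group in a separate column, rather than inline in aes(), keeps the legend labels readable and makes it easy to change the rule later.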
