Boxplot in ggplot2
6.4 years ago
1769mkc ★ 1.2k

Why is it so difficult to make things in ggplot2? I like the customisation it offers, but the learning curve is steep nevertheless.

Here is my sample dataframe

df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)

To plot a boxplot in base R, I can simply write boxplot(df[,-1], col=c("red","blue")).

That gives me a boxplot, but I am having a difficult time doing the same with ggplot2:

ex <- melt(df, id.vars=c("HSC", "CMP"))
ggplot(data = ex,
       aes(x = CMP, y = HSC)) +
  geom_boxplot()

I get a single boxplot. What I want is one box each for HSC and CMP, as I get with the base R boxplot.

Any help or suggestions with my ggplot2 code would be highly appreciated.

R • 9.8k views

Thank you for such cool neat code ...

6.4 years ago

Devon got there before me, but as he mentioned, id.vars needs to be set to 'gene'.

Here's a boxplot with scatterplot overlay for anyone else arriving here from Google.

I do agree that ggplot can be difficult to work with. Many functions are redundant in the sense that they do the same thing as others but have different names, and conflicts frequently arise. That said, if you can master ggplot, then you can produce very nice graphics for publications.

library(reshape2)
library(ggplot2)

ex <- melt(df, id.vars=c("gene"))
colnames(ex) <- c("gene","group","exprs")

ggplot(data=ex, aes(x=group, y=exprs)) +

    geom_boxplot(position=position_dodge(width=0.5), outlier.shape=17, outlier.colour="red", outlier.size=0.1, aes(fill=group)) +

    #Choose which colours to use; otherwise, ggplot2 chooses automatically
    #scale_color_manual(values=c("red3", "white", "blue")) + #for scatter plot dots
    scale_fill_manual(values=c("red", "royalblue")) + #for boxplot

    #Add the scatter points (treats outliers same as 'inliers')
    geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

    #Set the base font size of the theme
    theme_bw(base_size=24) +

    #Modify various aspects of the plot text and legend
    theme(
        legend.position="none",
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=14, face="bold", vjust=1),

        axis.text.x=element_text(angle=45, size=14, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=14, face="bold", vjust=0.5),
        axis.title=element_text(size=14, face="bold"),

        #Legend
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=12),  #Text size
        title=element_text(size=12)) +      #Title text size

    #Change the size of the icons/symbols in the legend
    guides(colour=guide_legend(override.aes=list(size=2.5))) +

    #Set x- and y-axes labels
    xlab("Stem cell class") +
    ylab("Expression") +

    #ylim(0, 0) +

    ggtitle("My plot")

[Figure: boxplot with scatterplot overlay]


That's nice, but a violin plot would be better ;-)


Coincidentally, I just produced a violin plot for other data ;)

ggplot(violinMatrix, aes(x=Sample, y=Expression)) + geom_violin() + theme(axis.text.x = element_text(angle=45, hjust=1))

lol


Thank you both. I have been racking my brain over this.


Don't worry. I did the same a few years ago trying to work with ggplot.


I'm using your code to make boxplots comparing normalised and non-normalised data. What do I have to change so that the boxes are not covered with data points? I tried removing "aes(fill=group)", but my boxplot still looks filled with dots. Any suggestions?


Hello my friend. If you do not want the scatterplot overlaid onto the boxplot, just comment out:

geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

Thank you. I more or less figured it out after playing around with it, but thanks anyway for your prompt response.


No problem, good luck with it.

6.4 years ago
ex = melt(df, id.vars="gene")
ggplot(ex, aes(x=variable, y=value)) + geom_boxplot()

Your melt() command produced nonsensical output.
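To see why, it may help to print both results side by side. The following is a minimal sketch that rebuilds the sample data from the question as a data frame (the `data.frame()` construction and the names `wrong` and `right` are mine, not the original poster's) and compares the two `melt()` calls.

```r
library(reshape2)

# Sample data copied from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)

# Original call: HSC and CMP are kept as identifier columns, so the only
# column left to melt is 'gene'. The expression values stay in wide format.
wrong <- melt(df, id.vars = c("HSC", "CMP"))
print(wrong)

# Corrected call: 'gene' is the identifier, so HSC and CMP are stacked into
# a 'variable' column (the cell type) and a 'value' column (the expression).
right <- melt(df, id.vars = "gene")
print(right)
```

With the first result, aes(x = CMP, y = HSC) maps a continuous column to x, so ggplot2 has no groups to split on and draws one box. With the second, `variable` is a factor with two levels, which gives one box per cell type.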


This is good but I would use gather from tidyr. The package tidyr is the evolution of reshape2, and it contains more functions to massage data and reshape it for ggplot2/tidyverse.
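For anyone who wants to try the tidyr route, here is a rough sketch. It assumes tidyr 1.0.0 or later, where `pivot_longer()` is the recommended replacement for `gather()`; the column names `group` and `exprs` and the rebuilt sample data frame are my own choices for illustration.

```r
library(tidyr)
library(ggplot2)

# Sample data copied from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)

# Stack HSC and CMP into a name column and a value column.
# Older equivalent: gather(df, key = "group", value = "exprs", HSC, CMP)
long <- pivot_longer(df, cols = c(HSC, CMP),
                     names_to = "group", values_to = "exprs")

# pivot_longer() returns 'group' as character, which ggplot2 would order
# alphabetically; set the levels explicitly to keep HSC first.
long$group <- factor(long$group, levels = c("HSC", "CMP"))

p <- ggplot(long, aes(x = group, y = exprs)) + geom_boxplot()
print(p)
```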


Agreed and that's what I teach our students, but I don't want to complicate things when answering a simple "why does X not work" question :)


okay let me do this ...


Thank you very much

6.4 years ago
options(stringsAsFactors = FALSE)
df <- read.csv("test.txt", sep = "\t")
library(reshape2)
library(ggplot2)
# Reshape to long format: one row per gene/cell-type pair
df_melt <- melt(df, id.vars = "gene")
ggplot(df_melt, aes(variable, value)) +
  stat_boxplot(geom = "errorbar", width = .5) +   # end caps on the whiskers
  geom_boxplot(aes(fill = variable)) +
  theme_bw() +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
  # Red line joining the group medians ('fun' replaces 'fun.y', deprecated in ggplot2 3.3.0)
  stat_summary(fun = median, colour = "red", geom = "line", aes(group = 1)) +
  geom_jitter(position = position_jitter(0.2))

[Figure: resulting plot]

Input:

> df
                gene      HSC      CMP
1  ENSG00000158292.6 1.810264 2.456869
2  ENSG00000162496.6 2.679670 6.203838
3 ENSG00000117115.10 3.450912 5.555739
4 ENSG00000159423.14 3.680928 5.063446
5  ENSG00000053372.4 5.708997 6.851090

Nice, but what is the point of connecting the two medians with a red line? I don't mean to be rude here, but unless I'm missing something, that line is just "polluting" the data.


There were several requests on SO (Stack Overflow) to connect group means. In addition, there were requests to show the data points as well (the jitter here). Searching for "connecting means in ggplot" yields several SO questions, some of which involve boxplots too. Lines and colours can be customised, as you are aware.
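For reference, connecting group means rather than medians might look roughly like the sketch below. It assumes ggplot2 3.3.0 or later (where the `stat_summary()` argument is `fun` rather than `fun.y`) and rebuilds the sample data from the question; the extra point layer and the `group_means` table are my additions for illustration.

```r
library(reshape2)
library(ggplot2)

# Sample data copied from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)
ex <- melt(df, id.vars = "gene")

# The values that the red line will join, one mean per group
group_means <- aggregate(value ~ variable, data = ex, FUN = mean)
print(group_means)

p <- ggplot(ex, aes(x = variable, y = value)) +
  geom_boxplot(aes(fill = variable)) +
  # group = 1 puts both groups on one path so a line can be drawn between them
  stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "red") +
  # Mark the means themselves so the line's end points are visible
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)
print(p)
```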


Yeah I guess this can make sense for time series analysis or things like that...


You are talented cpad


Nowhere near the luminaries of Biostars here (including you).
