boxplot using ggplot2 in R
2.9 years ago
raavi21198 ▴ 20

Hello members!!

I have raw microarray expression data consisting of 45 samples and their intensities, which I have converted into a data frame. However, I am not sure how to plot a boxplot of all 45 samples and also group them as "normal" and "tumor". Please help me out with this. The code I used is as follows:

read_data <- ReadAffy() ##read the raw .CEL files

ph=read_data@phenoData#annotation of the data

ph$sample
ph@data
ph@data[,1]=c("NB","ND","TB","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB")

sampleNames=vector()
logs=vector()
for (i in 1:45) 
{
  sampleNames=c(sampleNames,rep(ph@data[i,1],dim(pmexp)[1]))
  logs=c(logs,log2(pmexp[,i]))
}
logdata <- data.frame(logint=logs,sampleName=sampleNames)

The structure of this data frame is as follows:

> str(logdata)
'data.frame':   11155455 obs. of  2 variables:
 $ logint    : num  8.79 9.74 11.09 12.38 12.36 ...
 $ sampleName: chr  "NB" "NB" "NB" "NB" ...
> head(logdata)
     logint sampleName
1  8.791163         NB
2  9.736402         NB
3 11.091435         NB
4 12.376125         NB
5 12.363587         NB
6 11.574594         NB
> p <- ggplot(logdata,aes(sampleName,logint))
> p+geom_boxplot()

Can someone please guide me on how to create a boxplot of these 45 samples using ggplot2 in R, grouped as normal and tumor samples? The code above gives me only four boxes, but I need all 45 samples plotted together.

Thank you

R boxplot ggplot2 • 1.5k views

In the absence of data, I suggest the following:

  1. Convert the data frame from wide format to long format. (dplyr/tidyr)
  2. Attach grouping information for each sample (dplyr)
  3. Draw box plot (ggplot)
  4. Add jitter (ggplot)
  5. Facet by group (ggplot)

Instead of a boxplot, consider using a violin plot with jitter.
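The five steps above can be sketched roughly as follows. This is only an illustration on simulated numbers: the objects expr_wide and sample_info, the column names, and the sample IDs are all made up here and stand in for the real expression table and annotation.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)
# Simulated wide table (hypothetical): 100 probes in rows, 6 samples in columns
expr_wide <- as.data.frame(matrix(rnorm(600, mean = 8), nrow = 100,
                                  dimnames = list(NULL, paste0("sample", 1:6))))
# Hypothetical annotation: one row per sample with its group
sample_info <- data.frame(
  sample = paste0("sample", 1:6),
  group  = rep(c("normal", "tumor"), each = 3)
)

# Steps 1 and 2: wide to long, then attach the group of each sample
expr_long <- expr_wide %>%
  pivot_longer(everything(), names_to = "sample", values_to = "logint") %>%
  inner_join(sample_info, by = "sample")

# Steps 3 to 5: one box per sample, jittered points on top, one facet per group
p_box <- ggplot(expr_long, aes(sample, logint, fill = group)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, size = 0.5, alpha = 0.3) +
  facet_wrap(~group, scales = "free_x")

# Violin alternative with the same data and facets
p_violin <- ggplot(expr_long, aes(sample, logint, fill = group)) +
  geom_violin() +
  geom_jitter(width = 0.2, size = 0.5, alpha = 0.3) +
  facet_wrap(~group, scales = "free_x")
```

Printing p_box or p_violin draws the plot. With real arrays (hundreds of thousands of probes per sample) the jitter layer would probably be too dense to be useful, so it may be worth dropping it or plotting a random subset of probes.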


Thank you for your response. I have edited the post to include the data. Could you now let me know where I am going wrong?
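A likely reason only four boxes appear: the loop stores the group label (NB, ND, TB, TC) in sampleName, so ggplot2 draws one box per distinct label rather than one per sample. Below is a minimal sketch of keeping a per-sample ID next to the group label, using simulated intensities in place of pmexp. The names pmexp_sim and logdata_sim, the sample IDs S1 to S8, and the mapping of labels starting with "N" to normal and "T" to tumor are assumptions for illustration, not taken from the original data.

```r
library(ggplot2)

set.seed(1)
# Group labels for 8 simulated samples (the real data has 45)
groups <- c("NB", "ND", "TB", "TC", "NB", "ND", "TB", "TC")

# Simulated stand-in for pmexp: 200 probes x 8 samples (hypothetical sizes)
pmexp_sim <- matrix(2^rnorm(200 * 8, mean = 10), nrow = 200,
                    dimnames = list(NULL, paste0("S", seq_along(groups))))

# as.vector() unrolls the matrix column by column, so repeating each sample ID
# and group label nrow() times keeps the rows aligned without a loop
logdata_sim <- data.frame(
  logint = log2(as.vector(pmexp_sim)),
  sample = factor(rep(colnames(pmexp_sim), each = nrow(pmexp_sim)),
                  levels = colnames(pmexp_sim)),
  group  = rep(groups, each = nrow(pmexp_sim))
)

# Assumption: labels starting with "N" are normal, the others are tumor
logdata_sim$status <- ifelse(startsWith(logdata_sim$group, "N"), "normal", "tumor")

# x = sample gives one box per sample; fill = status colours by normal/tumor
p <- ggplot(logdata_sim, aes(sample, logint, fill = status)) +
  geom_boxplot()
```

The same idea should carry over to the real data by using the CEL file names (or any unique ID) for the sample column and keeping the NB/ND/TB/TC labels in a separate column.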

2.9 years ago

Here is an example I built from https://bioconductor.org/packages/devel/workflows/vignettes/arrays/inst/doc/arrays.html

library(affy)   # Affymetrix pre-processing
library(limma)  # two-color pre-processing; differential
phenoData <- read.AnnotatedDataFrame(system.file("extdata", "pdata.txt", package="arrays"))
celfiles <- system.file("extdata", package="arrays")
eset <- justRMA(phenoData=phenoData,celfile.path=celfiles)
df=as.data.frame(exprs(eset))
pdata=pData(eset)

library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)


df %>% 
    pivot_longer(everything(),names_to = "cels", values_to ="vals") %>% 
    inner_join(., rownames_to_column(pdata),by = c("cels" = "rowname")) %>% 
    ggplot(., aes(cels,vals, fill=Sensitivity)) +
    geom_boxplot()+
    facet_wrap(~IVT, scales = "free")+
    xlab("")+
    ylab("")+
    theme_bw()+
    theme(axis.text.x = element_text(angle = 90),
          axis.text = element_text(size=18),
          strip.text = element_text(size=18),
          legend.text = element_text(size=18),
          legend.title = element_text(size = 18)
          )


Code suggestions:

a) Use theme_set() to define a theme and set a base size for all relevant parts (axis, theme, labels) in a single command; this saves the multiple arguments in theme().

b) Rotate the x-axis labels with guides() rather than with the angle argument, because guides() ensures proper alignment in both the horizontal and vertical directions even at angles such as 45° (see here).

c) Put the legend on top so that its large size does not shrink the plot itself. Again, the sizes of all fonts and labels are adjusted automatically to look decent, based on the base_size given in the theme_set() command at the top.

theme_set(theme_bw(base_size = 15))
df %>% 
  pivot_longer(everything(),names_to = "cels", values_to ="vals") %>% 
  inner_join(., rownames_to_column(pdata),by = c("cels" = "rowname")) %>% 
  ggplot(., aes(cels,vals, fill=Sensitivity)) +
  geom_boxplot()+
  facet_wrap(~IVT, scales = "free")+
  xlab("")+
  ylab("")+
  guides(x = guide_axis(angle = 45))+
  theme(legend.position="top")

By the way, the code example you use requires the arrays package to be installed to give access to its extdata: BiocManager::install("arrays").


Thank you so much for providing this example.
