Make two boxplots from one row of a data frame, using a different set of columns for each boxplot
8.9 years ago
bgraphit ▴ 20

Hi everyone!

I am trying to find the best way to make two boxplots for a specific gene, using the values in that gene's row, split across two subsets of the columns of data frame x.

x has 634 rows and 128 columns.

Each row corresponds to one gene:

  • column 1 holds the gene name (e.g. I want to look at the gene in row 1)
  • columns 2:48 hold the data I want in one boxplot
  • columns 49:128 hold the data I want in the other boxplot

The data frame looks something like this:

      gene       accepted_hits_x1.bam      accepted_hits_x1.bam    etc....
 1      AARS1          -6                            0             etc....
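To try the answers below without the original data, you can build a simulated data frame with the same layout. This is only a sketch: the gene names (gene1, gene2 and so on) and library column names (lib1 to lib127) are hypothetical, and the values are random, not the poster's data.

```r
# Simulated stand-in for data frame x (634 genes x 128 columns).
# All names and values are made up for illustration.
set.seed(1)
n_genes <- 634
values <- matrix(round(rnorm(n_genes * 127, sd = 5)), nrow = n_genes)
colnames(values) <- paste0("lib", seq_len(127))
# Column 1 = gene name, columns 2:128 = one value per library
x <- data.frame(gene = paste0("gene", seq_len(n_genes)), values,
                stringsAsFactors = FALSE)
dim(x)
# [1] 634 128
```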

I also want each individual data point that makes up a boxplot to be drawn on the plot.

I am having a problem: my data are residuals from the mean (x value minus the mean), so they contain both positive and negative values, and the plot appears to exclude the negative ones.

library(ggplot2)

# one value per library for IGF1R (columns 2:128)
data <- unlist(subset(datavr, gene == "IGF1R", select=2:128))

# group 1 = columns 2:48 (47 libraries), group 2 = columns 49:128 (80 libraries)
news <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))
news$data <- (log10(as.numeric(news$data)) + 1)

g <- ggplot(data=news, aes(x=as.factor(factor), y=data))

g + geom_boxplot() + geom_point(color="purple", size=3) + xlab("A38-41    A38-5   ") + ylab("log10(Residual from Mean)+1") + ggtitle("IGF1R inside region") + theme(plot.title = element_text(face="bold"))

The problem is that it keeps giving me this warning:

Removed 110 rows containing missing values (geom_point)

Could this be because these values are negative, so log10(value) + 1 cannot be computed for them?
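A quick way to check this guess in R, using a few made-up residual values: log10() of a negative number is NaN (R also warns about it), and log10(0) is -Inf.

```r
# Made-up residuals: negative, zero and positive
vals <- c(-6, 0, 0.5, 3, 12)
# log10() of a negative number is NaN (with a warning); log10(0) is -Inf
logged <- suppressWarnings(log10(vals) + 1)
logged
# NaN counts as missing in R (is.na(NaN) is TRUE), which is why
# geom_point() reports removing rows containing missing values
sum(is.nan(logged))
# 1
```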


Are you trying to make a boxplot for a specific gene?


Correct, but the data frame holds information for two cell types, which are in:

  • columns 2:48, which I want in one boxplot
  • columns 49:128, which I want in the other boxplot

I just edited the question to clarify.


Do you need the log transformation? That is what introduces your NaNs, because log10() of a negative number is NaN. The boxplot will plot negative numbers fine if you leave them untransformed.

If you need to do the log transformation, do it like this instead:

news$data <- (log10(abs(as.numeric(news$data)) + 1))
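One caveat: abs() throws away the sign, so a residual of -6 and one of +6 end up at the same height. If the sign matters, a signed log transform (sometimes called a pseudo-log) keeps it. This is a swapped-in alternative, not something proposed in the thread, and the function name signed_log10 is hypothetical:

```r
# Signed log transform: compresses magnitudes like log10(abs(x) + 1)
# but keeps the sign, so negative residuals stay below zero
signed_log10 <- function(x) sign(x) * log10(abs(x) + 1)

resid <- c(-999, -9, 0, 9, 999)
signed_log10(resid)
# returns c(-3, -1, 0, 1, 3)
```

On the question's data it would be used as news$data <- signed_log10(as.numeric(news$data)), and the y-axis label should then say signed log10 rather than log10.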

Some of my libraries have 0 counts, so when I compute the residual from the mean for this gene in those libraries, some values end up negative.

These values are excluded from the plot when I apply the log transformation. Following your advice and running

news$data <- (log10(abs(as.numeric(news$data)) + 1))

lets all the values be plotted.

I am using the log transformation because of some outliers.

8.9 years ago
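gene_id <- 1 # consider the first gene
# as.numeric() turns each one-row data frame into a plain vector;
# otherwise boxplot() would draw one box per column
data_1 <- as.numeric(your_dataframe[gene_id,2:48])
data_2 <- as.numeric(your_dataframe[gene_id,49:128])
boxplot(data_1,data_2)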
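Why the conversion to a plain vector matters: a one-row data frame is still a list of columns, so boxplot() treats each column as a separate group instead of pooling the row. A small check with a made-up three-column row:

```r
# Made-up one-row data frame with three numeric columns
row_df <- data.frame(a = 1, b = 5, c = 9)

# Passed as-is, each column becomes its own box
n_boxes_df <- ncol(boxplot(row_df, plot = FALSE)$stats)
n_boxes_df
# 3

# Converted to a numeric vector, the row is pooled into a single box
n_boxes_vec <- ncol(boxplot(as.numeric(row_df), plot = FALSE)$stats)
n_boxes_vec
# 1
```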

This was written before your latest edit, but I'll leave it here as extra examples for others.

I think he wants to select a specific gene by name, though, so to add to your answer:

data_1 <- unlist(your_dataframe[your_dataframe$gene == "geneName",2:48])
data_2 <- unlist(your_dataframe[your_dataframe$gene == "geneName",49:128])
boxplot(data_1,data_2)

You could also do it with subset:

data_1 <- unlist(subset(your_dataframe, gene == "geneName", select=2:48))
data_2 <- unlist(subset(your_dataframe, gene == "geneName", select=49:128))
boxplot(data_1,data_2)
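To label the two boxes, boxplot() also accepts a named list. The sketch below uses made-up values in place of data_1 and data_2, and borrows the group labels A38-41 and A38-5 from the question's x-axis title (which group each label belongs to is an assumption). plot = FALSE returns the computed statistics without drawing, which is a handy way to check the grouping:

```r
# Made-up values standing in for data_1 and data_2
data_1 <- c(-6, -2, 0, 1, 4)
data_2 <- c(3, 5, 8, 10, 30)

# A named list labels the boxes; plot = FALSE returns the statistics only
b <- boxplot(list("A38-41" = data_1, "A38-5" = data_2), plot = FALSE)
b$names       # "A38-41" "A38-5"
b$stats[3, ]  # medians: 0 8
b$out         # 30 lies more than 1.5 * IQR above the upper hinge
```

Dropping plot = FALSE draws the two labelled boxes, so the x-axis labels no longer have to be padded with spaces inside xlab().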

Or with factors and ggplot2 if you're feeling fancy:

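library(ggplot2)
data <- unlist(subset(your_dataframe, gene == "geneName", select=2:128))
newFrame <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))   # 47 + 80 = 127 values
# note: qplot() is deprecated in newer ggplot2 versions; ggplot() + geom_boxplot() does the same
qplot(factor(factor), data, data=newFrame, geom="boxplot")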
8.9 years ago
Steven Lakin ★ 1.8k

Since you added more information, I'll post this as an answer. If you want fine control over the details of a plot in R, your best bet is ggplot2 together with factors:

library(ggplot2)   # or install.packages("ggplot2"); library(ggplot2)

data <- unlist(subset(your_dataframe, gene == "geneName", select=2:128))
# group 1 = columns 2:48 (47 values), group 2 = columns 49:128 (80 values)
newData <- data.frame(data=data, factor=c(rep(1,47), rep(2,80)))

g <- ggplot(data=newData, aes(x=as.factor(factor), y=data))
g + geom_boxplot() + geom_point(color="darkred", size=3) + xlab("x axis label") + ylab("y axis label") + ggtitle("My Plot Title") + theme(plot.title = element_text(face="bold"))

You can customize virtually every element of the plot with ggplot2; I only included the basics here. A Google search will turn up much more.
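A small refinement of the same idea, sketched under assumptions: instead of hard-coding rep(1,47) and rep(2,80), derive the group labels from the column ranges, so the counts cannot drift out of sync with the columns. The one-row data frame `row` (values -63 to 63) is hypothetical, and the labels A38-41 and A38-5 are borrowed from the question's x-axis title:

```r
# Hypothetical one-row data frame: gene name plus 127 value columns (-63 to 63)
row <- data.frame(gene = "IGF1R", t(seq(-63, 63)))

# Column ranges for the two cell types, as given in the question
cols_1 <- 2:48
cols_2 <- 49:128

# Group labels derived from the ranges, so the counts (47 and 80) never drift
newData <- data.frame(
  data  = as.numeric(row[1, c(cols_1, cols_2)]),
  group = factor(rep(c("A38-41", "A38-5"),
                     times = c(length(cols_1), length(cols_2))),
                 levels = c("A38-41", "A38-5"))
)
table(newData$group)
```

With real data, `row` would come from subset(your_dataframe, gene == "geneName"), and the plot becomes ggplot(newData, aes(x = group, y = data)) + geom_boxplot(), with the axis labels taken from the factor levels instead of xlab().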


To plot log10(value) + 1, which data frame do I need to change?


Apply this to the newData data frame before plotting:

# note: log10() returns NaN for negative values, and those points are dropped from the plot
newData$data <- log10(as.numeric(newData$data)) + 1
8.9 years ago
ethan.kaufman ▴ 380

To make a boxplot with base graphics in R, you can create a factor vector that indicates which category each data point belongs to:

f <- factor(c(rep("Group 1", 47), rep("Group 2", 80)))

Then call boxplot() with a formula that splits the data by the factor:

boxplot(as.numeric(dat[1, 2:128]) ~ f)

Edit: Actually, creating the factor isn't even necessary here. You can pass the two data vectors as separate arguments:

boxplot(as.numeric(dat[1,2:48]), as.numeric(dat[1,49:128]))
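The question also asked to see each data point on top of the boxes. In base graphics, one way is to add stripchart() on top of the boxplot. A sketch using random made-up values in place of the real row:

```r
# Made-up values standing in for one gene's row (47 + 80 libraries)
set.seed(42)
vals <- c(rnorm(47, mean = 0, sd = 3), rnorm(80, mean = 2, sd = 3))
grp <- factor(rep(c("Group 1", "Group 2"), times = c(47, 80)))

# Draw the boxes; outline = FALSE skips outlier symbols because every
# point is overlaid below, and ylim keeps all points inside the plot
b <- boxplot(vals ~ grp, outline = FALSE, ylim = range(vals))

# Overlay each data point, jittered sideways so they don't sit on top of each other
stripchart(vals ~ grp, vertical = TRUE, method = "jitter",
           pch = 16, col = "purple", add = TRUE)
```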
8.9 years ago
Deepak Tanwar ★ 4.2k
boxplot(unlist(data[data$gene == "Gene_name", 2:48]), unlist(data[data$gene == "Gene_name", 49:ncol(data)]), names = c("group1", "group2"))
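If you need this for many genes, the same steps can be wrapped in a small function. This is a sketch only: the helper name gene_groups and the toy data frame are hypothetical, and it assumes exactly one row per gene name.

```r
# Hypothetical helper: pull one gene's row and split it into two numeric groups
gene_groups <- function(df, gene_name, cols_1, cols_2,
                        labels = c("group1", "group2")) {
  row <- df[df$gene == gene_name, , drop = FALSE]
  if (nrow(row) != 1) stop("expected exactly one row for gene ", gene_name)
  # as.numeric() turns each one-row slice into a plain vector
  groups <- list(as.numeric(row[1, cols_1]), as.numeric(row[1, cols_2]))
  names(groups) <- labels
  groups
}

# Small made-up example: 2 genes, 6 value columns
toy <- data.frame(gene = c("AARS1", "IGF1R"),
                  s1 = c(1, -6), s2 = c(2, 0), s3 = c(3, 4),
                  s4 = c(4, 10), s5 = c(5, 12), s6 = c(6, 8))
g <- gene_groups(toy, "IGF1R", 2:4, 5:7)
boxplot(g)
```

On the real data the call would be gene_groups(data, "Gene_name", 2:48, 49:ncol(data)).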