Calculate Pairwise wilcox.test for Several Categories and Plot Significance into a Boxplot with ggplot2
10.7 years ago
Biojl ★ 1.7k

I have an R dataset that looks much like this one, built from ggplot2's diamonds data:

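library(ggplot2)  # provides the diamonds data set
diamonds2 <- droplevels(subset(diamonds, cut != 'Good' & cut != 'Very Good', select = -c(table, x, y, z, clarity, depth, price)))  # droplevels() removes the now-empty cut levels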

I want to make a boxplot like this one:

ggplot(diamonds2, aes(x=color, y=carat, col=cut))+geom_boxplot()

Here comes the hard part. My idea is to run pairwise wilcox.test comparisons of the y variable (carat) between groups (cut), separately within each x category (color).

pairwise.wilcox.test(diamonds2$carat, interaction(diamonds2$cut, diamonds2$color), p.adjust.method = "bonferroni")  # $ extracts vectors; diamonds is a tibble, so [, 'carat'] would return a one-column tibble

It's not very elegant, because it creates a matrix with many extra comparisons (for example, between boxes of different colors), but it's the best I have so far. I would like to prune it.
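A minimal sketch of one way to prune it, assuming only cut-vs-cut comparisons within the same color are wanted: run pairwise.wilcox.test once per color via split(), flatten each lower-triangular p-value matrix into rows, and apply a single Bonferroni correction across all retained comparisons. The column names color, group1, group2 and p.adj are my own choice, not from the original.

```r
library(ggplot2)  # for the diamonds data

# Example data: only Fair, Premium and Ideal cuts; unused levels dropped
diamonds2 <- droplevels(subset(diamonds, cut != "Good" & cut != "Very Good",
                               select = c(carat, cut, color)))

# One pairwise.wilcox.test per color, so only cuts within a color are compared
tests <- lapply(split(diamonds2, diamonds2$color), function(d) {
  pw <- pairwise.wilcox.test(d$carat, d$cut, p.adjust.method = "none")
  # pw$p.value is lower-triangular; keep its non-NA cells as rows
  tab <- as.data.frame(as.table(pw$p.value), stringsAsFactors = FALSE)
  tab <- tab[!is.na(tab$Freq), ]
  data.frame(color = d$color[1], group1 = tab$Var1, group2 = tab$Var2,
             p = tab$Freq, stringsAsFactors = FALSE)
})
res <- do.call(rbind, tests)

# Correct once, across all retained comparisons (7 colors x 3 pairs)
res$p.adj <- p.adjust(res$p, method = "bonferroni")
res
```

Correcting over 21 within-color tests instead of every pair of the 21 cut/color combinations also makes the Bonferroni adjustment much less conservative.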

Additionally, I would like to plot the results as asterisks whose color is the mix of the colors of the two distributions being compared. In the first group of boxplots (D), I would like to plot 3 asterisks: a purple one (red and blue are significantly different), a yellow one and a cyan one.
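A sketch of how those mixed colors could be computed, assuming the three groups are drawn in pure red, green and blue (for example via scale_colour_manual(); ggplot2's default palette uses softer hues that do not mix as cleanly): add the RGB channels of the two group colors and cap each at 255. mix_colours() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: additive RGB mix of two colours, channels capped at 255
mix_colours <- function(a, b) {
  m <- pmin(col2rgb(a) + col2rgb(b), 255)
  rgb(m[1], m[2], m[3], maxColorValue = 255)
}

# Assumed group colours: pure primaries
group_cols <- c(Fair = "red", Premium = "green", Ideal = "blue")

# One row per compared pair, with the star colour for that pair
pairs <- data.frame(group1 = c("Fair", "Fair", "Premium"),
                    group2 = c("Ideal", "Premium", "Ideal"),
                    stringsAsFactors = FALSE)
pairs$star_col <- mapply(function(g1, g2) mix_colours(group_cols[[g1]], group_cols[[g2]]),
                         pairs$group1, pairs$group2, USE.NAMES = FALSE)
pairs  # star_col: "#FF00FF" (purple), "#FFFF00" (yellow), "#00FFFF" (cyan)
```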

As for the colored asterisks, I've been experimenting with the geom_text function from ggplot2, but I can't figure out how to plot below the x axis or how to plot text in different colors.
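A minimal sketch of both tricks: annotate() accepts vectors, so a single call can draw several stars, each in its own color; and coord_cartesian(clip = "off") plus a larger bottom plot.margin lets text placed at a y value below the panel be drawn instead of clipped. The x offsets, y = -0.35 and the colors are hand-picked placeholders, not computed from any test.

```r
library(ggplot2)

diamonds2 <- droplevels(subset(diamonds, cut != "Good" & cut != "Very Good",
                               select = c(carat, cut, color)))

p <- ggplot(diamonds2, aes(x = color, y = carat, colour = cut)) +
  geom_boxplot() +
  # One annotate() call, three stars under "D" (x = 1), one colour each;
  # offsets, y and colours are placeholders
  annotate("text", x = 1 + c(-0.15, 0, 0.15), y = -0.35, label = "*",
           size = 8, colour = c("#FFFF00", "#FF00FF", "#00FFFF")) +
  # Keep the panel at the data range, but do not clip text outside it
  coord_cartesian(ylim = range(diamonds2$carat), clip = "off") +
  # Extra bottom margin (in points) to make room for the stars
  theme(plot.margin = margin(5.5, 5.5, 25, 5.5))
p
```

Because the colors are set as fixed parameters rather than mapped inside aes(), they do not interfere with the colour scale used for cut. The exact y value will likely need tuning so the stars sit just above the axis labels.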

UPDATE: The real data is very similar to the one I posted: frequencies of all amino acids in 3 different sets of genes. I can plot asterisks/stars with geom_text at a particular position, but I can't automate it to plot significance from the table I generated, and I also can't plot on the x axis, above the letter of the amino acid.
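A sketch of an automated version, rebuilt here on the diamonds example and assuming the significance table has one row per within-category comparison: convert adjusted p-values to star labels with cut(), keep the significant rows, set each star's x to the category's index plus the midpoint of the two compared boxes, and pass per-row colors to geom_text(). The box offsets of -0.3/0/0.3 approximate where ggplot2's default dodging puts three boxes, and the colors are assumed additive RGB mixes; both are assumptions to adjust for the real amino acid data.

```r
library(ggplot2)

diamonds2 <- droplevels(subset(diamonds, cut != "Good" & cut != "Very Good",
                               select = c(carat, cut, color)))

# 1. Within-color pairwise tests, flattened to one row per comparison
res <- do.call(rbind, lapply(split(diamonds2, diamonds2$color), function(d) {
  pm <- pairwise.wilcox.test(d$carat, d$cut, p.adjust.method = "none")$p.value
  tab <- as.data.frame(as.table(pm), stringsAsFactors = FALSE)
  tab <- tab[!is.na(tab$Freq), ]
  data.frame(color = d$color[1], group1 = tab$Var1, group2 = tab$Var2,
             p = tab$Freq, stringsAsFactors = FALSE)
}))
res$p.adj <- p.adjust(res$p, method = "bonferroni")

# 2. Star labels from adjusted p-values; non-significant rows are dropped
res$label <- as.character(cut(res$p.adj, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                              labels = c("***", "**", "*", "")))
stars <- res[res$label != "", ]

# 3. Assumed group colours and approximate dodged box centres
group_cols <- c(Fair = "red", Premium = "green", Ideal = "blue")
group_off  <- c(Fair = -0.3, Premium = 0, Ideal = 0.3)
mix <- function(a, b) {  # additive RGB mix, channels capped at 255
  m <- pmin(col2rgb(a) + col2rgb(b), 255)
  rgb(m[1], m[2], m[3], maxColorValue = 255)
}
stars$star_col <- mapply(function(a, b) mix(group_cols[[a]], group_cols[[b]]),
                         stars$group1, stars$group2, USE.NAMES = FALSE)
stars$x <- as.numeric(stars$color) +
  unname(group_off[stars$group1] + group_off[stars$group2]) / 2
stars$y <- -0.35  # just below the panel; tune for the real data

# 4. Boxplot plus coloured stars drawn outside the panel
p <- ggplot(diamonds2, aes(x = color, y = carat, colour = cut)) +
  geom_boxplot() +
  scale_colour_manual(values = group_cols) +
  geom_text(data = stars, aes(x = x, y = y, label = label),
            colour = stars$star_col, size = 6, inherit.aes = FALSE) +
  coord_cartesian(ylim = range(diamonds2$carat), clip = "off") +
  theme(plot.margin = margin(5.5, 5.5, 25, 5.5))
p
```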

I added the significance stars for the first columns by hand in GIMP; this is how it should look:
[Image: test plot]

Tags: r, statistics, plot
This question is a little daunting to answer, as it has a lot of components. What have you tried already?

Please consider editing your question above to reflect your bioinformatics data set (rather than the diamonds example data) and to include a graphical display of what you want your figure to look like.

Yes, please indicate your specific bioinformatics research problem. Right now this is a generic R/ggplot2 usage question.

I uploaded a test plot, but for some reason it is not appearing.

http://imagizer.imageshack.us/v2/150x100q90/534/u8d5.png

5.6 years ago

Please take a look at my answer here, and follow the comments to get to some code:

d

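The Wilcoxon test itself is easy. For paired samples (signed-rank test):

wilcox.test(x, y, paired = TRUE)

For independent groups, such as the different cuts or gene sets here, keep the default paired = FALSE, which gives the rank-sum (Mann-Whitney) test:

wilcox.test(x, y)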
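A small sketch of the two call forms on simulated vectors (all names and values are made up for illustration): paired = TRUE runs the signed-rank test on matched measurements, while the default paired = FALSE runs the rank-sum (Mann-Whitney) test, which is the one that fits independent groups such as the cut categories or gene sets in the question.

```r
set.seed(1)

# Matched measurements on the same 20 units (hypothetical)
before <- rnorm(20, mean = 10)
after  <- before + rnorm(20, mean = 0.5)
paired_res <- wilcox.test(before, after, paired = TRUE)   # signed-rank test

# Two independent groups of different sizes (hypothetical)
grp_a <- rnorm(20, mean = 10)
grp_b <- rnorm(25, mean = 11)
unpaired_res <- wilcox.test(grp_a, grp_b)                 # rank-sum test

paired_res$method
unpaired_res$method
```

Note that the paired form requires vectors of equal length whose elements correspond one-to-one, which is not the case for separate sets of genes.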

Kevin

I have a question. I use the ggpubr library both to run these tests and to plot, and I'm using rlog values. But when I run a Wilcoxon or Kruskal-Wallis (KW) test, the y axis no longer matches my expression values; it looks rank-based. When I plot a normal boxplot, the y axis ranges from about 0 to 15, but with ggpubr it ranges from 0 to 50. Why is that?

Hey Krushnach. I am not familiar with ggpubr. Is it doing some transformation / scaling on the data?

The normal boxplot (the image links seem to break if I embed the image): https://imgur.com/a/FVUQpcu

The boxplot with statistics from ggpubr: https://imgur.com/a/bNfNCzK

I guess it's doing some transformation, though I'm not sure; it may not be scaling.

Oh, right. But the data points themselves do not actually differ (?). It looks like ggpubr has just added a lot of extra padding at the top. You can probably change the y-axis limits via ggpubr?
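A sketch illustrating that in ggpubr the labels, not the data, set how far the y axis extends, using simulated values as a stand-in for rlog expression (df, group and expr are made-up names): the pairwise brackets from comparisons = are drawn above the boxes, and the global Kruskal-Wallis label goes wherever label.y puts it. A fixed label.y well above the data stretches the axis accordingly, so tying it to the data maximum keeps the padding small.

```r
library(ggpubr)  # attaches ggplot2 as well

# Simulated stand-in for rlog values (roughly 0-15); names are hypothetical
set.seed(42)
df <- data.frame(group = rep(c("A", "B", "C"), each = 30),
                 expr  = c(rnorm(30, 6), rnorm(30, 8), rnorm(30, 10)))

pairs <- list(c("A", "B"), c("B", "C"), c("A", "C"))

p <- ggboxplot(df, x = "group", y = "expr") +
  # pairwise Wilcoxon brackets above the boxes
  stat_compare_means(comparisons = pairs, method = "wilcox.test") +
  # global Kruskal-Wallis label at 1.3 x the data maximum; tune as needed
  stat_compare_means(method = "kruskal.test", label.y = max(df$expr) * 1.3)
p
```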

Okay, let me try. I thought it might be the ranks or something else.
