Biostars beta testing.
Question: Set point colour based on threshold in scatter plot
1
Entering edit mode

Hi,

I have a set of samples and log10 total read counts of them in a data frame like below

``````> head(a)
log10_total_counts
A1            6.468503
A10           6.565213
A11           6.752139
A12           5.078598
A2            6.277342
A3            6.473411
>
``````

For instance red dots have less than 1000000 counts and black dots have more thsn 1000000 counts.

How could I repreduce this plot please?

7
Entering edit mode

Something like the following would work with base R functions:

``````# make some data
a <- data.frame(sample=paste0("A", 1:50), log10_total_counts=rnorm(50, 6))

# get x points
x <- 1:length(a\$log10_total_counts)

# create plot, drop x axis
plot(x, a\$log10_total_counts, ylim=c(0,8), pch=19, xaxt="n", xlab="sample")
# add dashed line and X-axis labels with rotation
abline(h=6, lty="dashed")
axis(1,1:50, labels=a\$sample, las=3)

# select points to color differently
LessThan6 <- a\$log10_total_counts < 6
points(x[LessThan6], a\$log10_total_counts[LessThan6], pch=19, col="red")
``````
6
Entering edit mode

Use `ggplot2`.

Something is missing in your example; the labels at the bottom do not match you data frame, and it's unclear how your sample dataframe would count up to values near 1000000. You might want to precompute/add a QC vector based on that to your dataframe; basically have your dataframe in a state that's ready to be plotted with minimal calculations.

Here's an example; I generate a dataframe with normally distributed values centered on 6.0 (seemed close to what your example had). I plot a dotted line at 6; and anything below 6 becomes red.

For the style, I directly took the function `theme_Publication()` from this link.

Here's the code:

``````library("ggplot2")

# Themes for style
theme_Publication <- function(base_size=14, base_family="helvetica") { ... }
scale_fill_Publication <- function(...){ ... }
scale_colour_Publication <- function(...){ ... }

# Function to plot points as described
function(DF) {
ggplot(DF) +
theme_Publication() + # Style it
geom_point(mapping = aes(x = as.factor(1:nrow(DF)), # Sequentially
y = log10_total_counts, # Y is the count values
color = QC)) + # Color it by the QC vector in the dataframe
scale_color_manual(values = c("PASS" = "black", "FAIL" = "red")) + # Tell it what colours to use.
geom_hline(yintercept = 6.0, linetype="dotted") + # Add the dotted line at 6.0
xlab("Sample") + # Label the X
ylab("log10(Total Reads)") + # Label the Y
ylim(c(-0.2, 8.2)) # Extend the Y axis to be between this range.
}

DF = data.frame(
log10_total_counts = rnorm(30, mean = 6, sd = 0.5), # Randomized data
QC = as.factor(c( rep("FAIL",15), rep("PASS", 15) )) # A vector that is just half FAIL and half PASS
)

DF\$QC <- as.factor(ifelse(DF\$log10_total_counts < 6.0, "FAIL", "PASS")) # Now actually add some logic to what should be PASS or FAIL.
``````
Entering edit mode
1

None of the code uses anything from `gridExtra` - is it being loaded for code within the `_Publication` functions?

RamRS
21k
Entering edit mode
0

You're right, it's not needed here. Edited, and thanks!

• 830
Entering edit mode
0

Note: I cleaned up a discussion that sprang off an unwarranted comment of mine. Apologies!

RamRS
21k
4
Entering edit mode

another plot in ggplot2 with simulated data:

``````df=data.frame(genes=paste0("gene_", seq_along(1:75)), reads=round(rnorm(75, 110,2),0))
library(ggplot2)

geom_point() +
scale_y_continuous(limits = c(0, 200))+
scale_color_manual(name="QC", values = c("red","darkgreen"))+
geom_hline(yintercept = 110, color="red")+
theme_bw()+
xlab("Sample")+
theme(axis.text.x = element_text(angle = 45,hjust = 1),
legend.position = "bottom")
``````

Entering edit mode
1

`BIOSTARS` I am so lucky that I can ask for your help and obtain such a nice solutions `THANK YOU`