How to just add some selected gene names to a volcano plot
2
1
Entering edit mode
5.9 years ago
Farbod ★ 3.4k

Dear Biostars, Hi

I have a volcano plot (obtained from edgeR). I also have some selected annotated genes that I like to highlight them by showing only their name on that plot.

I have used the valuable script/code from Biostars (thank you @WouterDeCoster and @venu and others).

As most of the lines of the first column in my counts.matrix is empty (I have only about 15 names), I received some errors. For example, in the "Head" example below, I have only PAC1 and Gh as Gene name and the other lines have "space" instead of any Gene name!

would you please check my script and my counts.matrix example and help me in this regard? Thanks

NOTE:

1- Head of my counts.matrix file:

Gene    external_gene_name  logFC   logCPM  P.Value FDR
PAC1    TRINITY_c0_g2_i4    -13.8732718193368   4.61855606743036    3.03811154383156e-30    1.87897780840196e-24
    TRINITY_c5_g3_i4    13.2697070220115    4.01618042373378    1.21936893870205e-27    3.7707094407506e-22
GH  TRINITY_c9_g2_i2    11.2269211011449    3.82178000481344    3.20153046019011e-26    6.60015780727773e-21
    TRINITY_c0_g1_i1    12.855465908739 3.60335598332695    2.98018910927419e-24    4.60788644555924e-19
    TRINITY_c0_g1_i1    -11.8727923903381   2.62735890209779    1.81499535997098e-20    2.24503673057178e-15
    TRINITY_c6_g1_i6    10.3243429332541    2.89490936855587    2.16106621247948e-19    2.22758743227662e-14
    TRINITY_g3_i2   -12.0236467730875   2.77706530918009    2.63166646685828e-19    2.32514875441625e-14
    TRINITY_c3_g2_i6    -11.2400737828536   2.00142608642608    1.40238473598696e-18    1.0841643566014e-13     
    TRINITY_c1_g1_i1    -9.45052158373113   2.158090406858  1.66695714263313e-18    1.14551257449686e-13

2- my code:

    # Load packages
    library(dplyr)
    library(ggplot2)
    library(ggrepel)

    # Read data 
    data <- read.table("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE)

    data$threshold = as.factor(data$FDR < 0.001)


    data = mutate(data, sig=ifelse(data$FDR<0.001, "FDR<0.001", "Not Sig"))

    p = ggplot(data, aes(logFC, -log10(P.Value))) +
      geom_point(aes(col=sig)) +
      scale_color_manual(values=c("red", "black"))

    p
  p+geom_text_repel(data=filter(data, FDR<0.001), aes(label=Gene))
R ggplot2 volcano plot • 9.2k views
ADD COMMENT
1
Entering edit mode

Add another filter to filter out genes (column) without name some thing like df$gene!="" in addition to p-value cutoff. This would show only those genes with names and with FDR below the cutoff.

  1. Data has significant FDR values and significant genes are filtered by FDR (from OP code). Hence it is Advisable to plot FC against FDR instead of -log10P (IMO).
  2. From OP' data (OP filtered data by 0.05 FDR, code filters the data by mean (FDR)):
test = read.csv(   "test.txt",   stringsAsFactors = F,   sep = "\t",  strip.white = T ) 
test2 = test[test$FDR < mean (test$FDR) & test$Gene != "",] 
library(dplyr) 
library(ggplot2)

ggplot(test, aes(logFC, FDR, color = FDR < mean(FDR))) +  
geom_point(size = 4) +   
scale_color_manual(values = c("blue", "red", "darkgreen")) +   
geom_text( data = test2,
     aes (logFC, FDR, label = Gene, size = 4, colour = ""),  vjust = -1.2, hjust = "center") +   
theme_bw() +   
theme(legend.position = "none")
  

Rplot01

ADD REPLY
2
Entering edit mode
5.9 years ago
venu 7.1k

Change this line of code

data <- read.table("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE)

to

data <- read.delim("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE, na.strings=c("","NA"))

This adds <NA> to empty cells which will be ignored in the plot and only gene names will be added.

ADD COMMENT
0
Entering edit mode

Dear @venu, Hi. Great answer! Thank you.

ADD REPLY
0
Entering edit mode

Another problem;

I have used "library(ggrepel)" and " p+geom_text_repel(data= . . . " in the code but the name of each gene is just on it's spot without any "arrow", what must I do ? as the names are on each dot and unreadable.

ADD REPLY
1
Entering edit mode
ADD REPLY
3
Entering edit mode
5.6 years ago

You can do this with my new Bioconductor package: EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling

Specifically take a look at Section 4.8 Only label key transcripts of the vignette.

EnhancedVolcano(res2,
        lab = rownames(res2),
        x = "log2FoldChange",
        y = "padj",
        selectLab = c("ENSG00000106565","ENSG00000187758"),
        ... ...
)

v

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 2709 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6