EnhancedVolcano and scRNAseq differential gene expression
2.1 years ago
TJ ▴ 50

Hi,

I'm looking to make a volcano plot of differentially expressed genes between two groups of cells from a scRNAseq dataset that was analyzed using Seurat.

Here's the code used to generate the DE genes. The Seurat object combined is an integrated object with DefaultAssay(combined) = "RNA".

my.deg <- FindMarkers(combined,
                      ident.1 = c("1", "2"),
                      ident.2 = c("0", "3", "4"),
                      verbose = FALSE)

This creates a data.frame with gene names as row names that includes avg_log2FC and adjusted p-values. It uses the default parameters of the Seurat FindMarkers function, which to my understanding run a Wilcoxon rank sum test with a Bonferroni correction.

Next, I'm looking to visualize this as a volcano plot with the EnhancedVolcano package:

EnhancedVolcano(my.deg,
                lab = rownames(my.deg),
                x = "avg_log2FC",
                y = "p_val_adj")

I get a plot, but I see a straight line across the top on one side, with a bunch of genes on top of each other. I suspect this comes from the input data.frame, as there are a bunch of 0s in the p_val_adj column. Is there a way to deal with this? I know EnhancedVolcano sets these to 10^-1 times the lowest non-zero p-value, but this still shows a bunch of genes on top of each other on a line. Any guidance on how to fix/adjust this would be much appreciated.


seurat wilcox expression gene test EnhancedVolcano differential scRNAseq • 7.3k views
2.1 years ago

This is part of the statistics. The genes on that line have an adjusted p-value of exactly 0 from the Wilcoxon test, which on the -log10 scale is Inf. As you suggest, EnhancedVolcano replaces these zeros with a single fixed value, which is why you see the plateau at the top. Ultimately, there is nothing to fix really.
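To see the plateau in isolation, here is a minimal base-R sketch. The data frame toy.deg and all of its values are made up for illustration; only the column names mirror the FindMarkers output. The replacement rule (one tenth of the smallest non-zero p-value) is what I understand EnhancedVolcano to do from its warning message, so treat it as an assumption rather than the package's exact code.

```r
# Toy stand-in for FindMarkers output (values are invented for illustration)
toy.deg <- data.frame(
  avg_log2FC = c(2.1, 1.8, -1.5, 0.4, -0.2),
  p_val_adj  = c(0, 0, 0, 1e-50, 0.8),
  row.names  = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# How many adjusted p-values are exactly 0?
n.zero <- sum(toy.deg$p_val_adj == 0)

# Assumed rule: replace zeros with one tenth of the smallest non-zero p-value
min.nonzero <- min(toy.deg$p_val_adj[toy.deg$p_val_adj > 0])
p.plot <- ifelse(toy.deg$p_val_adj == 0, min.nonzero * 10^-1, toy.deg$p_val_adj)

# Every former zero now has the same y value, which is the plateau
y <- -log10(p.plot)
print(data.frame(gene = rownames(toy.deg), y = y))
```

Running sum(my.deg$p_val_adj == 0) on the real table should tell you how many genes sit on that line.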

On a more methodological note, I'd suggest you explore pseudobulking this data by biological replicate, as your chance of false positives could be quite high when every cell is treated as an independent sample.
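The aggregation step behind pseudobulking can be sketched in base R as follows. The count matrix and the replicate labels in sample.id are hypothetical; in a real analysis they would come from the raw counts and the metadata of your object.

```r
# Toy raw counts: 3 genes x 6 cells (values invented for illustration)
counts <- matrix(
  c(5, 0, 2,  3, 1, 0,  4, 2, 1,  0, 6, 3,  1, 5, 2,  0, 7, 4),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)

# Hypothetical biological replicate of each cell, in column order
sample.id <- c("rep1", "rep1", "rep2", "rep2", "rep3", "rep3")

# Sum counts over the cells of each replicate: one column per replicate
pseudobulk <- t(rowsum(t(counts), group = sample.id))
print(pseudobulk)
```

You would normally aggregate per replicate within each group of clusters being compared, then pass the resulting matrix to a bulk RNA-seq method such as DESeq2 or edgeR, so that the replicates, not the cells, are the unit of testing.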


It's probably the machine limit for small values: -log10(.Machine$double.xmin) is 307.6527, which fits the plot.
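A quick way to check this in an R session (a small sketch; the value 10^-400 is just an arbitrary number below the representable range):

```r
# Smallest normalized double and its position on the -log10 scale
xmin <- .Machine$double.xmin
limit <- -log10(xmin)
print(limit)

# A value far below the representable range underflows to exactly 0 ...
tiny <- 10^-400
print(tiny == 0)

# ... and a zero p-value is infinite on the -log10 scale
print(-log10(tiny))
```

R can also hold a few denormal values below .Machine$double.xmin, so the exact point at which a p-value becomes 0 depends on how it was computed.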


Thank you both for the input! That all makes sense.
