I'm running a differential expression test with Monocle. Their documentation shows that differentialGeneTest() returns the genes that differ significantly under your model, but it doesn't tell you which specific genes go up in which groups. Their documentation states: "We could also simply compute summary statistics such as mean or median expression level on a per-CellType basis to see this, which might be handy if we are looking at more than a handful of genes."

This makes sense, and I have already calculated a normalized expression matrix. My main question is: **does one normally use all single cells to calculate the mean expression, including the cells that have no detectable level, or just the expressing cells?** For example, take a scenario where condition 1 has 400 total cells, of which 300 express geneA, and condition 2 has 200 total cells, of which only 50 express geneA. If I'm calculating a fold change (FC) for geneA, do I compare

meanexpression(400 TOTAL cells)/meanexpression(200 TOTAL cells) OR

meanexpression(300 EXPRESSING cells)/meanexpression(50 EXPRESSING cells)?

I can see how there would be bias in both, so I wonder which one is used in the field.
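To make the two options concrete, here is a minimal Python sketch on made-up numbers. The expression values and the helper names (`mean_all`, `mean_expressing`, `detection_rate`) are illustrative assumptions, not Monocle functions. It suggests that the all-cell fold change is the expressing-cell fold change multiplied by the ratio of detection rates, so the first option folds the difference in the fraction of expressing cells into the ratio.

```python
from statistics import mean

def mean_all(values):
    """Mean over every cell, zeros included."""
    return mean(values)

def mean_expressing(values):
    """Mean over cells with a detectable (non-zero) value only."""
    detected = [v for v in values if v > 0]
    return mean(detected) if detected else 0.0

def detection_rate(values):
    """Fraction of cells with a non-zero value."""
    return sum(1 for v in values if v > 0) / len(values)

# Toy data shaped like the scenario above (values are invented):
# condition 1: 400 cells, 300 expressing; condition 2: 200 cells, 50 expressing.
# Every expressing cell is given the same level so the effect is easy to see.
cond1 = [2.0] * 300 + [0.0] * 100
cond2 = [2.0] * 50 + [0.0] * 150

fc_all = mean_all(cond1) / mean_all(cond2)
fc_expressing = mean_expressing(cond1) / mean_expressing(cond2)
rate_ratio = detection_rate(cond1) / detection_rate(cond2)

print(f"FC, all cells:        {fc_all:.2f}")
print(f"FC, expressing cells: {fc_expressing:.2f}")
print(f"detection-rate ratio: {rate_ratio:.2f}")
```

With these toy values the expressing cells have identical levels in both conditions, so the expressing-only fold change shows no difference, while the all-cell fold change reflects only the difference in how many cells express the gene.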

It is probably a good idea to do some extra QC filtering (such as for cells with a minimum number of covered genes, and cells with a sufficiently low percentage of mitochondrial reads), but the criteria that can/should be applied will likely vary between projects.

I'm not sure how easy it is to do this with Monocle (or what specific functions to recommend). However, some other potential options would be:

1) Use direct counts for p-values (and use relatively standard RNA-Seq methods like edgeR / limma-voom, or you may be able to try some scRNA-Seq specific methods like MAST), and use CPM values for calculating fold-changes (or some other normalized count, if the goal is to have something to compare to what is provided by the differential expression program)

2) Use Seurat scaled expression for the fold-change calculation, and potentially use standard statistical tests (like `lm()` for linear regression, `aov()` for ANOVA, etc.) to compare differential expression between groups of cells.
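As a rough illustration of the fold-change part of option 1, here is a minimal Python sketch on made-up counts. It assumes CPM is computed per cell as the gene count divided by that cell's total count, times one million, and that a pseudocount of 1 is added to each group mean before taking the log so that a group with no detected expression does not cause a division by zero. The function names and the pseudocount value are my own assumptions, not what edgeR, MAST, or Monocle do internally.

```python
from math import log2
from statistics import mean

def cpm(gene_count, library_size):
    """Counts per million for one gene in one cell."""
    return gene_count / library_size * 1_000_000

def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2 ratio of group means, zeros included, with a pseudocount."""
    return log2((mean(cpm_a) + pseudocount) / (mean(cpm_b) + pseudocount))

# Invented (gene count, total count) pairs, one per cell.
group_a = [(4, 10_000), (0, 20_000), (6, 20_000)]
group_b = [(1, 10_000), (0, 10_000), (0, 20_000)]

cpm_a = [cpm(g, n) for g, n in group_a]
cpm_b = [cpm(g, n) for g, n in group_b]
lfc = log2_fold_change(cpm_a, cpm_b)

print(f"log2 fold change (A vs B): {lfc:.2f}")
```

Averaging over all cells, zeros included, is a choice made here to keep the sketch short; restricting each list to non-zero cells would give the expressing-only version instead.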

Monocle runs something similar to DESeq but doesn't have an equivalent of DESeq's results() function to calculate fold changes. As I said above, I already have a normalized counts table, and since I'm running the data through the program, I have of course already quality-filtered. I'm just asking which cells to use (all cells or only expressing cells). I'm not going to publish these fold changes; I just want a value to sort by. The question is:

does one normally use all single cells to calculate the mean expression, including the cells that have no detectable level or just expressed cells?