Question

Is it necessary to correct for multiple testing during differential expression analysis on microRNA data?

0

3.2 years ago

K.patel5 ▴ 140

Hi Biostars,

I have performed differential expression (DE) on microRNA-seq data. Unlike RNA-seq data, this is quite small; I only have 1978 genes across X samples. I believe this is why I am running into issues when interpreting the DE results.

The standard in RNA-seq analysis is to use adjusted P values, which are corrected for the number of genes tested. An RNA-seq analysis usually yields between 20,000 and 40,000 genes. Adjusted P values are valuable here and are now seen as a mandatory check when discussing results.

Meanwhile, in the world of microRNAs, we usually have to make do with < 2,000 genes. Here are my 3 lowest adjusted P values from contrasting 30 disease samples with 16 non-disease samples, so there are plenty of replicates.

                       log2fc        P value          adjusted P value
mmu-miR-375-3p         3.589905e-01  6.472503e-06     0.007805839
mmu-miR-200b-3p       -7.077764e-03  9.997980e-04     0.602878205
mmu-miR-429-3p         4.179860e+02  1.513482e-03     0.608419692

As such, my question is: for miRNA-seq analysis, are adjusted P values unnecessary because we lack the number of genes needed to perform multiple testing correction adequately?

Appreciate any insight.

microRNA miRNA-seq differential expression stats • 992 views

ADD COMMENT • link updated 3.2 years ago by Carlo Yague 8.6k • written 3.2 years ago by K.patel5 ▴ 140

1

Well, the larger the miRNA list, the more stringent the correction will be. The problem is losing power by testing many miRNAs that are too weakly expressed to yield hits. Try subsetting to a few hundred of the most highly expressed miRNAs and re-running your test.

ADD REPLY • link 3.2 years ago by Jeremy Leipzig 22k
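To illustrate the subsetting idea in the comment above, here is a minimal Python sketch. It assumes the filter uses only mean expression across all samples (so it does not depend on the disease/non-disease contrast) and that the correction is Benjamini-Hochberg; the thread does not say which method was used. The miRNA names, expression values, P values and the helper names `bh_adjust` and `filter_then_adjust` are all made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the
    # adjusted values stay monotone in the raw P values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_then_adjust(mean_expr, pvalues, keep_n):
    """Keep the keep_n most highly expressed features, then adjust only those."""
    kept = sorted(mean_expr, key=mean_expr.get, reverse=True)[:keep_n]
    adj = bh_adjust([pvalues[name] for name in kept])
    return dict(zip(kept, adj))


# Hypothetical toy data: two well-expressed miRNAs, two near zero counts.
mean_expr = {"miR-a": 5000.0, "miR-b": 1200.0, "miR-c": 3.0, "miR-d": 1.5}
pvalues = {"miR-a": 0.001, "miR-b": 0.02, "miR-c": 0.6, "miR-d": 0.9}

# Adjusting over all four features versus only the top two by expression.
all_adj = dict(zip(pvalues, bh_adjust(list(pvalues.values()))))
top_adj = filter_then_adjust(mean_expr, pvalues, keep_n=2)

print("adjusted over all features:", all_adj)
print("adjusted over top 2 only:  ", top_adj)
```

Dropping weakly expressed features reduces the number of tests, so the surviving features get smaller adjusted P values. The filter has to be chosen without looking at the test results, otherwise the correction is no longer valid.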

score 2 · Answer 1 · 2021-02-18

2000 tests is still a lot, so p-value correction is absolutely necessary! With 2000 tests, you can expect about 100 false positives at an uncorrected p-value threshold of 5% if none of the miRNAs is truly differentially expressed. In your data, it is likely that the p-value distribution is close to uniform, in which case most tests called positive on the raw p-value are in fact false positives. That would explain why only one test remains positive after p-value correction.
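The "about 100 false positives" figure can be checked with a small simulation. This is only a sketch: it assumes every one of the 2000 tests is a true null (so the p-values are uniform on [0, 1]) and uses Benjamini-Hochberg as the correction, which is an assumption since the thread does not name the method. The function name `benjamini_hochberg` is illustrative.

```python
import random


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value down, keeping the running minimum
    # so adjusted values never decrease as raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


random.seed(0)
n_tests = 2000

# Assumption: no real signal at all, so p-values are uniform on [0, 1].
null_pvalues = [random.random() for _ in range(n_tests)]

raw_hits = sum(p < 0.05 for p in null_pvalues)
adj_pvalues = benjamini_hochberg(null_pvalues)
adj_hits = sum(q < 0.05 for q in adj_pvalues)

print("expected raw hits under the null:", n_tests * 0.05)
print("observed raw hits (p < 0.05):", raw_hits)
print("hits after BH adjustment (q < 0.05):", adj_hits)
```

With no real signal, the raw threshold still flags roughly 5% of the tests, while very few (usually none) survive the adjustment.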

For more insight, see: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/