Check if RNASeq count data follow NB or Poisson distributions
7.9 years ago
debitboro ▴ 260

Hi Biostars,

I used HTSeq to generate the following table of read counts per gene per sample (I have 12 biological replicates):

ENSG00000000003  0  0  5  7   0  0  0  0   0   0  12   0
ENSG00000000005  0  0  3  2   0  0  0  0   0   2   4   0
ENSG00000000419  2  2  3  5  18 20  0  2   2   3  13  32
ENSG00000000457 15  6 11  7 129 21  8 90  41  97 129 104
ENSG00000000460  6  2  9  5  62 12  3 30  21  61  78  62
ENSG00000000938  0  0  5  0  16  3  0 16   7  25  32   5
...
...

My data are paired-end RNA-Seq data. Now I want to check whether my count data follow a negative binomial (NB) or a Poisson distribution. What is the recommended way to do this?

I appreciate your help.

7.9 years ago

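Plot the per-gene variance as a function of the per-gene mean (use normalized counts). Under a Poisson model the variance equals the mean, so the points fall along the line variance = mean. You won't see that with biological replicates unless you're working on a cell line or something simple like that. Variance that grows faster than the mean is overdispersion, which is what the NB model accounts for.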
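As a rough illustration of that check, here is a minimal Python sketch using the six rows shown in the question. Several things in it are assumptions rather than anything stated in this thread: the normalization is a simple total-count scaling (not what any particular package does), the helper names `size_factors` and `mean_variance` are made up for the example, and the dispersion is a crude method-of-moments estimate from the NB relation variance = mean + alpha * mean^2. Because the table in the question is truncated, the library sizes computed here are not the real ones.

```python
from statistics import mean, variance

# Rows copied from the question. The table there is truncated, so library
# sizes computed from these six genes are for illustration only.
counts = {
    "ENSG00000000003": [0, 0, 5, 7, 0, 0, 0, 0, 0, 0, 12, 0],
    "ENSG00000000005": [0, 0, 3, 2, 0, 0, 0, 0, 0, 2, 4, 0],
    "ENSG00000000419": [2, 2, 3, 5, 18, 20, 0, 2, 2, 3, 13, 32],
    "ENSG00000000457": [15, 6, 11, 7, 129, 21, 8, 90, 41, 97, 129, 104],
    "ENSG00000000460": [6, 2, 9, 5, 62, 12, 3, 30, 21, 61, 78, 62],
    "ENSG00000000938": [0, 0, 5, 0, 16, 3, 0, 16, 7, 25, 32, 5],
}


def size_factors(table):
    """Per-sample scaling factors: library total divided by the mean total.

    Assumption: simple total-count scaling, chosen only for brevity.
    """
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    avg_total = mean(totals)
    return [t / avg_total for t in totals]


def mean_variance(table):
    """Return {gene: (mean, variance, alpha)} computed on normalized counts."""
    factors = size_factors(table)
    stats = {}
    for gene, row in table.items():
        norm = [c / f for c, f in zip(row, factors)]
        mu = mean(norm)
        var = variance(norm)  # sample variance (n - 1 denominator)
        # Method-of-moments dispersion from var = mu + alpha * mu^2.
        # alpha near 0 is Poisson-like; alpha clearly above 0 is overdispersed.
        alpha = (var - mu) / mu ** 2 if mu > 0 else float("nan")
        stats[gene] = (mu, var, alpha)
    return stats


for gene, (mu, var, alpha) in mean_variance(counts).items():
    print(f"{gene}  mean={mu:8.2f}  var={var:10.2f}  alpha={alpha:6.2f}")
```

On a full count matrix you would plot variance against mean on log-log axes and compare the cloud of points with the variance = mean line, rather than reading individual rows.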


To see whether gene counts from technical replicates are well approximated by a Poisson distribution, I've tried looking at the SEQC technical replicates, using the Bioconductor seqc package. Poisson was a good fit for most genes. It's a bit tricky because library size differs across samples, so I used the expected count for each gene × sample combination as the Poisson rate and looked at the distribution of cdf(count). This was nearly uniform, which is what you expect when the model fits.
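The cdf(count) idea can be sketched in plain Python as below. This is not the code used for the SEQC analysis. It assumes the expected count for a gene in a sample is (gene total × sample total) / grand total, and it swaps in a randomized version of the transform (drawing uniformly between cdf(count - 1) and cdf(count)), since for discrete data the plain cdf values cannot be exactly uniform. The function names are hypothetical, and the rows are the biological replicates from the question, used only to show the mechanics.

```python
import math
import random

# Rows copied from the question (truncated table, illustration only).
counts = [
    [0, 0, 5, 7, 0, 0, 0, 0, 0, 0, 12, 0],
    [0, 0, 3, 2, 0, 0, 0, 0, 0, 2, 4, 0],
    [2, 2, 3, 5, 18, 20, 0, 2, 2, 3, 13, 32],
    [15, 6, 11, 7, 129, 21, 8, 90, 41, 97, 129, 104],
    [6, 2, 9, 5, 62, 12, 3, 30, 21, 61, 78, 62],
    [0, 0, 5, 0, 16, 3, 0, 16, 7, 25, 32, 5],
]


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)  # guard against floating-point overshoot


def expected_counts(table):
    """Expected count per gene x sample under independence of rows and columns."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def randomized_pit(table, seed=0):
    """Randomized cdf values; roughly uniform on [0, 1] if Poisson fits."""
    rng = random.Random(seed)
    values = []
    for row, lam_row in zip(table, expected_counts(table)):
        for k, lam in zip(row, lam_row):
            lower = poisson_cdf(k - 1, lam)
            upper = poisson_cdf(k, lam)
            values.append(lower + rng.random() * (upper - lower))
    return values


pit = randomized_pit(counts)
bins = [0] * 10
for u in pit:
    bins[min(int(u * 10), 9)] += 1
print("counts per decile:", bins)
```

A roughly flat decile histogram is consistent with Poisson. Values piling up near 0 and 1 mean the observed counts are more spread out than Poisson allows, which is the overdispersed pattern expected from biological replicates.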
