Question: DESeq2 modelling and Wald's test

Hi all,

Can you please explain the relationship between the Wald test and the negative binomial generalized linear model? As I understand it, the count data are modelled with a negative binomial generalized linear model, and a Wald test is then applied to determine whether a particular gene is significant. Please correct me if I'm wrong.

Asked 22 months ago by Uday Rangaswamy (120) • updated 22 months ago by Kevin Blighe (43k)

RNA-seq raw counts are overdispersed relative to a Poisson distribution (their variance exceeds their mean), so the DESeq2 authors model them with a negative binomial distribution, which behaves like a Poisson with an extra dispersion parameter. By 'modelling the data', we simply mean building a regression model of the counts so that we can make statistical inferences from them.
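A minimal simulation sketch of what this overdispersion looks like. The mean of 100 and the negative binomial size (theta) of 5 are arbitrary values chosen purely for illustration, not values from DESeq2 or from real data. Poisson counts have a variance close to their mean, whereas negative binomial counts have variance mu + mu^2/theta:

```r
# Minimal sketch on simulated counts (not real RNA-seq data)
set.seed(1)
mu    <- 100   # assumed mean count for one gene (illustrative)
theta <- 5     # assumed NB size, i.e. 1 / dispersion (illustrative)

pois_counts <- rpois(10000, lambda = mu)
nb_counts   <- rnbinom(10000, mu = mu, size = theta)

# Poisson: variance is roughly equal to the mean
c(pois_mean = mean(pois_counts), pois_var = var(pois_counts))

# Negative binomial: variance is roughly mu + mu^2 / theta, far above the mean
c(nb_mean = mean(nb_counts), nb_var = var(nb_counts),
  nb_var_theory = mu + mu^2 / theta)
```

A smaller theta (larger dispersion 1/theta) widens the gap; as theta grows large, the negative binomial approaches the Poisson.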

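So, after a size factor has been estimated for each sample (DESeq2 fits the raw counts and uses these size factors for normalisation inside the model, rather than fitting pre-normalised counts), the following occurs:

For each gene, a generalized linear model with the negative binomial as its family (and a log link) is fit: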

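library(MASS)  # provides glm.nb()
# MyData: one column of raw counts per gene, plus sample covariates such as CaseControl
gene1.model <- glm.nb(gene1 ~ CaseControl, data = MyData)  # add further covariates with '+' as needed
gene2.model <- glm.nb(gene2 ~ CaseControl, data = MyData)
# ...and so on, one model per gene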
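Because MyData and the gene columns above are placeholders, here is a self-contained sketch of one per-gene fit on simulated counts. The sample size, the true 2-fold change, the dispersion and the hypothetical size_factor column are all assumptions; passing log(size_factor) as an offset only mimics how library size can be accounted for in the model, and is not DESeq2's own fitting routine:

```r
library(MASS)  # glm.nb(); MASS ships with standard R installations

set.seed(42)
n_per_group <- 6
CaseControl <- factor(rep(c("Control", "Case"), each = n_per_group),
                      levels = c("Control", "Case"))
# Hypothetical per-sample size factors (library-size normalisation)
size_factor <- runif(2 * n_per_group, min = 0.7, max = 1.3)
true_mu     <- ifelse(CaseControl == "Case", 200, 100)   # assumed 2-fold change
counts      <- rnbinom(2 * n_per_group, mu = true_mu * size_factor, size = 10)

sim <- data.frame(counts, CaseControl, size_factor)

# NB GLM with log link; log(size factor) enters as an offset so that the
# coefficients describe normalised expression rather than sequencing depth
fit <- glm.nb(counts ~ CaseControl + offset(log(size_factor)), data = sim)

coef(fit)               # intercept and CaseControlCase, natural-log scale
coef(fit)[2] / log(2)   # the same coefficient expressed as a log2 fold change
fit$theta               # estimated NB size; 1 / theta is the dispersion
```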

Once each gene has been modelled, a simple way to derive a P value for each model coefficient (e.g. CaseControl) is to apply a Wald test to the coefficient of interest:

library(aod)  # provides wald.test()
# Test the 2nd coefficient; term 2 is CaseControl when it is the first covariate after the intercept
wald.test(b = coef(gene1.model), Sigma = vcov(gene1.model), Terms = 2)

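The Wald test is a standard way to obtain a P value from a regression fit: it divides a coefficient's estimate by its standard error and compares the result to a standard normal distribution (equivalently, its square to a chi-square distribution).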
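A sketch of the Wald test computed by hand from a glm.nb fit on simulated data (the group labels, means and size are illustrative assumptions). The statistic is the estimate divided by its standard error, referred to a standard normal, and summary() reports the same numbers. For a single coefficient, the chi-square statistic from aod's wald.test() should be the square of this z:

```r
library(MASS)

set.seed(7)
group  <- factor(rep(c("Control", "Case"), each = 8), levels = c("Control", "Case"))
counts <- rnbinom(16, mu = ifelse(group == "Case", 150, 100), size = 8)
fit    <- glm.nb(counts ~ group)

tab  <- summary(fit)$coefficients
beta <- tab["groupCase", "Estimate"]     # log fold change (natural log)
se   <- tab["groupCase", "Std. Error"]   # its standard error from the fit
z    <- beta / se                        # Wald statistic
p    <- 2 * pnorm(-abs(z))               # two-sided p-value from N(0, 1)

c(z = z, p = p)
tab["groupCase", c("z value", "Pr(>|z|)")]   # should agree with the line above
```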

Kevin

NB: this is, of course, not the exact code used by DESeq2; it is just a broad overview using some simple R functions. For one, DESeq2 estimates dispersions by sharing information across genes (shrinking each gene's estimate towards a fitted trend) rather than fitting each gene in isolation, and the Wald test is not the only way DESeq2 derives p-values (it also offers a likelihood ratio test).
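DESeq2 offers a likelihood ratio test as an alternative to the Wald test (usage along the lines of `DESeq(dds, test = "LRT", reduced = ~ 1)`, where `dds` is a DESeqDataSet and `reduced` is the design without the term of interest). The sketch below illustrates the same idea with two glm.nb fits on simulated data; it is not DESeq2's implementation, and here each model estimates its own theta, which differs from DESeq2's procedure:

```r
library(MASS)

set.seed(11)
group  <- factor(rep(c("Control", "Case"), each = 8), levels = c("Control", "Case"))
counts <- rnbinom(16, mu = ifelse(group == "Case", 180, 100), size = 8)

full    <- glm.nb(counts ~ group)  # model with the condition effect
reduced <- glm.nb(counts ~ 1)      # model without it

# Likelihood ratio test: twice the log-likelihood gain, compared to a
# chi-square with df = number of dropped coefficients (1 here)
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
lrt_p   <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Wald p-value for the same coefficient, for comparison
wald_p  <- summary(full)$coefficients["groupCase", "Pr(>|z|)"]
c(LRT = lrt_p, Wald = wald_p)
```

For a single coefficient the two tests usually agree closely; the LRT becomes more useful when testing several coefficients at once (e.g. a factor with more than two levels).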

10 months ago by Kevin Blighe (43k)

Thank you so much for that.

22 months ago by Uday Rangaswamy (120)

Kevin, thanks for your explanation. May I ask one naive question? Why do we need to fit a GLM before performing the Wald test (as I understand it, the Wald test is roughly just a simple t-test)? Why not perform a Wald test directly on the count data?

8 months ago by Denis (70)

A Wald test requires a coefficient estimate and its standard error, and it tests whether the coefficient differs from 0. Yes, in that sense it's somewhat like a one-sample t-test, but you still need to fit a model first in order to obtain the coefficient (and its standard error).
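To illustrate why a fit comes first: the quantity the Wald test examines is a model coefficient (here a log fold change), not the raw counts themselves. In this sketch on simulated data (the group labels, means and size are assumptions), a negative binomial GLM with a single two-level factor reproduces the log ratio of the two group means, and the standard error the test needs comes from the same fit:

```r
library(MASS)

set.seed(3)
group  <- factor(rep(c("A", "B"), each = 10), levels = c("A", "B"))
counts <- rnbinom(20, mu = ifelse(group == "B", 60, 30), size = 5)

fit <- glm.nb(counts ~ group)

# The tested coefficient is the fitted log fold change; with one two-level
# factor (and no offset) it equals log(mean of B / mean of A)
fitted_lfc <- unname(coef(fit)["groupB"])
direct_lfc <- log(mean(counts[group == "B"]) / mean(counts[group == "A"]))
c(fitted = fitted_lfc, direct = direct_lfc)

# The standard error used by the Wald test also comes from the fit
# (it depends on the estimated dispersion 1 / theta)
summary(fit)$coefficients["groupB", "Std. Error"]
```

Once further covariates or size-factor offsets enter the model, the coefficient no longer reduces to a simple ratio of group means, which is one reason a regression fit is used rather than testing the counts directly.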

8 months ago by Devon Ryan (90k)
