How does DESeq2 encode a factor with more than 2 levels?
0
11 months ago
-_- • 830
Canada

When a factor has two levels, it can be encoded as 0 and 1. What about a factor with 3 levels? Is it one-hot encoding or something similar when DESeq2 fits a generalized linear model over the factors? I haven't found this information in the paper or the user guide yet.

If you could point me to the relevant source code, that would be even better. Thanks.

1
11 months ago
-_- • 830
Canada

DESeq2 uses model.matrix, so you can pass your design formula and colData to this base R function to see exactly how the factors will be encoded.

Quoted from https://support.bioconductor.org/p/77620/#97059

> model.matrix(~participant+sampleType, coldata)
             (Intercept) participantX8326 participantX8329 sampleTypetumor
X8324_normal           1                0                0               0
X8324_tumour           1                0                0               1
X8326_normal           1                1                0               0
X8326_tumour           1                1                0               1
X8329_normal           1                0                1               0
X8329_tumour           1                0                1               1

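So it's not full one-hot encoding, but something close to it. By default R uses treatment (dummy) coding: a factor with k levels gets k - 1 indicator columns, and the first level acts as the reference, absorbed into the intercept. Here participant X8324 is the reference, so it is represented by [0, 0] in the two participant columns.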
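To try this yourself, here is a minimal sketch that rebuilds a sample table like the one in the quoted post and compares the default coding with a full one-hot coding. The coldata below is a hypothetical reconstruction (I'm assuming the levels are "normal"/"tumor" and the row names are as printed above), not the original poster's data.

```r
# Hypothetical sample table mirroring the quoted example:
# 3 participants x 2 sample types. Values are assumed, not the original data.
coldata <- data.frame(
  participant = factor(rep(c("X8324", "X8326", "X8329"), each = 2)),
  sampleType  = factor(rep(c("normal", "tumor"), times = 3)),
  row.names   = c("X8324_normal", "X8324_tumour", "X8326_normal",
                  "X8326_tumour", "X8329_normal", "X8329_tumour")
)

# Default treatment contrasts: a k-level factor becomes k - 1 indicator
# columns, and the first level is absorbed into the intercept.
mm <- model.matrix(~ participant + sampleType, coldata)
print(colnames(mm))

# For comparison, dropping the intercept gives one column per level
# of the factor, i.e. a full one-hot coding of participant.
onehot <- model.matrix(~ 0 + participant, coldata)
print(colnames(onehot))
```

The first call should give the same four columns as the quoted output, and the second should give three participant columns, one per level. Since the reference level is just the first factor level, reordering the levels (for example with relevel) changes which participant ends up as [0, 0].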
