How to structure my edgeR model to incorporate a temporal variable and 2 categorical variables?
1
0
Entering edit mode
3.0 years ago
O.rka ▴ 710

My experiment is the following:

  • Temperature 26C vs. Temperature 30C
  • Treatment Saline vs. Treatment BMC
  • Timepoints 0, 2, and 3.

My intention is to create GLM model to look at differential abundance between groups:

  1. 26C vs. 30C for Saline treatment
  2. 26C vs. 30C for BMC treatment
  3. Saline vs. BMC at 26C
  4. Saline vs. BMC at 30C

In doing so, I would like to include all the samples possible when calculating the dispersion which is why I'm using a GLM instead of a Fisher's Exact Test for subsets of samples. I would also like to incorporate the ordered time information.

# Functions
read_dataframe = function(path, sep="\t") {
        df = read.table(path, sep=sep, row.names=1, header = TRUE, check.names=FALSE)
        return(df)
}

# Counts
X = read_dataframe("https://pastebin.com/raw/J7kmL8Ly")
# OG0000000 OG0000001   OG0000002   OG0000003   OG0000004
# T2_10_SALINE_TEMP-PE-D710-D505-1_S10  16909   55  5382    5894    1964
# T2_11_BMC_CONTROL-PE-D711-D505-1_S11  24296   60  2772    3962    1374
# T2_12_BMC_CONTROL-PE-D712-D505-1_S12  24619   60  7351    5389    560
# T2_13_BMC_CONTROL-PE-D701-D506-1_S13  22420   15  2172    2778    930
# T2_14_BMC_CONTROL-PE-D702-D506-1_S14  20049   82  4655    6211    553

# Metadata
df_metadata = read_dataframe("https://pastebin.com/raw/PANaC3r5")
#   temperature treatment   collection_time_numeric
# 1_T0_RNA-PE-D711D501-1_S143   26C NaN 0
# 2_T0_RNA-PE-D709D506-1_S134   26C NaN 0
# 4_T0_RNA-PE-D709D505-1_S133   26C NaN 0
# 5_T0_RNA-PE-D709D504-1_S132   26C NaN 

If the T0 timepoint is throwing everything off then it can be removed.

I'm trying to figure out how to do this from the following sources: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

However, my situation isn't described so please no RTFM responses. If I make a design matrix, it will basically be a binary vector for Treatment_BMC, a binary vector for Treatment_30C, and a numeric vector for the collection time.

If I use this as the design matrix when calculating the dispersion, then how would I for example do #1 above where I calculate the 26C vs. 30C for just the Saline treatment? Does this not make sense to calculate the dispersion for everything?

I could use some guidance a bit here.

edgeR differential-gene-expression glm r • 1.4k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I tried deleting the other one but couldn't find a delete. Is it possible to delete my other post that you referenced. Apologies.

ADD REPLY
4
Entering edit mode
3.0 years ago
Gordon Smyth ★ 7.0k

Your experiment is actually pretty standard. There are 12 treatment groups (2 temperatures x 2 treatments x 3 times) and you can setup the design matrix as explained in Section 3.3 of the edgeR User's Guide. The design matrix will have 1 column for each group.

However NaN values are not allowed in the metadata. You either have to fill in all the treatment values or else remove those samples.

ADD COMMENT
0
Entering edit mode

When I do this, I end up with the error described here (https://stats.stackexchange.com/questions/521917/understanding-why-design-matrix-not-of-full-rank-the-following-coefficients-no?noredirect=1#comment961380_521917). Dropping the T0 makes sense. However, my design matrix isn't full rank when I have [Treatment_Saline, Treatment_BMC, Temperature_26C, Temperature_30C] because of the Temperature_30C column. If I remove this column, then I am able to run the models. However, if I remove the column I don't know how I can look at the effects of say 30C vs. 26C in the Saline treatment. I'm using edgeR.glmLRT(fit, contrast=limma.makeContrasts(f"{treatment_class} - {reference_class}", levels=pandas_to_rpy2(dm) ) ) to do the contrasts.

ADD REPLY
1
Entering edit mode

No, you haven't followed my advice. If you had, there wouldn't be a Temperature_30C column nor a problem with linear dependence.

You don't have to drop T0 (I wouldn't) but you obviously can't have NaNs.

edgeR is an R package that is part of Bioconductor. You are using a port of edgeR to Python and I cannot help you with Python code. In R, forming the design matrix is no problem and follows a standard process:

> library(edgeR)
Loading required package: limma
> targets <- read.delim("https://pastebin.com/raw/PANaC3r5",row.names=1)
> targets$treatment[1:4] <- "None"
> Group <- paste(targets$treatment,targets$temperature,targets$collection_time,sep=".")
> Group <- factor(Group)
> design <- model.matrix(~0+Group)
> colnames(design) <- levels(Group)
> table(Group)
Group
   BMC.26C.2    BMC.26C.3    BMC.30C.2    BMC.30C.3   None.26C.0 
           5            5            5            5            4 
Saline.26C.2 Saline.26C.3 Saline.30C.2 Saline.30C.3 
           5            6            5            5 
> is.fullrank(design)
[1] TRUE

If you created the design matrix in this standard way then you would have no problems forming the contrasts you wish to make.

> design
   BMC.26C.2 BMC.26C.3 BMC.30C.2 BMC.30C.3 None.26C.0 Saline.26C.2 Saline.26C.3 Saline.30C.2 Saline.30C.3
1          0         0         0         0          1            0            0            0            0
2          0         0         0         0          1            0            0            0            0
3          0         0         0         0          1            0            0            0            0
4          0         0         0         0          1            0            0            0            0
5          0         0         0         1          0            0            0            0            0
6          0         0         0         1          0            0            0            0            0
7          0         0         0         0          0            0            0            0            1
8          0         0         0         1          0            0            0            0            0
9          0         0         0         0          0            0            0            0            1
10         0         0         0         1          0            0            0            0            0
11         0         0         0         0          0            0            0            0            1
12         0         0         0         0          0            0            0            0            1
13         0         0         0         0          0            0            1            0            0
14         0         0         0         0          0            0            1            0            0
15         0         1         0         0          0            0            0            0            0
16         0         0         0         0          0            0            1            0            0
17         0         1         0         0          0            0            0            0            0
18         0         1         0         0          0            0            0            0            0
19         0         0         0         0          0            0            1            0            0
20         0         1         0         0          0            0            0            0            0
21         0         0         0         0          0            0            1            0            0
22         0         0         0         0          0            1            0            0            0
23         0         0         0         0          0            1            0            0            0
24         0         0         0         0          0            1            0            0            0
25         0         0         0         0          0            1            0            0            0
26         0         0         0         0          0            1            0            0            0
27         0         0         0         0          0            0            0            1            0
28         0         0         0         0          0            0            0            1            0
29         0         0         0         0          0            0            0            1            0
30         0         0         0         0          0            0            0            1            0
31         0         0         0         0          0            0            0            1            0
32         1         0         0         0          0            0            0            0            0
33         1         0         0         0          0            0            0            0            0
34         1         0         0         0          0            0            0            0            0
35         1         0         0         0          0            0            0            0            0
36         1         0         0         0          0            0            0            0            0
37         0         0         1         0          0            0            0            0            0
38         0         0         1         0          0            0            0            0            0
39         0         0         1         0          0            0            0            0            0
40         0         0         1         0          0            0            0            0            0
41         0         0         1         0          0            0            0            0            0
42         0         0         0         0          0            0            1            0            0
43         0         0         0         0          0            0            0            0            1
44         0         1         0         0          0            0            0            0            0
45         0         0         0         1          0            0            0            0            0
ADD REPLY
0
Entering edit mode

This makes sense thank you! The design matrix gets expanded out to include the time point info. I'll just make a wrapper to use the R model.matrix function and if that doesn't work I'll just use R entirely.

ADD REPLY

Login before adding your answer.

Traffic: 3170 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6