Question

DESeq2 pairwise comparison

6.9 years ago

Bioinfonext ▴ 460

There are two lines, 216 and 218.

Three developmental stages: 5 weeks (5W), 7W and 9W.

Three tissues: Ca, Co and Pa.

Each combination has 2 biological replicates.

I want to do differential gene expression analysis using DESeq2, and my aim is to do pairwise comparisons. After reading about DESeq2 I tried the code below, but I am not sure how to make colData and the design formula.

library("DESeq2")

countMatrix = read.table("read_count.22May.2017.new.txt",header=T,sep='\t',check.names=F)

head(countMatrix)

dim(countMatrix)

[1] 57894    35

Now I am not sure how to construct a DESeqDataSet:

dds <- DESeqDataSetFromMatrix(countData = countMatrix,
                              colData = colData,
                              design = ~ condition)

RNA-Seq • 4.5k views

ADD COMMENT • link updated 6.9 years ago by dr_bantz ▴ 110 • written 6.9 years ago by Bioinfonext ▴ 460

score 3 · Answer 1 · 2017-06-08

6.9 years ago

dr_bantz ▴ 110

The 'colData' argument specifies the sample information. In the simplest case this is a one-column data frame containing the condition for each sample, with the sample names as the row names.

colData <- data.frame(condition = conditions)

row.names(colData) <- names

where "conditions" is a vector containing the condition for each sample and "names" is a vector of the sample names (in the same order as each other and as the columns of the count matrix, of course!).
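As a minimal sketch of the above, assuming just four samples and three genes (the sample names and counts are the ones shown further down in this thread; the variable name sample_names is only illustrative):

```r
# Sample names and their conditions, in the same order.
sample_names <- c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1", "216_5W_Co2")
conditions   <- c("Ca", "Ca", "Co", "Co")

# One row per sample, sample names as row names.
colData <- data.frame(condition = factor(conditions))
row.names(colData) <- sample_names

# Small count matrix: genes in rows, samples in columns.
countMatrix <- matrix(c(100, 71, 0,  0,
                          0,  0, 0, 50,
                          0,  0, 0,  0),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("Rs025080", "Rs035250", "Rs035280"),
                                      sample_names))

# The rows of colData must line up with the columns of the count matrix.
stopifnot(identical(rownames(colData), colnames(countMatrix)))
```

The final check is cheap and catches most sample-order mix-ups before the data ever reaches DESeqDataSetFromMatrix().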

ADD COMMENT • link 6.9 years ago by dr_bantz ▴ 110

I tried to make ColData like this:

ColData <- data.frame (genotypes = c(‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’,), development_stage = c(‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’,) Tissue_type = c(‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’,))

I did it this way because I have 2 genotypes, 3 developmental stages and 3 tissues, but I am getting an error:

Error: unexpected input in "ColData <- data.frame (genotypes = c(▒"’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’,), development_stage = c(‘5W’, ‘5W’,> 5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’,) Tissue_type = c(‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’,))

ADD REPLY • link 6.9 years ago by Bioinfonext ▴ 460

I tried typing all the conditions directly on the Linux machine, but I am getting an error again:

> colData <- data.frame(genotypes = c('216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218'), development_stage = c('5W','5W','5W','5W','5W','5W','7W','7W','7W','7W','9W','9W','9W','9W','9W','9W','5W','5W','5W','5W','5W','7W','7W','7W','7W','7W''7W','9W','9W',9W','9W','9W','9W','9W','9W'),Tissue_type = c('Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa'))

Error

Error: unexpected string constant in "8','218','218','218','218',......

ADD REPLY • link 6.9 years ago by Bioinfonext ▴ 460

You've missed a comma and an apostrophe in development_stage (see '7W''7W' and ,9W'), and the variables in the data frame have different lengths (i.e. one of them has the wrong number of samples).
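One way to avoid this kind of typing error is to derive the sample table from the sample names instead of entering every value by hand. A minimal sketch, assuming the names follow the genotype_stage_tissue+replicate pattern (e.g. 216_5W_Ca1) and using a hypothetical subset of six samples; the column names are illustrative:

```r
# Hypothetical subset of sample names (genotype_stage_tissue+replicate).
sample_names <- c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1",
                  "216_5W_Co2", "218_7W_Pa1", "218_7W_Pa2")

# Split each name on "_" and bind the pieces into a 3-column character matrix.
parts <- do.call(rbind, strsplit(sample_names, "_", fixed = TRUE))

colData <- data.frame(
  genotype  = factor(parts[, 1]),
  stage     = factor(parts[, 2]),
  tissue    = factor(sub("[0-9]+$", "", parts[, 3])),     # drop the replicate number
  replicate = factor(sub("^[A-Za-z]+", "", parts[, 3])),  # keep only the replicate number
  row.names = sample_names
)

print(colData)
```

With real data, sample_names would be taken from the sample columns of the count matrix, so every column of colData has the right length and order by construction.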

ADD REPLY • link 6.9 years ago by dr_bantz ▴ 110

Thanks a lot for helping me. I read sampleinfo (colData) as a csv file like this:

SampleInfo<- read.csv("sampleInfo.csv", check.names=F)

I need to ask you one thing about the biological replicate information. 216_5W_Ca1 and 216_5W_Ca2 are biological replicates. How should I add this information to the sample info?

head(SampleInfo)

           Genotypes Development_stage Tissue
216_5W_Ca1       216                5W     Ca
216_5W_Ca2       216                5W     Ca
216_5W_Co1       216                5W     Co
216_5W_Co2       216                5W     Co
216_5W_Pa1       216                5W     Pa
216_5W_Pa2       216                5W     Pa

and my countMatrix looks like this:

head(countMatrix)

           216_5W_Ca1 216_5W_Ca2 216_5W_Co1 216_5W_Co2
1 Rs025080        100         71          0          0
2 Rs035250          0          0          0         50
3 Rs035280          0          0          0          0

I also need to understand how to construct the design in DESeqDataSetFromMatrix for a pairwise comparison (216_5W_Ca_VS_216_5W_Co), or a multi-factor design to extract all differentially expressed genes across all developmental stages and tissues with above 2-fold change and p-value < 0.001:

ds <- DESeqDataSetFromMatrix(countData = countMatrix,
                             colData = colData,
                             design = ~ condition)

ADD REPLY • link 6.9 years ago by Bioinfonext ▴ 460

score 0 · Answer 2 · 2017-06-08

6.9 years ago

igor 13k

Did you check the DESeq2 vignette? There is a section on paired samples:

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. ~ subject + condition.

Source: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-use-deseq2-to-analyze-paired-samples
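For the experiment in this thread, a multi-factor design would use genotype, stage and tissue as terms. The sketch below uses only base R's model.matrix() to show which coefficients such a formula produces. It assumes a hypothetical fully balanced layout (2 genotypes x 3 stages x 3 tissues x 2 replicates), which may not match the real sample sheet, and the column names are illustrative:

```r
# Hypothetical balanced layout: 2 genotypes x 3 stages x 3 tissues x 2 replicates.
colData <- expand.grid(replicate = 1:2,
                       tissue    = c("Ca", "Co", "Pa"),
                       stage     = c("5W", "7W", "9W"),
                       genotype  = c("216", "218"),
                       stringsAsFactors = TRUE)

# Expand a multi-factor formula into its model matrix.
# The first level of each factor (216, 5W, Ca) acts as the reference.
design_matrix <- model.matrix(~ genotype + stage + tissue, data = colData)

colnames(design_matrix)
# "(Intercept)" "genotype218" "stage7W" "stage9W" "tissueCo" "tissuePa"
```

Each non-intercept column is the effect of one factor level relative to its reference level, so a formula like this estimates the tissue effect while accounting for genotype and stage.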

ADD COMMENT • link 6.9 years ago by igor 13k

Thanks, I read the DESeq2 vignette, but I am not able to understand it. What do you mean by pair end samples? Is it about lines, like my 216 and 218?

I am not able to understand multi-factor designs. How do I make colData and the design formula in the above command?

ADD REPLY • link 6.9 years ago by Bioinfonext ▴ 460

"Paired end" refers to the technology used for the sequencing itself (I imagine you used single end; either way it's not relevant to your question). The vignette section igor linked is about paired samples, which is a different thing.

The link igor posted gives some guidelines on how to deal with samples that span multiple variables (conditions/cell lines). You say you want to do pairwise comparisons between all the different variable combinations. For this, you could just run the different pairwise comparisons separately with DESeq2 and then use a multiple testing correction (e.g. Bonferroni) to adjust the p-values accordingly. However, the results may be hard to interpret, and something like PCA or correlation heatmaps might be more useful.

Edit: Using the contrast argument of DESeq2's results() function would be a good idea for the pairwise comparisons.
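A rough sketch of how that could look, under several assumptions: the counts are simulated with makeExampleDESeqDataSet() as a stand-in for the real count matrix, only six samples from one genotype and stage are used, and "group" is a hypothetical column that merges genotype, stage and tissue into one factor so that any two groups can be compared directly:

```r
library(DESeq2)

set.seed(1)
sample_names <- c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1",
                  "216_5W_Co2", "216_5W_Pa1", "216_5W_Pa2")

# Simulated counts standing in for the real data (1000 genes x 6 samples).
counts_sim <- counts(makeExampleDESeqDataSet(n = 1000, m = 6))
colnames(counts_sim) <- sample_names

# One combined factor per sample: the sample name minus the replicate number.
colData <- data.frame(group = factor(sub("[0-9]+$", "", sample_names)),
                      row.names = sample_names)

dds <- DESeqDataSetFromMatrix(countData = counts_sim,
                              colData   = colData,
                              design    = ~ group)
dds <- DESeq(dds)

# All pairwise comparisons between groups; the first level named in each
# contrast is the numerator of the log2 fold change.
pairs <- combn(levels(dds$group), 2, simplify = FALSE)
res_list <- lapply(pairs, function(p) {
  results(dds, contrast = c("group", p[1], p[2]),
          lfcThreshold = 1,   # test for more than 2-fold change
          alpha = 0.001)      # adjusted p-value cutoff used for filtering
})
names(res_list) <- vapply(pairs, paste, character(1), collapse = "_VS_")

# Genes passing the cutoff in one comparison (padj can be NA, hence which()).
res <- res_list[["216_5W_Ca_VS_216_5W_Co"]]
sig <- res[which(res$padj < 0.001), ]
```

Because the counts here are simulated without any real group differences, sig will normally be empty; the point is only the shape of the calls. The padj column is already adjusted for multiple testing within each comparison.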

ADD REPLY • link 6.9 years ago by dr_bantz ▴ 110