DESeq2 pairwise comparison
6.9 years ago
Bioinfonext ▴ 460

There are two lines, 216 and 218.

Three development stages: 5 weeks (5W), 7W, 9W.

Three tissues: Ca, Co, Pa.

Each has 2 biological replicates.

I want to do differential gene expression analysis using DESeq2, so I tried the code below after reading about DESeq2. My aim is to do pairwise comparisons. How do I make colData and the design formula?

library("DESeq2")

countMatrix = read.table("read_count.22May.2017.new.txt",header=T,sep='\t',check.names=F)

head(countMatrix)

dim(countMatrix)

[1] 57894    35

Now I am not sure how to construct a DESeqDataSet:

dds <- DESeqDataSetFromMatrix(countData = countMatrix,
                              colData = colData,
                              design = ~ condition)
6.9 years ago
dr_bantz ▴ 110

The 'colData' argument specifies the sample information. This should be a data frame with one row per sample, containing the condition for each sample (a single column in the simplest case), with the names of the samples as the row names.

colData <- data.frame(condition = conditions)

row.names(colData) <- names

where "conditions" is a vector containing the condition for each sample and "names" is a vector containing the name of each sample (in the same order, of course!).
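As a minimal sketch of the two lines above, here is a toy example in base R. The count values, the sample names (sampleA to sampleD) and the condition labels are all made up for illustration; sample_names plays the role of "names" in the reply. The final check is there because DESeq2 expects the rows of colData to be in the same order as the columns of the count matrix.

```r
# Toy count matrix standing in for the real one (all values are made up)
countMatrix <- data.frame(
  sampleA = c(100L, 0L, 12L),
  sampleB = c(71L, 3L, 9L),
  sampleC = c(0L, 50L, 20L),
  sampleD = c(5L, 40L, 18L),
  row.names = c("gene1", "gene2", "gene3")
)

# Take the sample names from the count matrix so the order cannot drift
sample_names <- colnames(countMatrix)

# One condition label per sample, in the same order (hypothetical labels)
conditions <- c("control", "control", "treated", "treated")

colData <- data.frame(condition = factor(conditions))
row.names(colData) <- sample_names

# Stop early if colData rows and count matrix columns do not line up
stopifnot(identical(rownames(colData), colnames(countMatrix)))
```

If the real count table keeps gene identifiers in an ordinary column, that column would have to be moved to the row names first so that only sample columns remain.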


I tried to make ColData like this:

ColData <- data.frame (genotypes = c(‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘216’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’,), development_stage = c(‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’,) Tissue_type = c(‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’,))

I did it this way because I have 2 genotypes, 3 development stages and 3 tissues, but I get an error:

Error: unexpected input in "ColData <- data.frame (genotypes = c(▒"’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’, ‘218’,), development_stage = c(‘5W’, ‘5W’,> 5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘5W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘7W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’, ‘9W’,) Tissue_type = c(‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’, ‘Ca’, ‘Ca’, ‘Co’, ‘Co’, ‘Pa’, ‘Pa’,))

I tried typing all the conditions directly on the Linux machine, but I get an error again:

> colData <- data.frame(genotypes = c('216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','216','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218','218'), development_stage = c('5W','5W','5W','5W','5W','5W','7W','7W','7W','7W','9W','9W','9W','9W','9W','9W','5W','5W','5W','5W','5W','7W','7W','7W','7W','7W''7W','9W','9W',9W','9W','9W','9W','9W','9W'),Tissue_type = c('Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa','Ca','Ca','Co','Co','Pa','Pa'))

Error

Error: unexpected string constant in "8','218','218','218','218',......

You've missed an apostrophe and a comma in there, and the variables in the data frame have different lengths (i.e., one of them has the wrong number of samples).
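One way to avoid this kind of typing error is to generate the vectors instead of typing every label by hand. The sketch below assumes a fully balanced layout (2 genotypes x 3 stages x 3 tissues x 2 replicates = 36 samples) and sample names in the style genotype_stage_tissue+replicate; the real experiment has fewer samples, so rows for missing samples would need to be dropped. Note also that R only accepts straight quotes (' or "); typographic quotes such as ‘216’ pasted from a word processor give the "unexpected input" error.

```r
# Assumed balanced layout; expand.grid varies the first column fastest
design_grid <- expand.grid(
  replicate = 1:2,
  tissue    = c("Ca", "Co", "Pa"),
  stage     = c("5W", "7W", "9W"),
  genotype  = c("216", "218"),
  stringsAsFactors = FALSE
)

# Build colData in one step; every column has the same length by construction
colData <- data.frame(
  genotypes         = factor(design_grid$genotype),
  development_stage = factor(design_grid$stage),
  Tissue_type       = factor(design_grid$tissue),
  row.names = with(design_grid,
                   paste0(genotype, "_", stage, "_", tissue, replicate))
)

head(colData)
table(colData$genotypes, colData$development_stage)
```

The table() call gives a quick count of samples per genotype and stage, which makes a missing or duplicated sample easy to spot.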


Thanks a lot for helping me. I read the sample information (colData) from a CSV file like this:

SampleInfo<- read.csv("sampleInfo.csv", check.names=F)

I need to ask you one thing about biological replicate information. 216_5W_Ca1 and 216_5W_Ca2 are biological replicates. How should I add this information to the sample info?

head(SampleInfo)

           Genotypes Development_stage Tissue
216_5W_Ca1       216                5W     Ca
216_5W_Ca2       216                5W     Ca
216_5W_Co1       216                5W     Co
216_5W_Co2       216                5W     Co
216_5W_Pa1       216                5W     Pa
216_5W_Pa2       216                5W     Pa

And my countMatrix looks like this:

head(countMatrix)

            216_5W_Ca1 216_5W_Ca2 216_5W_Co1 216_5W_Co2
1 Rs025080         100         71          0          0
2 Rs035250           0          0          0         50
3 Rs035280           0          0          0          0

I also need to understand how to construct the design in DESeqDataSetFromMatrix, either for a pairwise comparison (216_5W_Ca vs 216_5W_Co) or for a multi-factor analysis that extracts all differentially expressed genes across all development stages and tissues with more than 2-fold change and p-value < 0.001:

ds <- DESeqDataSetFromMatrix(countData = countMatrix,
                             colData = colData,
                             design = ~ condition)
6.9 years ago
igor 13k

Did you check the DESeq2 vignette? There is a section on paired samples:

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. ~ subject + condition.

Source: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-use-deseq2-to-analyze-paired-samples
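As a rough illustration of what a multi-factor design formula looks like for this experiment, the sketch below uses the column names from the sample table in the thread (Genotypes, Development_stage, Tissue). The counts are simulated and the balanced 36-sample layout is an assumption, so the numbers mean nothing; only the structure of the calls matters.

```r
library(DESeq2)

set.seed(42)

# Assumed balanced layout: 2 genotypes x 3 stages x 3 tissues x 2 replicates
samples <- expand.grid(
  replicate         = 1:2,
  Tissue            = c("Ca", "Co", "Pa"),
  Development_stage = c("5W", "7W", "9W"),
  Genotypes         = c("216", "218"),
  stringsAsFactors  = TRUE
)
sample_names <- with(samples,
                     paste0(Genotypes, "_", Development_stage, "_", Tissue, replicate))

colData <- samples[, c("Genotypes", "Development_stage", "Tissue")]
rownames(colData) <- sample_names

# Simulated counts (made up): one mean per gene, recycled across samples
n_genes <- 300
gene_means <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * nrow(colData), mu = gene_means, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), sample_names)
)

# Multi-factor design: the factor of interest goes last
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = colData,
                              design    = ~ Genotypes + Development_stage + Tissue)
dds <- DESeq(dds)

# Coefficients that were fitted, then one tissue comparison (Co vs Ca)
resultsNames(dds)
res_tissue <- results(dds, contrast = c("Tissue", "Co", "Ca"))
```

With this design the Co vs Ca effect is estimated while accounting for genotype and stage; it is a single average tissue effect, not a separate comparison within each genotype and stage.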


Thanks, I read the DESeq2 vignette, but I am not able to understand it. What do you mean by paired-end samples? Is it about lines, like my 216 and 218?

I am not able to understand multi-factor designs. How do I make colData and the design formula in the command above?


"Paired end" has to do with the technology used for the sequencing itself (I imagine you used single end; either way it's not relevant to your question).

The link igor posted gives some guidelines on how to deal with samples that span multiple variables (conditions/cell lines). You say you want to do pairwise comparisons between all the different variable combinations. For this, you could just run a set of separate pairwise comparisons with DESeq2 and then use multiple testing correction (e.g. Bonferroni) to adjust the p-values accordingly. However, this may be hard to interpret, and something like PCA or correlation heatmaps might be more useful.

Edit: Using the contrast argument of the DESeq2 results() function would be a good idea for the pairwise comparison.
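A common way to set up such pairwise comparisons is to paste the factors into a single group column and pick pairs of groups through the contrast argument of results(). The sketch below uses the six sample names shown earlier in the thread with simulated counts (made up), and reads the question's "above 2 fold and p value <0.001" as |log2 fold change| > 1 and adjusted p-value < 0.001, which is an assumption. Replicates such as 216_5W_Ca1 and 216_5W_Ca2 need no column of their own: samples that share the same group level are treated as replicates.

```r
library(DESeq2)

set.seed(1)

sample_names <- c("216_5W_Ca1", "216_5W_Ca2", "216_5W_Co1",
                  "216_5W_Co2", "216_5W_Pa1", "216_5W_Pa2")

# Split names like "216_5W_Ca1" into genotype, stage and tissue
parts <- do.call(rbind, strsplit(sample_names, "_"))
colData <- data.frame(
  Genotypes         = parts[, 1],
  Development_stage = parts[, 2],
  Tissue            = sub("[0-9]+$", "", parts[, 3]),  # drop replicate number
  row.names         = sample_names
)

# One combined factor: each level is one genotype/stage/tissue combination
colData$group <- factor(paste(colData$Genotypes,
                              colData$Development_stage,
                              colData$Tissue, sep = "_"))

# Simulated counts (made up): one mean per gene, recycled across samples
n_genes <- 200
gene_means <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * length(sample_names), mu = gene_means, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), sample_names)
)
stopifnot(identical(rownames(colData), colnames(counts)))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = colData,
                              design    = ~ group)
dds <- DESeq(dds)

# 216_5W_Ca vs 216_5W_Co: numerator first, denominator second
res <- results(dds, contrast = c("group", "216_5W_Ca", "216_5W_Co"))

# Keep genes with more than 2-fold change and adjusted p-value < 0.001
sig <- subset(res, !is.na(padj) & padj < 0.001 & abs(log2FoldChange) > 1)
```

The same results() call can be repeated for any other pair of group levels without refitting the model, since DESeq() only needs to run once per design.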
