Question

Zero read count with deseq2

0

Entering edit mode

6.4 years ago

scamiolo ▴ 10

Dear community,

I am currently working on differential expression in transcripts of vitis vinifera. I aligned several RNAseq experiments on the transcripts sequences of the vitis reference genome using bwa and computed fpkm and effective reads counts with the software express. I ran the deseq pipeline but there is some part that is not clear to me. I am working on two experiments with 2 replicates each. My coldata file (named coldata.txt) is the following:

               condition type
CAN3FI2_exp.txt untreated paired-end
CAN3FI3_exp.txt untreated paired-end
CAN3NI2_exp.txt treated paired-end
CAN3NI3_exp.txt treated paired-end

The first rows of my expression (reads count) input table (named expressionTable.txt) are the following:

CAN3FI2_exp.txt CAN3FI3_exp.txt CAN3NI2_exp.txt CAN3NI3_exp.txt
678 1003 598 488
40 87 25 15
9 12 7 7
914 1028 1063 856
0 0 0 0
48 74 31 25
221 281 181 140
776 1225 664 539
0 0 0 0

finally the first rows of the rownames file (named rownames.txt) are the following:

chr5.gff3_MRNA_VIT_05s0020g02570.t01
chrUn.gff3_MRNA_VIT_00s0144g00290.t01
chr15.gff3_MRNA_VIT_15s0048g00090.t01
chr11.gff3_MRNA_VIT_11s0118g00830.t01
chr5.gff3_MRNA_VIT_05s0165g00290.t01
chr9.gff3_MRNA_VIT_09s0070g00570.t01
chr18.gff3_MRNA_VIT_18s0001g07490.t01
chr12.gff3_MRNA_VIT_12s0035g01940.t01
chr16.gff3_MRNA_VIT_16s0022g02170.t01
chr3.gff3_MRNA_VIT_03s0038g02130.t01

I run the following pipeline to compute the genes that are differentially expressed:

library("DESeq2")
expSet <- read.table("expressionTable.txt",header=T)
rownames <- unlist(read.table("rownames.txt"))
row.names(expSet) <- rownames
coldata <- read.table("coldata.txt",header=T)
cts <- as.matrix(expSet)
dds <- DESeqDataSetFromMatrix(countData=cts,colData=coldata, design=~ condition)
dds <- DESeq(dds)
res <- results(dds)
write.table(res,"expDiff.txt")
ntd <- normTransform(dds)
write.table(assay(ntd),"normalizedExpressions.txt")

If I consider the transcript named chr16.gff3_MRNA_VIT_16s0100g00990.t01 and I look for it in the produced expDiff.txt file I find:

"chr16.gff3_MRNA_VIT_16s0100g00990.t01" 32.3186500941916 1.09546846712526 0.29492459931321 3.71440181550224 0.000203684932436758 0.00480523551483428

As you can see the log2Fold change value is 1.095; this means that the treated sample is more highly expressed than the untreated. If I look for that gene in the rowname.txt file in order to get the line number I obtain the following:

grep -nr "VIT_16s0100g00990" rownames.txt
28170:chr16.gff3_MRNA_VIT_16s0100g00990.t01

Since the expressionTable.txt file has also the field names line, I used the following command to retrieve the reads counts from it corresponding to the transcript chr16.gff3_MRNA_VIT_16s0100g00990.t01

sed -n 28171p expressionTable.txt
19 162 0 0

My question is the following: if the treated sample have zero count in both the replicates, how can it be more highly expressed than the untreated as reported in the expDiff file? I thought it was a matter of normalization and for this reason I used the function normTransform to have the normalized data output in the file normalizedData.txt. However if I look for this transcript in this file I get the following:

grep "chr16.gff3_MRNA_VIT_16s0100g00990.t01" normalizedExpressions.txt
"chr16.gff3_MRNA_VIT_16s0100g00990.t01" 4.29263896969935 6.80318233956259 0 0

So even here the treated have zero count. Am I doing something wrong? Sorry to bother you with a so long e-mail but I preferred to be as detailed as possible in order to give you the possibility to have a precise picture of what I am doing

Thanks a lot for your support

deseq RNAseq differential expression deseq2 • 3.2k views

ADD COMMENT • link updated 6.4 years ago by WouterDeCoster 47k • written 6.4 years ago by scamiolo ▴ 10

0

Entering edit mode

Cross-posting, especially without noting this, is discouraged (duplicates effort of the volunteers answering questions).

https://support.bioconductor.org/p/102962/

ADD REPLY • link 6.4 years ago by Michael Love ★ 2.6k

score 3 · Answer 1 · 2017-11-14

3

Entering edit mode

6.4 years ago

Carlo Yague 8.6k

Could it be that instead of testing the expression of the treated vs untreated, your are testing untreated vs treated? To check for that, try to order the levels properly :

condition=as.factor(rep(c("UNTREATED", "TREATED"), each=2))
condition
[1] UNTREATED UNTREATED TREATED   TREATED  
Levels: TREATED UNTREATED

condition=relevel(condition, "UNTREATED")
condition
[1] UNTREATED UNTREATED TREATED   TREATED  
Levels: UNTREATED TREATED

ADD COMMENT • link 6.4 years ago by Carlo Yague 8.6k

0

Entering edit mode

Thanks Carlo for your answer. I ran the following commands as suggested:

condition=as.factor(rep(c("UNTREATED", "TREATED"), each=2))
condition=relevel(condition, "UNTREATED")

before running the following

dds <- DESeqDataSetFromMatrix(countData=cts,colData=coldata, design=~ condition)
dds <- DESeq(dds)
res <- results(dds)
write.table(res,"expDiff.txt")

This is what I found in the produced expDiff.txt file is still the following:

"chr16.gff3_MRNA_VIT_16s0100g00990.t01" 32.3186500941916 1.09546846712526 0.29492459931321 3.71440181550224 0.000203684932436758 0.00480523551483428

I really can not understand where the problem could be. Thanks a lot for your support!

ADD REPLY • link 6.4 years ago by scamiolo ▴ 10

1

Entering edit mode

Forget about the object "condition", my bad. can you please try again with:

coldata$condition=relevel(condition, "untreated")

(this is because condition is not the same as coldata$condition)

ADD REPLY • link 6.4 years ago by Carlo Yague 8.6k

1

Entering edit mode

Actually using the following lines:

> coldata$condition=as.factor(rep(c("UNTREATED", "TREATED"), each=2))
> coldata$condition=relevel(condition, "untreated")

provided the expected result! thanks a lot for that Carlo!

ADD REPLY • link 6.4 years ago by scamiolo ▴ 10