How can I analyze this data in Deseq?
0
0
Entering edit mode
5.1 years ago
MAPK ★ 2.1k

I have my data called count.mat.

count.mat  <-   structure(list(Sample_1 = c(2968, 1701, 272, 249, 186, 1598, 
98, 9, 563, 1003, 278, 242, 137, 625, 614, 494, 681, 475, 3, 
89), Sample_2 = c(2057, 1237, 339, 688, 289, 1354, 265, 17, 402, 
793, 343, 267, 389, 477, 838, 663, 741, 586, 9, 244), Sample_3 = c(5, 
3, 5, 6, 2, 14, 2, 9, 4, 4, 3, 3, 1, 1, 2, 1, 16, 7, 3, 4), Sample_4 = c(5, 
4, 1, 4, 1, 6, 1, 10, 9, 7, 2, 2, 2, 5, 4, 3, 3, 6, 1, 4), Sample_5 = c(8175, 
4828, 2268, 3256, 1712, 4048, 1208, 10, 3166, 4315, 2536, 1610, 
1531, 2794, 3976, 3030, 2427, 4788, 9, 1708), Sample_6 = c(4170, 
2068, 506, 220, 190, 1070, 76, 10, 790, 1635, 264, 146, 67, 558, 
771, 423, 481, 864, 2, 77), Sample_7 = c(15, 6, 4, 3, 2, 4, 2, 
15, 5, 5, 2, 7, 2, 4, 7, 8, 3, 11, 22, 4), Sample_8 = c(15, 7, 
2, 7, 4, 4, 1, 13, 5, 5, 4, 3, 3, 4, 3, 4, 2, 4, 5, 4), Sample_9 = c(132, 
356, 41, 29, 24, 40, 14, 61, 40, 66, 29, 136, 18, 99, 66, 144, 
55, 115, 8, 42), Sample_10 = c(1199, 614, 491, 757, 96, 417, 
170, 47, 486, 452, 437, 189, 563, 481, 857, 1003, 266, 595, 3, 
185), Sample_11 = c(6853, 4667, 4051, 5964, 2940, 6831, 2762, 
107, 9957, 7058, 4048, 4023, 4366, 5235, 5234, 5380, 2904, 6320, 
15, 5502), Sample_12 = c(6332, 4489, 2739, 5371, 2153, 2841, 
1982, 63, 6032, 4417, 2025, 4630, 3264, 3666, 4629, 4601, 2240, 
3408, 15, 4309), Sample_13 = c(16, 14, 10, 1, 2, 22, 1, 2, 5, 
8, 3, 2, 7, 4, 6, 4, 9, 2, 2, 7), Sample_14 = c(12, 6, 5, 4, 
2, 6, 2, 13, 9, 5, 4, 3, 4, 1, 9, 5, 2, 5, 2, 2), Sample_15 = c(6058, 
3363, 1458, 2846, 1315, 3496, 1119, 2, 2028, 4566, 1097, 1135, 
811, 1743, 2310, 2023, 1121, 2420, 10, 2224), Sample_16 = c(23362, 
10663, 2875, 6581, 2264, 4756, 1907, 2, 4405, 4305, 4519, 4197, 
2786, 3201, 9012, 4905, 3179, 3493, 1, 2829), Sample_17 = c(7548, 
2659, 1018, 1363, 376, 7433, 704, 14, 1405, 2772, 1099, 788, 
511, 796, 1721, 1239, 1137, 1519, 6, 606), Sample_18 = c(19, 
6, 9, 5, 3, 18, 2, 9, 7, 12, 2, 5, 3, 5, 15, 7, 2, 8, 16, 3), 
    Sample_19 = c(25, 11, 1, 5, 1, 13, 4, 23, 13, 12, 3, 7, 2, 
    4, 6, 4, 2, 4, 8, 2), Sample_20 = c(7, 3, 2, 4, 1, 3, 2, 
    13, 3, 2, 1, 2, 2, 1, 5, 5, 1, 2, 6, 2)), row.names = c("AAAACAAGTTCTGAATCGTATA", 
"AAAACAAGTTCTGAATCGTATAA", "AAGAAAAGATCATGTCCCAAGA", "AAGAAGAGAATCACAGCCGCCA", 
"AAGAATCCTTCAAAGGGCCTAGA", "AAGCTGAAGTTTTGACATGTCA", "AATCACAGCCGCCAAGTCCTCTA", 
"AATCAGTGAACGAAAGTTAGG", "AATCTAAAGTTTGTCCCATTA", "AATCTTTGACTACATTACGTAAA", 
"AATGTCCAATGGAACGCCGTTA", "ACCATCTCCGTGTCTACAATCA", "ACTCCAGGCGAGCATGCAGTCCA", 
"ACTCTTAGAGATGAATATTCCGA", "AGGGATCCTTATGTCCTAGCCA", "AGTACAAGTCTACGAGCCCATCGA", 
"AGTCCAATGGATCTTTCACTCGA", "AGTCCTTGCCAATGAGGCTTTTA", "AGTGAACGAAAGTTAGGGGAT", 
"AGTGCTAGTTCCCAACGTCGTTGA"), class = "data.frame")

I have my conditions as:

sample.type <-
  as.factor (
    c(
      "Ago2_SsHV2L",
      "Ago2_SsHV2L",
      "Ago2_VF",
      "Ago2_VF",
      "Ago4_SsHV2L",
      "Ago4_SsHV2L",
      "Ago4_VF",
      "Ago4_VF",
      "Dcl1_SsHV2L",
      "Dcl1_SsHV2L",
      "Dcl2_SsHV2L",
      "Dcl2_SsHV2L",
      "SlaGem_2mer",
      "SlaGem_2mer",
      "WTDK3_SsHV2L",
      "WTDK3_SsHV2L",
      "WTDK3_SsHV2L",
      "WTDK3_VF",
      "WTDK3_VF",
      "WTDK3_VF"
    )
  )

infection.status <-
  as.factor(
    c(
      "SsHV2L",
      "SsHV2L",
      "VF",
      "VF",
      "SsHV2L",
      "SsHV2L",
      "VF",
      "VF",
      "SsHV2L",
      "SsHV2L",
      "SsHV2L",
      "SsHV2L",
      "SlaGem_2mer",
      "SlaGem_2mer",
      "SsHV2L",
      "SsHV2L",
      "SsHV2L",
      "VF",
      "VF",
      "VF"
    )
  )

cond <- data.frame(sample.type,infection.status, stringsAsFactors=TRUE)


cond <- as.data.frame(cond)

Deseq Analysis:

library(dplyr)
# Doing for infection.status
dds <- DESeqDataSetFromMatrix(countData=count.mat,colData=cond,design=~infection.status)
dds
dds$infection.status <- relevel(dds$infection.status, ref = "VF")  
dds = DESeq(dds, fitType = "parametric")
my.names<-resultsNames(dds)

Since I wanted to compare the DE between SsHV2L_vs_VF, I have done this:

res<- results(dds, contrast=list(c("infection.status_SsHV2L_vs_VF" )), test="Wald")
res <- as.data.frame(res)
res <- res[with(res, order(padj)), ] %>% select (colnames(res))

However, the result does not look correct. The VF samples have less than 20 counts in all samples and SsHV2L samples have more than 1000 counts. If you look at the padj, the pvalues are significant only for the reads with the fewer counts in both VF and SsHV2L samples. I was wondering why it's not significant for the reads with 1000s of counts in SsHV2L columns and fewer in VF. My data have only 130 rows and I was wondering if that would be the reason causing this problem. Can someone please help me if I am doing it right? Thanks!

deseq rnaseq deseq2 R • 1.2k views
ADD COMMENT
0
Entering edit mode

Which type of data is this?

ADD REPLY
0
Entering edit mode

This is smallRNA seq data.

ADD REPLY
1
Entering edit mode

A quick search reveals a few posts on Bioconductor where the DESeq/DESeq2 developers seem to shy away from recommending DESeq for such data. I saw replies from Michael, Simon, and Wolfgang. Your data does follow a negative binomial, though (I checked via histogram).

These are obviously raw counts that you have? You could try some rudimentary median-based normalisation strategy and then test each gene independently via negative binomial regression using glm.nb(). I presume that you exhaustively searched for other methods.

ADD REPLY
0
Entering edit mode

@Kevin Blighe Thanks Kevin for your answer. Yes, I wanted to look for miRNA-like reads with differential expression between samples. I have seen some papers using DESeq for these reads and wanted to give it a try.

ADD REPLY
0
Entering edit mode

In which journals were they? They may have only used DESeq2 for, for example, size factor calculation (?).

ADD REPLY

Login before adding your answer.

Traffic: 1563 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6