Hi,

I need some input on normalizing RNA-Seq data with spike-ins and then using DESeq to find differentially expressed genes (DEGs) between my samples. I have 7 samples, of which 4 are from the periphery that gives rise to tumor and 4 are from the tumor center. I want to normalize the raw fragment counts (the input that DESeq expects) using the spike-ins and then compute the DEGs from them. My gene count data set looks like this:

head(m)
            Sample_118p.0 Sample_132p2.0 Sample_91p.0 Sample_118rz.0 Sample_132rz1.0 Sample_132rz2.0 Sample_91rz.0
XLOC_000001          1534           2603         1764           1057            2889            3830          1684
XLOC_000002           175            304          208            144             428             367           222
XLOC_000003            80            195          109            916            2515            2314          1082
XLOC_000004            49             66           54             51             127             219            94
XLOC_000005             0              0            0              0               0               0             0
XLOC_000006             0              1            0              0               0               0             0

The spike-in data set looks like this:

head(sp)
           Sample_118p.0 Sample_132p2.0 Sample_91p.0 Sample_118rz.0 Sample_132rz1.0 Sample_132rz2.0 Sample_91rz.0
ERCC-00009            49             66           54             51             127             219            94
ERCC-00025             9              7            6              5              14              21             8
ERCC-00031             0              0            0              0               1               1             0
ERCC-00034             1              3            2              0               6               6             4
ERCC-00035             5              7            7              9              32              38            21
ERCC-00042            43             78           56             73             202             199            98

I am using the spike-ins from subgroup B, which have equal concentrations, so that consistency is maintained across samples.

Now I want to use this in DESeq.

So what is the best way to apply this normalization to my RNA-Seq data: create the CountDataSet object (newCountDataSet), estimate the size factors, and then estimate the dispersion (per-gene variance) to get the differentially expressed genes? Has anyone dealt with a scenario like this who could give me some pointers?

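I'm assuming that you want to use the spike-ins simply for the size normalization, rather than for estimating dispersion, correct? If so, you can set the size factors manually: compute them from the spike-in counts and assign them with sizeFactors(cds) <- ... instead of calling estimateSizeFactors() on the gene counts.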
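
To make the manual route concrete, here is a minimal sketch in R. It is one possible approach under stated assumptions, not a definitive recipe: the helper name spike_size_factors is made up for illustration, the matrix holds only the six spike-in rows shown in the question, and the condition labels "p" and "rz" are guessed from the column-name suffixes. The size-factor part uses base R only and applies the same median-of-ratios idea DESeq uses, restricted to the spike-ins. The DESeq calls are left as comments because they need the package and the full count matrix m.

```r
# Spike-in counts copied from the question (first six ERCC rows only).
sp <- matrix(
  c(49, 66, 54, 51, 127, 219, 94,
     9,  7,  6,  5,  14,  21,  8,
     0,  0,  0,  0,   1,   1,  0,
     1,  3,  2,  0,   6,   6,  4,
     5,  7,  7,  9,  32,  38, 21,
    43, 78, 56, 73, 202, 199, 98),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("ERCC-00009", "ERCC-00025", "ERCC-00031",
      "ERCC-00034", "ERCC-00035", "ERCC-00042"),
    c("Sample_118p.0", "Sample_132p2.0", "Sample_91p.0", "Sample_118rz.0",
      "Sample_132rz1.0", "Sample_132rz2.0", "Sample_91rz.0")))

# Hypothetical helper: median-of-ratios size factors computed on spike-ins only.
spike_size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  # Spike-ins with a zero count in any sample have a geometric mean of zero
  # (log = -Inf) and are excluded.
  usable <- is.finite(log_geo_means)
  apply(log_counts[usable, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[usable])))
}

sf <- spike_size_factors(sp)
print(round(sf, 3))

# With DESeq loaded and `m` the full gene-level count matrix, the size factors
# can then be assigned instead of estimated (condition labels are an assumption):
#   library(DESeq)
#   conds <- factor(c("p", "p", "p", "rz", "rz", "rz", "rz"))
#   cds <- newCountDataSet(m, conds)
#   sizeFactors(cds) <- sf
#   cds <- estimateDispersions(cds)
#   res <- nbinomTest(cds, "p", "rz")
```

With only a handful of low-count spike-ins the median ratio can be noisy, so it is worth checking how many spike-ins survive the zero filter in your full table before relying on these factors.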

What have you tried so far? Where are you getting stuck in the analysis?