Question

Differentially Expressed Genes Analysis with RNA-Seq data

1

6.9 years ago

Xiaokang ▴ 70

2 questions:

When using RNA-Seq data for differentially expressed gene (DEG) analysis, must the two groups have the same number of samples (replicates)? For example, if I have 8 control samples and 5 treatment samples, is it OK to use DESeq for the DEG analysis?
I'm using HISAT2 and featureCounts to produce the count files. Before putting them into DESeq, should I normalise them first, or can I use them directly?

RNA-Seq next-gen DEG • 2.5k views

updated 6.9 years ago by EagleEye 7.5k • written 6.9 years ago by Xiaokang ▴ 70

4

6.9 years ago

EagleEye 7.5k

It is always good to have the same number of samples in both comparison groups, but in some cases that is hard to achieve. In my opinion, it is completely fine to do differential expression analysis with an unequal number of samples in the comparison groups.
HISAT2 -> featureCounts is a good choice of tools if you are doing gene-level differential expression (DE) analysis. You do not have to perform normalization yourself: most DE tools expect raw counts and will not work properly with already normalized values.
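
To make "raw counts" concrete, here is a minimal Python sketch that reads a featureCounts-style table into plain integer counts, which is the form a count-based DE tool expects. The tiny table and the sample names are made up for illustration, and the column layout (a `#` comment line, then Geneid, Chr, Start, End, Strand, Length, followed by one column per BAM file) is assumed to match your featureCounts output, so check it against your own file.

```python
import csv
import io

# Hypothetical miniature featureCounts table, for illustration only.
EXAMPLE = """\
# Program:featureCounts (illustrative comment line)
Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_1.bam\ttreat_1.bam
geneA\tchr1\t100\t900\t+\t801\t120\t30
geneB\tchr1\t2000\t2500\t-\t501\t0\t15
"""


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [raw integer counts per sample]})."""
    # Skip the leading comment line(s) that start with '#'.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    # Assumption: the first six columns are annotation, the rest are samples.
    samples = header[6:]
    counts = {}
    for row in rows:
        if not row:
            continue
        counts[row[0]] = [int(value) for value in row[6:]]
    return samples, counts


samples, counts = read_featurecounts(io.StringIO(EXAMPLE))
print(samples)
print(counts)
```

The counts stay as untouched integers; no scaling by library size or gene length is applied before handing them to the DE tool.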

score 4 · Accepted Answer · 2017-05-10

I am assuming you are using DESeq2 and not DESeq.

As far as the number of samples in each group is concerned, unequal groups are fine for DE analysis. The ideal scenario is equal numbers of paired samples, in which case you should make use of the pairing when performing DE analysis with any standard tool such as DESeq2, edgeR or limma.
DESeq2 can still perform DE analysis with just 2 samples in one group and 3 in the other. I would treat that as the lower limit; below it the results are usually not trustworthy.
I have read that edgeR can work with even fewer samples per group, but I do not trust such an analysis, to be honest. Your number of samples per group is good enough to perform the tests.
About normalization: the DE tools I mentioned, including the one in your question, work on count data, so there is no point in feeding them normalized data. They perform the normalization themselves in subsequent steps. Just prepare your count table carefully and follow the DESeq2 tutorial, and you are good to go.
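
To show why the tool wants raw counts, here is a rough Python sketch of the median-of-ratios idea that DESeq2 uses to estimate per-sample size factors. This is a simplified illustration and not DESeq2's actual code; the function name `size_factors` and the toy counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: {gene: [raw counts per sample]} -> list of per-sample size factors."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        # Genes with a zero in any sample are skipped: log(0) is undefined.
        if min(values) == 0:
            continue
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # Size factor = median ratio of a sample's counts to the per-gene reference.
    # Note: this raises StatisticsError if every gene contains a zero.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
raw = {"g1": [100, 200], "g2": [50, 100], "g3": [10, 20], "g4": [0, 7]}
factors = size_factors(raw)
normalized_g1 = [c / f for c, f in zip(raw["g1"], factors)]
print(factors)
print(normalized_g1)
```

In this toy case the two samples differ only in depth, so dividing by the size factors brings `g1` to the same value in both. If the input had already been normalized (for example to FPKM), this step and the count-based statistics that follow would be working on the wrong kind of data.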
I would advise working through the tutorial carefully before performing any DE analysis. It is always good to understand how the data behave, not only as a QC step but also as good practice in exploratory data analysis. This helps you understand why you need to perform DE analysis and which samples should be included in it. You might have to remove 1 or 2 samples from either group if they behave as outliers, owing to batch effects or sequencing errors, even after applying batch correction methods. So a complete workflow is advised; it also lets you build a discovery pipeline that you may end up using often in your lab. I hope this answers your query.
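
As a small illustration of the outlier check, here is a Python sketch that computes sample-to-sample distances on log2(count + 1) values and reports which sample sits furthest from the rest. It is only a stand-in for the PCA and distance heatmaps you would normally draw on properly transformed counts; the function names and the toy counts are made up, and a large distance is a prompt to investigate, not proof that a sample should be dropped.

```python
import math


def sample_distances(counts):
    """counts: {gene: [counts per sample]} -> matrix of Euclidean distances."""
    n = len(next(iter(counts.values())))
    # One vector of log2(count + 1) values per sample.
    logged = [[math.log2(values[j] + 1) for values in counts.values()]
              for j in range(n)]
    return [[math.dist(logged[a], logged[b]) for b in range(n)]
            for a in range(n)]


def mean_distance_per_sample(dist):
    """Mean distance from each sample to all other samples."""
    n = len(dist)
    # The diagonal is zero, so dividing the row sum by n - 1 gives the mean.
    return [sum(row) / (n - 1) for row in dist]


# Toy data: samples 0-2 look alike, sample 3 behaves differently.
toy = {
    "g1": [100, 110, 95, 400],
    "g2": [50, 45, 55, 5],
    "g3": [10, 12, 9, 80],
}
dist = sample_distances(toy)
mean_dist = mean_distance_per_sample(dist)
most_distant = max(range(len(mean_dist)), key=mean_dist.__getitem__)
print(mean_dist)
print("sample furthest from the others:", most_distant)
```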