Question

What logFC or adjusted p-value cutoff should be chosen when the number of samples in one of the conditions is very small?

0

5.8 years ago

nazaninhoseinkhan ▴ 520

Dear all,

I ran DESeq2 on 52 tumor versus only 3 normal samples.

Applying cutoffs of adjusted p-value <= 0.05 and |logFC| >= 1 resulted in 1450 up-regulated versus 440 down-regulated genes.

Now my question is: has this large number of deregulated genes been caused by the very small number of normal samples?

How can I tackle the problem of the small number of normal samples? Is it reasonable to apply a more stringent cutoff on logFC or adjusted p-values?

Is it acceptable if I work on this large number of deregulated genes and report them?

I am looking forward to your comments.

Nazanin

RNA-Seq DESeq2 small number of normal samples • 6.4k views

ADD COMMENT • link updated 5.8 years ago by Kevin Blighe 87k • written 5.8 years ago by nazaninhoseinkhan ▴ 520

score 2 · Answer 1 · 2018-07-16

2

5.8 years ago

Kevin Blighe 87k

For a tumour versus normal comparison, I think that one should expect a large proportion of the transcriptome to be differentially expressed. Your sample numbers are hugely imbalanced, though, which is a limitation of your study.

You can reasonably adjust your thresholds for statistical significance. In fact, I would recommend using |log2FC|>=2 and adjusted P<=0.01. Basically, you are the analyst here and you should adjust the thresholds to suit the downstream analyses that you (or your collaborators) intend to perform.
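As a rough sketch of what applying these thresholds looks like in practice, the base R snippet below labels genes as up-regulated, down-regulated, or not significant. The table is invented toy data whose column names (log2FoldChange, padj) mirror those of a DESeq2 results object, and classify_genes is a hypothetical helper, not a DESeq2 function.

```r
# Toy results table; column names mirror a DESeq2 results object,
# but the values are invented purely for illustration.
res_df <- data.frame(
  gene           = c("g1", "g2", "g3", "g4", "g5", "g6"),
  log2FoldChange = c(3.1, -2.4, 1.2, -0.5, 2.0, NA),
  padj           = c(0.001, 0.004, 0.0005, 0.2, 0.03, NA)
)

# Hypothetical helper: label each gene as "up", "down", or "ns" (not significant).
# Genes with NA in padj or log2FoldChange are treated as not significant.
classify_genes <- function(df, lfc_cutoff = 2, padj_cutoff = 0.01) {
  sig <- !is.na(df$padj) & df$padj <= padj_cutoff &
    !is.na(df$log2FoldChange) & abs(df$log2FoldChange) >= lfc_cutoff
  status <- ifelse(sig & df$log2FoldChange > 0, "up",
                   ifelse(sig & df$log2FoldChange < 0, "down", "ns"))
  factor(status, levels = c("up", "down", "ns"))
}

# Stringent thresholds: |log2FC| >= 2 and adjusted P <= 0.01
res_df$status <- classify_genes(res_df)
table(res_df$status)

# Looser thresholds for comparison: |log2FC| >= 1 and adjusted P <= 0.05
table(classify_genes(res_df, lfc_cutoff = 1, padj_cutoff = 0.05))
```

Tabulating the same table under two sets of cutoffs shows directly how many genes each choice retains. With a real analysis, as.data.frame(res) would take the place of the toy table.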

It would be interesting to see how you normalised the data and conducted the differential expression analysis. Note that, in recent versions of DESeq2, it is recommended to run lfcShrink as a separate step and not to use a beta prior (betaPrior=FALSE):

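library(DESeq2)

dds <- DESeq(dds, betaPrior=FALSE)  # fit without a beta prior; shrinkage is applied separately below

# Tumour versus Normal, BH-adjusted p-values, independent filtering optimised for alpha = 0.01
res <- results(dds, contrast=c("Tissue", "Tumour", "Normal"), independentFiltering=TRUE, alpha=0.01, pAdjustMethod="BH", parallel=TRUE)

# With a contrast, type must be "normal" or "ashr"; the current default "apeglm" only accepts coef
res <- lfcShrink(dds, contrast=c("Tissue", "Tumour", "Normal"), res=res, type="normal")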

Kevin

ADD COMMENT • link 5.8 years ago by Kevin Blighe 87k

0

Hi Kevin,

Yes, I have run the lfcShrink function. However, when I checked the results, strangely, no down-regulated genes were detected.

So I preferred to use the results of dds <- DESeq(dds) instead. As you suggested, I used |log2FC|>=2 and adjusted P<=0.01; however, no down-regulated genes were detected.

I have another question. I am running DESeq2 on different races. The sample sizes of tumor versus normal differ greatly between races. Should I use the same cutoffs (adjusted p-value and logFC) for all races? I want to compare the results between the different races.

Thank you so much

Nazanin

ADD REPLY • link 5.8 years ago by nazaninhoseinkhan ▴ 520

0

You only have 3 normals, though? How many races are in your dataset?

Note that I published recently on this topic: Racial differences in endometrial cancer molecular portraits in The Cancer Genome Atlas.

ADD REPLY • link 5.8 years ago by Kevin Blighe 87k

0

I am trying to analyze 3 races: Asian (50T, 3N), White (330T, 50N) and Black/African-American (28T, 4N).

I have also analyzed the "not reported" group, but I am not sure whether I should include it in the analysis.

And thank you for the paper. I will read it as soon as possible.

ADD REPLY • link 5.8 years ago by nazaninhoseinkhan ▴ 520

1

If you want to explore differences between the different races, then you could normalise all samples together and include race + tissue in your design formula.
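To illustrate what a race + tissue design encodes, here is a minimal base R sketch using an invented sample sheet. The column names Race and Tissue and the group labels "A" and "B" are assumptions for illustration; in DESeq2 the same formula would be supplied through the design argument of DESeqDataSetFromMatrix.

```r
# Invented sample sheet: two hypothetical race groups, each with one normal
# and two tumour samples. Replace with the real sample annotation.
coldata <- data.frame(
  Race   = factor(c("A", "A", "A", "B", "B", "B")),
  Tissue = factor(c("Normal", "Tumour", "Tumour", "Normal", "Tumour", "Tumour"),
                  levels = c("Normal", "Tumour"))  # Normal is the reference level
)

# Additive design: a baseline, a race effect, and one tumour-vs-normal effect
design_matrix <- model.matrix(~ Race + Tissue, data = coldata)

colnames(design_matrix)
# "(Intercept)" "RaceB" "TissueTumour"
```

In this additive form, the TissueTumour column represents a single tumour versus normal effect adjusted for race. If the goal is to test whether that effect differs between races, an interaction term (for example ~ Race + Tissue + Race:Tissue) may be needed instead, which is a judgment call that depends on the question being asked.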

ADD REPLY • link 5.8 years ago by Kevin Blighe 87k

0

As you suggested, I initially wanted to normalize all samples together; however, because I had to run the analysis on my laptop, I ran it separately for each race. I will try to repeat the analysis by normalizing all samples together and compare the results.

ADD REPLY • link 5.8 years ago by nazaninhoseinkhan ▴ 520