Question

Control sample size is much lower than tumor sample size in TCGA

6.9 years ago

Poorya Parvizi

I am trying to run a differential expression analysis on TCGA glioblastoma RNA-seq samples. However, I realized that there are far fewer "Solid Tissue Normal" samples than "Primary Tumor" samples: among the FPKM files for glioblastoma, there are 6 normal and 161 primary tumor samples.

Is this true? Am I missing something?
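One way to double-check counts like these is to tally the sample-type code embedded in each TCGA barcode (the two digits after the participant ID, e.g. `01` in `TCGA-AA-0001-01A-...`). The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, only three codes are mapped (labels follow the GDC wording used in the question), and the demo barcodes are made up rather than real TCGA samples. It collapses several files or aliquots from the same participant and sample type into one entry so they are not counted twice.

```python
from collections import Counter

# Sample-type code -> label (TCGA barcode convention; GDC-style wording).
# Only a few codes are mapped here; anything else is reported as "other".
SAMPLE_TYPES = {
    "01": "Primary Tumor",
    "02": "Recurrent Tumor",
    "11": "Solid Tissue Normal",
}


def participant_and_type(barcode: str) -> str:
    """Reduce e.g. TCGA-AA-0001-01A-11R-A000-07 to TCGA-AA-0001-01."""
    parts = barcode.split("-")
    # parts[3] looks like "01A": two-digit sample type followed by a vial letter.
    return "-".join(parts[:3] + [parts[3][:2]])


def sample_type(barcode: str) -> str:
    """Return the sample-type label encoded in a TCGA barcode."""
    code = barcode.split("-")[3][:2]
    return SAMPLE_TYPES.get(code, f"other ({code})")


def count_sample_types(barcodes):
    """Count participants per sample type, ignoring duplicate files/aliquots."""
    unique = {participant_and_type(b) for b in barcodes}
    return Counter(sample_type(u) for u in unique)


if __name__ == "__main__":
    # Hypothetical barcodes for illustration only (not real TCGA samples).
    demo = [
        "TCGA-AA-0001-01A-11R-A000-07",
        "TCGA-AA-0001-01A-21R-A001-07",  # second file from the same tumor
        "TCGA-AA-0002-01A-11R-A000-07",
        "TCGA-AA-0003-11A-11R-A000-07",
    ]
    print(count_sample_types(demo))
```

On the demo list this reports 2 Primary Tumor and 1 Solid Tissue Normal, because the two files from `TCGA-AA-0001` collapse into a single entry.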

TCGA Differential Expression RNA-Seq

I wouldn't be surprised to see few normal samples, since you're dealing with brain tissue here. Normal brain tissue is rarely removed, whether from the cancer patient or from someone else.

6.9 years ago by Jean-Karim Heriche

You are right, but 124 of them are dead. Given that, do you think a differential expression analysis under these conditions is statistically valid?

6.9 years ago by Poorya Parvizi

Samples are usually obtained during surgery performed to keep people alive. That's what I would assume unless there are more details on the samples' provenance.

Regarding the different group sizes, statistical tests do not require the groups to be the same size. In particular, as long as the assumptions of the test hold, the type I error rate (i.e. the probability of calling a difference statistically significant when there is none) is not affected. However, the power of the test (i.e. the probability of rejecting the null hypothesis when it is false) is lower than it would be with more normal samples, which means that the probability of making a type II error (i.e. concluding there is no difference when there really is one) is higher. In less mathematical terms, larger sample sizes make it easier to detect smaller differences, and with 6 versus 161 samples it is the small group that limits how precisely the difference can be estimated. Also keep in mind that statistical significance and biological relevance are not linked a priori.
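The point about type I error versus power can be illustrated numerically. This is a minimal sketch under simplifying assumptions, not how RNA-seq differential expression is actually tested (tools such as DESeq2 or edgeR use count models): each gene's values are treated as normal with a known common standard deviation `sigma`, the test is a two-sided z-test on the difference of group means, and `delta` is a hypothetical true difference in units of `sigma`. All function names are illustrative. The standard error of the difference is `sigma * sqrt(1/n1 + 1/n2)`, so with 6 vs 161 samples the `1/6` term dominates.

```python
import math
import random
from statistics import NormalDist

# Assumption: values are normal with known sigma, tested with a two-sided z-test.
STD_NORMAL = NormalDist()


def standard_error(n1: int, n2: int, sigma: float = 1.0) -> float:
    """Standard error of the difference between two group means."""
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n2)


def approx_power(n1: int, n2: int, delta: float,
                 sigma: float = 1.0, alpha: float = 0.05) -> float:
    """Power of the two-sided z-test to detect a true mean difference delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / standard_error(n1, n2, sigma)
    # Probability the test statistic falls in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def simulated_type1_error(n1: int, n2: int, sigma: float = 1.0,
                          alpha: float = 0.05, reps: int = 5000,
                          seed: int = 1) -> float:
    """Fraction of simulated null datasets (no true difference) called significant."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = standard_error(n1, n2, sigma)
    hits = 0
    for _ in range(reps):
        mean1 = sum(rng.gauss(0.0, sigma) for _ in range(n1)) / n1
        mean2 = sum(rng.gauss(0.0, sigma) for _ in range(n2)) / n2
        if abs(mean1 - mean2) / se > z_crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Same total (167) split unevenly vs. evenly, plus a larger balanced design.
    for n_normal, n_tumor in [(6, 161), (83, 84), (161, 161)]:
        print(n_normal, n_tumor,
              f"SE={standard_error(n_normal, n_tumor):.3f}",
              f"power(delta=0.5)={approx_power(n_normal, n_tumor, 0.5):.2f}",
              f"type I error={simulated_type1_error(n_normal, n_tumor):.3f}")
```

Under these assumptions the simulated type I error stays close to 0.05 for every split, while the power to detect `delta = 0.5` drops from roughly 0.90 for an 83/84 split to roughly 0.22 for 6/161, even though the total number of samples is the same.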

6.9 years ago by Jean-Karim Heriche