Hello,

we submitted a paper in which we used a t-test and fold change to determine genes differentially expressed between two sample sets (the sets are not equal in size: 12 vs 8 in one case, 12 vs 10 in another). A referee is asking us for a qq-plot for the t-tests. I just do not understand what he intends: the distribution of one set versus the other, or the distribution of genes in all samples versus a normal distribution? And what is the simplest way to do it?

Thank you in advance.

The reviewer may suspect that the assumptions of the t-test are violated. A quantile-quantile (Q-Q) plot is a good way to compare two distributions, in this case the theoretical distribution and the empirical distribution. Ideally, the two would be equal and the points would fall on a straight line. Often, though, empirical distributions have heavier tails, that is, more extreme values are observed than expected, so the Q-Q plot bends away from the line at the ends. You were lucky, though: the reviewer could have requested more advanced methods such as limma or CyberT. Because you have a good number of samples, you might be fine with a t-test.

Now, the question remains which distributions to compare. It could be debated whether the whole expression data set should follow a single normal distribution, or whether that should apply only to an individual transcript and its measurement error. For a t-test we assume that the values for each transcript are sampled from normal distributions with the same or different means. Because each single t-test 'sees' only the data from a single transcript, the latter should suffice, and one does not need to assume normality of all gene-expression values or of their differences in total.

A t-test assumes that its t-statistic follows a Student's t distribution under the null hypothesis. Therefore, instead of plotting all the expression data, I would make a Q-Q plot of the test statistics against a theoretical Student's t distribution with the same degrees of freedom (which depend on the sample sizes).

This can be done easily with the functions `qqplot` and `qt` in R.
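A minimal sketch of what that could look like, assuming an equal-variance two-sample t-test per transcript so that every test shares the same degrees of freedom (n1 + n2 - 2). The matrix `expr`, the factor `group`, and the simulated values are placeholders for illustration only, not your data; with a Welch test the degrees of freedom would differ per transcript and this simple comparison would only be approximate.

```r
# Placeholder data: 1000 transcripts (rows) x 20 samples (columns).
# Replace 'expr' and 'group' with your own expression matrix and labels.
set.seed(1)
n1 <- 12
n2 <- 8
expr  <- matrix(rnorm(1000 * (n1 + n2)), nrow = 1000)
group <- factor(rep(c("A", "B"), times = c(n1, n2)))

# One equal-variance two-sample t-statistic per transcript.
t_stats <- apply(expr, 1, function(x) {
  t.test(x[group == "A"], x[group == "B"], var.equal = TRUE)$statistic
})

# Degrees of freedom of the pooled-variance test.
df <- n1 + n2 - 2

# Theoretical t quantiles at the usual plotting positions.
theo <- qt(ppoints(length(t_stats)), df = df)

# Q-Q plot of observed statistics against the theoretical t distribution.
qqplot(theo, t_stats,
       xlab = "Theoretical t quantiles",
       ylab = "Observed t-statistics")
abline(0, 1, col = "red")  # points near this line agree with the null
```

With real data, a modest number of truly differentially expressed genes would show up as points leaving the line in the tails, while a systematic departure along the whole curve would suggest that the t distribution is a poor fit.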

Did you analyze microarray data with non-standard tools or even homemade statistics instead of something like `limma`?

It is commercial software; I would not know if it can be called "non-standard".