Question

QQ-plot for microarray t-test?


4.9 years ago

sig93618 • 0

Hello,

We submitted a paper in which we used a t-test and fold change to determine the genes differentially expressed between two sample sets (the sets are not equal in size: 12 vs 8 in one case, 12 vs 10 in another). A referee is asking us for a Q-Q plot for the t-tests. I do not understand what the referee intends: the distribution of one set versus the other, or the distribution of genes across all samples versus a normal distribution? And what is the simplest way to do it?

Thank you in advance.

expression microarray t-test qq-plot • 2.0k views



Did you analyze microarray data with non-standard tools or even homemade statistics instead of something like limma?

ADD REPLY • link 4.9 years ago by ATpoint 81k


It is commercial software; I would not know whether it can be called "non-standard".

ADD REPLY • link 4.9 years ago by sig93618 • 0

score 7 · Answer 1 · 2019-05-18

The reviewer might suspect that the assumptions of the t-test are violated. A quantile-quantile (Q-Q) plot is a good way to compare two distributions, in this case the theoretical distribution and the empirical one. Ideally, the two would be equal, and the points would fall on a straight line. Often, though, empirical distributions have heavier tails, that is, more extreme values are observed than expected, and the Q-Q plot bends away from the line at the ends. You were lucky, though: the reviewer could have requested more advanced methods such as limma or CyberT. You might be fine with a t-test because you have a good number of samples.

Now, the question remains which distributions to compare. It is debatable whether the whole expression data set should follow a single normal distribution, or whether normality should apply only to an individual transcript and its measurement error. For a t-test we assume that the values for each transcript are sampled from normal distributions whose means are either the same or different between the two groups. Because each single t-test 'sees' only the data from a single transcript, the latter should suffice, and one does not need to assume normality of all gene-expression values, or of their differences, in total.
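If one did want to look at the per-transcript assumption described above, a minimal sketch in R could look like the following. The data are simulated and all names (`group`, `x`, `res`) are hypothetical; with only 12 and 8 values per transcript, such a plot is rough at best.

```r
# Simulated log2 expression values for ONE transcript (an assumption,
# standing in for one row of a real expression matrix).
set.seed(2)
n1 <- 12; n2 <- 8
group <- factor(rep(c("A", "B"), times = c(n1, n2)))
x <- rnorm(n1 + n2, mean = ifelse(group == "A", 8, 9))

# Remove each group's own mean, so that a real difference between the
# groups is not mistaken for non-normality.
res <- x - ave(x, group)

# Normal Q-Q plot of the residuals with a reference line.
qqnorm(res, main = "Residuals of one transcript")
qqline(res)
```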

A t-test is made under the assumption that its t-statistic follows a Student's t distribution under the null hypothesis. Therefore, instead of plotting all the expression data, I would make a Q-Q plot of the test statistics against a theoretical Student's t distribution with the same degrees of freedom (these depend on sample size; for the classical equal-variance two-sample test they are n1 + n2 - 2, e.g. 18 for 12 vs 8 samples).

This can be done easily with the functions qqplot and qt in R.
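A minimal sketch of such a plot follows, under some loudly stated assumptions: the expression matrix is simulated (no real data), the names `expr`, `group`, `t_stats` and `theo` are hypothetical, and the classical equal-variance t-test is used. If your software ran Welch's test, the degrees of freedom differ per transcript and this comparison is only approximate.

```r
# Simulated stand-in for a log-expression matrix:
# rows = transcripts, columns = samples (12 in group A, 8 in group B).
set.seed(1)
n1 <- 12; n2 <- 8
n_genes <- 2000
expr <- matrix(rnorm(n_genes * (n1 + n2)), nrow = n_genes)
group <- factor(rep(c("A", "B"), times = c(n1, n2)))

# One equal-variance two-sample t-test per transcript; keep only the statistic.
t_stats <- apply(expr, 1, function(x) {
  unname(t.test(x[group == "A"], x[group == "B"], var.equal = TRUE)$statistic)
})

# Degrees of freedom of the equal-variance test.
df <- n1 + n2 - 2

# Theoretical t quantiles at evenly spread probability points.
theo <- qt(ppoints(length(t_stats)), df = df)

# Observed statistics against theoretical quantiles, with the identity line.
qqplot(theo, t_stats,
       xlab = "Theoretical t quantiles",
       ylab = "Observed t statistics",
       main = "Q-Q plot of t statistics")
abline(0, 1, col = "red")
```

With real data, replace the simulated `expr` and `group` with your own matrix and sample labels. Transcripts that are truly differentially expressed are expected to leave the line in the tails; it is the bulk of the statistics that should follow it.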

score 0 · Answer 2 · 2019-05-19


4.9 years ago

sig93618 • 0

Thank you very much for your extensive and very helpful reply. I will follow your instructions. Best
