p-value calculation when comparing ratios of means (not means of ratios)
3
1
Entering edit mode
5.9 years ago
wildtype ▴ 50

Hello, I apologize in advance for the dumb question but I am no expert in statistics, my course only covered how to do student's t.test pretty much. I could not find information in different forums on a similar problem specifically. I am not sure what to do in the following situation. Suppose we measure the expression of geneX by RT-qPCR in two different cell lines, each of which is treated with some drug or not treated. They are independent, so it is not the same cells before an after treatment; it is different cells that are either treated or not. Suppose we have 3 biological replicates for each condition, and the following values

Cell line A, no treatment: c(0.9,1,1.1) => mean expression 1 +/- 0.1 standard deviation (let's call this A1) Cell line A, treatment: c(9,10,11) => mean expression 10 +/- 1 sd (A2) Cell line B: no treatment: c(2.9,3,3.1) => mean expression 3 +-/ 0.1 sd (B1) Cell line B: treatment: c(290,300,310) => mean expression 300 +-/ 10 sd (B2)

Obviously, the drug causes an increase in both cell lines, but in line A there is a 10-fold increase (A2/A1=10/1) while in line B there is a 100 fold increase (B2/B1=300/3). I want to claim that the effect in B is much stronger.

What statistical test should we use to tell whether the RATIOS (gene expression upon treatment normalized to basal gene expression without treatment) are significantly different i.e. A2/A1 +/- propagated error != B2/B1 +/- error

Thanks in advance

statistics • 2.6k views
ADD COMMENT
1
Entering edit mode
5.8 years ago
pbpanigrahi ▴ 420

Hi

Since you are interested in asking the question I want to claim that the effect in B is much stronger, your

  • Null hypothesis can be: Fold change in cell line A is <= B
  • Alternate hypothesis can be: Fold change in cell line A is > B

Now you will do the testing. Since there are 3 replicates, n=3, you will have to go for t-test with 2 degree of freedom (mark that you have less power). Also since before and after are not paired, you should go for unpaired t test.

Before doing you have to calculate fold change e.g. A1 with A2 and B1 with B2, so you will have 3 fold change values

a1 = c(0.9,1,1.1)
a2 = c(9,10,11)
b1 = c(2.9,3,3.1)
b2 = c(290,300,310)

a_diff = a2/a1
b_diff = b2/b1

t.test(a_diff, b_diff, alternative = "greater")

In the example above, there is 10 fold change in every comparison, so t test would give error. Why it will give error, refer this link

Hope this helps

ADD COMMENT
0
Entering edit mode

The problems with this approach is that you are pairing a1 to a2 and b1 to b2, when OP specifically states that before and after treatment are independent samples. Equally valid would be a_diff <- rev(a1)/a2 and all other permutations

ADD REPLY
1
Entering edit mode
5.8 years ago
ejm32 ▴ 450

In this case I think the best thing to is make use of linear models, which will allow you to perform contrasts based on the comparisons you care about. Have a look at the limma manual for a bit more information.

library(limma)
a1 <-  c(0.9,1,1.1)
a2 <-  c(9,10,11)
b1 <-  c(2.9,3,3.1)
b2 <-  c(290,300,310)

m <- data.frame(a1,a2,b1,b2, stringsAsFactors = F)
m <- reshape2::melt(m)

mm <- model.matrix( ~ 0 + m$variable)
colnames(mm) <- levels(m$variable)
res <- lmFit(t(log2(m[2])), mm) #If your values are already log transformed remove the log2 call

cm <- makeContrasts(A=a2-a1,
                    B=b2-b1,
                    Diff = (b2-b1)-(a2-a1), levels = levels(m$variable))

res2 <- contrasts.fit(res, cm)
res3 <- eBayes(res2)
topTable(res3, coef = 3 ,number = 1)

#              logFC  AveExpr           t      P.Value    adj.P.Val         B
# value   3.32192809 3.281243 27.82837605 6.974347e-10 2.789739e-09 13.450186
ADD COMMENT
0
Entering edit mode
5.9 years ago
Chirag Parsania ★ 2.0k

Hi, Here I showed the crude way to calculate the probability from normal distribution using given information. I am not very much familiar with various statistical tests. There must be sophisticated ways to get p-values than what i mentioned here.

## generate populations from given information
a1 <- rnorm(n =1000 , mean = 1, 0.1)
a2 <- rnorm(n =1000 , mean = 10, 1)
b1 <- rnorm(n =1000 , mean = 3, 0.1)
b2 <- rnorm(n =1000 , mean = 300, 10)

## get distribution of ratios. You can change number of iterations 
n_iter <- 1000
ratios <- sapply(seq(n_iter), function(i){

        ## select 3 random from each populations  

        samp_a1  <- mean(sample(a1, 3))
        samp_a2  <- mean(sample(a1, 3))
        samp_b1  <- mean(sample(b1, 3))
        samp_b2 <- mean(sample(b2, 3))


        r1 <- sampa2 /sampa1  ## A2/A1

        r2 <- sampb2 /sampb1 ## B2/B1 

        r_ratios <- r2/r1  ## Treatment / WO Treatment

        return(r_diff)        
})

## distribution of ratios 
plot(density(ratios))

## zscore 
zscore <-  (1 -  mean(ratios)) / sd(ratios)

## probability of normal dist   
pnorm(zscore)
ADD COMMENT

Login before adding your answer.

Traffic: 2946 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6