How to obtain chi-square statistics for overlaps of three GRanges objects, pairwise?
7.8 years ago

Dear all: I want to obtain chi-square statistics for the following data, element-wise. My apologies for asking a statistical question in this community. My data contains a list of overlap significance scores for 3 GRanges objects, and I want to get a global score for each element. How can I do this in R?

This is the data for which I want to get a global score, element-wise:

[[1]]
NumericList of length 7
[[1]] 1e-22
[[2]] 1e-19
[[3]] 1e-18
[[4]] 1e-16
[[5]] 1e-24
[[6]] 1e-20
[[7]] 1e-15

[[2]]
NumericList of length 7
[[1]] 1e-24
[[2]] 1e-24
[[3]] 1e-20
[[4]] 1e-25
[[5]] 0.1
[[6]] 1e-19
[[7]] 1e-18

[[3]]
NumericList of length 7
[[1]] 1e-11
[[2]] 1e-11
[[3]] 1e-10
[[4]] numeric(0)
[[5]] numeric(0)
[[6]] 1e-15
[[7]] numeric(0)

In case you wonder, the third list element contains numeric(0), which refers to non-overlapped regions, so I can replace it with zero:

li.3 <- lapply(li.3, function(x) {
  # empty entries (non-overlapped regions) become 0; everything else is kept
  if (length(x) > 0) x else 0
})

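This is a reproducible example (the numeric(0) entries are written as 0 here, because c() silently drops numeric(0) and the columns would otherwise have different lengths):

library(S4Vectors)  # provides DataFrame

data <- DataFrame(
  v1 = c(1e-22, 1e-19, 1e-18, 1e-16, 1e-24, 1e-20, 1e-15),
  v2 = c(1e-24, 1e-24, 1e-20, 1e-25, 0.1, 1e-19, 1e-18),
  v3 = c(1e-11, 1e-11, 1e-10, 0, 0, 1e-15, 0))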

My desired output is something like this (just an example, element-wise):

global fisher score of  `(1e-22, 1e-24, 1e-11)` = ?
global fisher score of  `(1e-19, 1e-24, 1e-11)` = ?
...
global fisher score of  `(1e-24, 1e-01, numeric(0))` = ?

I want to get the global score element-wise. How can I do this in R? Alternatively, I would also like to see the Fisher exact test result for the above data. I would be grateful if anyone could give me an idea of how to do this. Thanks a lot.

R chi-square overlap DataFrame • 2.7k views

Where are the GRanges objects?


Dear Giovanni M Dall'Olio:

I am afraid it would be a rather long thread if I listed every step here (finding overlaps, conditionally filtering, expanding them as GRanges), so I did not show the reproducible steps. However, the data for which I want a global score is the result of element-wise filtering, so I have to make sure the shape of the vectors is preserved. To be specific, all I want is to get the global Fisher score element-wise. To clarify, v1 is the significance score of the query, while v2 and v3 are the significance scores (a.k.a. pvalueLog) of the subjects (a.k.a. the overlapped GRanges objects). I need an element-wise operation to get the global score. I hope to get some ideas from this community.

7.8 years ago

You have a data frame with three columns:

> data
DataFrame with 7 rows and 3 columns
         v1        v2        v3
  <numeric> <numeric> <numeric>
1     1e-22     1e-24     1e-11
2     1e-19     1e-24     1e-11
3     1e-18     1e-20     1e-10
4     1e-16     1e-25     0e+00
5     1e-24     1e-01     0e+00
6     1e-20     1e-19     1e-15
7     1e-15     1e-18     0e+00

What confuses me is that this data frame already seems to contain p-values. So what exactly do you want to calculate?

You may combine p-values, assuming they are independent, using different approaches. The simplest is to take their mean (see "When combining p-values, why not just averaging?"):

> data$global = apply(data[1:3], 1, mean)
> data
DataFrame with 7 rows and 4 columns
         v1        v2        v3       global
  <numeric> <numeric> <numeric>    <numeric>
1     1e-22     1e-24     1e-11 3.333333e-12
2     1e-19     1e-24     1e-11 3.333333e-12
3     1e-18     1e-20     1e-10 3.333333e-11
4     1e-16     1e-25     0e+00 3.333333e-17
5     1e-24     1e-01     0e+00 3.333333e-02
6     1e-20     1e-19     1e-15 3.333700e-16
7     1e-15     1e-18     0e+00 3.336667e-16
>

More accurate methods to combine p-values would include Fisher's method. See for example http://stats.stackexchange.com/questions/168181/r-package-for-combining-p-values-using-fishers-or-stouffers-method for some R packages to do it.

For example:

> library(metap)
> data$global = apply(data[1:3], 1,  function(df) sumlog(df)$p)
Warning messages:
1: In sumlog(df) : Some studies omitted
2: In sumlog(df) : Some studies omitted
3: In sumlog(df) : Some studies omitted
> data
DataFrame with 7 rows and 4 columns
         v1        v2        v3       global
  <numeric> <numeric> <numeric>    <numeric>
1     1e-22     1e-24     1e-11 8.745181e-54
2     1e-19     1e-24     1e-11 7.855507e-51
3     1e-18     1e-20     1e-10 6.219311e-45
4     1e-16     1e-25     0e+00 9.540599e-40
5     1e-24     1e-01     0e+00 5.856463e-24
6     1e-20     1e-19     1e-15 7.855507e-51
7     1e-15     1e-18     0e+00 7.698531e-32
> sumlog(c(1e-22, 1e-24, 1e-11))
chisq =  262.4947  with df =  6  p =  8.745181e-54
>
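For reference, Fisher's method is short enough to write out in base R: the statistic is -2 times the sum of the log p-values, compared against a chi-square distribution with 2k degrees of freedom, where k is the number of p-values combined. The sketch below is only an illustration, not the metap implementation. `fisher_combine` is a hypothetical helper name, and dropping entries outside (0, 1] is an assumption meant to mimic the "Some studies omitted" warnings shown above.

```r
# Hypothetical helper: Fisher's method for one vector of p-values.
# Assumption: NA values and values outside (0, 1] are dropped before combining.
fisher_combine <- function(p) {
  p <- p[!is.na(p) & p > 0 & p <= 1]
  k <- length(p)                      # number of p-values actually combined
  chisq <- -2 * sum(log(p))           # Fisher's test statistic
  c(chisq = chisq,
    df = 2 * k,
    p = pchisq(chisq, df = 2 * k, lower.tail = FALSE))
}

res <- fisher_combine(c(1e-22, 1e-24, 1e-11))
print(res)
```

On the first row of the data this should agree with the sumlog call quoted above (chisq of about 262.49 with df = 6).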

Thanks a lot for your quick response. Maybe I did not state the problem clearly. I need to use Fisher's exact test on each row of my data to get its combined p-value. Your results are very close to my desired output, but I am not sure they are identical to what fisher.test would give. If I instead used fisher.test from the base packages, this is the code that might give me what I want:

  fish.res <- apply(data,1, function(x) fisher.test(matrix(x,nr=2))$p.value,$odds.ratio)

but it gave me an error. Would the above code yield the same result as yours if the error were fixed? Thank you very much.


I don't think you can calculate a Fisher test of Fisher p-values. Moreover, you would need a 2x2 contingency matrix of counts to calculate a Fisher test (e.g. see this tool for an example of the input you would expect: http://graphpad.com/quickcalcs/contingency1.cfm ). Maybe you meant to use Fisher's method to combine p-values, which is a different thing from Fisher's exact test?
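To make the distinction concrete, here is a small base R sketch. The 2x2 counts are made up purely for illustration and are not derived from the data in this thread. Fisher's exact test takes a table of counts, whereas Fisher's method takes a vector of p-values from independent tests.

```r
# Fisher's exact test: the input is a contingency table of counts.
# The counts below are hypothetical, chosen only to show the expected input shape.
counts <- matrix(c(3, 1,
                   1, 3),
                 nrow = 2,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))
exact <- fisher.test(counts)
print(exact$p.value)

# Fisher's method: the input is a vector of p-values.
p <- c(1e-22, 1e-24, 1e-11)
chisq <- -2 * sum(log(p))                       # test statistic
combined <- pchisq(chisq, df = 2 * length(p),   # 2 degrees of freedom per p-value
                   lower.tail = FALSE)
print(combined)
```

The two calls answer different questions: the first tests association in a table of counts, the second pools evidence from tests that have already been run.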


Dear Giovanni M Dall'Olio:

I am very grateful for your correction; I had completely misunderstood the difference between Fisher's method and Fisher's exact test. Indeed, what I need is the combined p-value obtained with Fisher's method. Does your solution use Fisher's method to obtain the combined p-value element-wise? Thanks again for your great help here.

Jurat


You are welcome, Jurat. Yes, you can use the solution with sumlog from the metap library.


Dear Giovanni M Dall'Olio:

How can I add chisq as a new slot for data? I mean, I would like data to have both global and chisq attributes. Thanks a lot.
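One possible way to keep both values is sketched below in base R, using a plain data.frame rather than a DataFrame so that it runs without Bioconductor. The column names chisq and global follow the thread. The helper name `fisher_row` and the choice to leave out the zero entries are assumptions made for this example.

```r
# Example data from the thread, with non-overlapped regions encoded as 0.
data <- data.frame(
  v1 = c(1e-22, 1e-19, 1e-18, 1e-16, 1e-24, 1e-20, 1e-15),
  v2 = c(1e-24, 1e-24, 1e-20, 1e-25, 0.1,   1e-19, 1e-18),
  v3 = c(1e-11, 1e-11, 1e-10, 0,     0,     1e-15, 0)
)

# Hypothetical helper: Fisher's method on one row, returning statistic and p-value.
# Assumption: entries outside (0, 1] (here the zeros) are left out.
fisher_row <- function(p) {
  p <- p[p > 0 & p <= 1]
  chisq <- -2 * sum(log(p))
  c(chisq = chisq,
    global = pchisq(chisq, df = 2 * length(p), lower.tail = FALSE))
}

# apply() returns a 2 x 7 matrix here, so transpose it to get one row per region.
res <- t(apply(as.matrix(data[, c("v1", "v2", "v3")]), 1, fisher_row))
data$chisq  <- res[, "chisq"]
data$global <- res[, "global"]
print(data)
```

The same pattern should carry over to the metap approach by returning both the statistic and the p-value from the function passed to apply, instead of only the p-value.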
