Question

Significant difference in H3 ChIP data of two conditions (p-value for skewed distribution)

0

6.9 years ago

Chirag Parsania ★ 2.0k

Hi,

Edit:

I have histone H3 data for two different conditions. For each condition I obtained a gene set, and I plotted the average line plot for each gene set in its respective condition. Now I want to test whether the difference I see between the average lines of the two conditions is statistically significant.

To test this, I did the following:

1) I randomly generated two gene sets, one from each condition, and calculated the average for each. I then calculated the Euclidean distance between the two averages. Repeating this 1000 times gives a distribution of distances, and from that distribution I get a p-value for the distance between my original gene sets using a z-score. The problem is that the distribution obtained from the random gene sets is skewed, while a p-value taken from the z-table assumes that the data are normally distributed. The skewness and kurtosis of my distribution are 1.166323 and 4.91863, respectively. So I wonder whether the way I am calculating the p-value is OK, or whether I should use another distribution to get the p-value for a skewed distribution.

Currently I am using the z-score, and the resulting p-value is significant.

See the distribution here http://rpubs.com/parsaniac/277714

Thanks, Chirag.

statistics ChIP-Seq p-value h3 histone • 1.9k views

ADD COMMENT • link 6.9 years ago by Chirag Parsania ★ 2.0k

0

Hello Chirag Parsania!

We believe that this post does not fit the main topic of this site.

This has no connection to bioinformatics. Please make an effort to explain it if it exists.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY • link 6.9 years ago by Michael 54k

0

Hi Michael Dondrup,

Statistics always has a direct connection with bioinformatics. I asked this question for one of my NGS data analyses, in which I have H3 ChIP data for two different conditions. I want to show statistically that, for a set of genes, the difference in H3 between these two samples is significant. Anyway, thanks for your suggestion. I will ask at the link you have provided.

~Chirag.

ADD REPLY • link 6.9 years ago by Chirag Parsania ★ 2.0k

1

Statistics always has a direct connection with bioinformatics.

It is your duty to make the connection explicit. Not every statistics question is relevant for bioinformatics, in the same way that not every programming question is relevant for bioinformatics. An incomplete definition of the application domain is a big problem for applying statistics in general and for this question in particular. Therefore we need to know the exact setup to be able to judge whether the question can be answered here, or at all.

Is your distribution continuous or discrete?
Do you know the probability density function or probability mass function?

You can see a misconception here as well:

Can I use z-score to calculate p-value

Short answer: No, p-value != z-score.

It is not clear what you are asking here. A p-value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed. For which observations do you calculate which test statistic? Is the (skewed) distribution known, or did you infer it empirically from the data? If all you have is the "distribution" you want to assign the p-value to, then there is no such thing as the p-value of a single distribution.
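
If the null distribution was inferred empirically (for example from repeated random draws), one common way to get a p-value without assuming normality is to count how many null values are at least as extreme as the observed one. The sketch below is only an illustration on simulated data: the function name `empirical_p_value`, the simulated `null_distances` and the value of `observed_distance` are all assumptions, not taken from the analysis discussed here.

```python
import random


def empirical_p_value(null_values, observed):
    """One-sided empirical p-value: share of null values >= observed.

    Adding 1 to numerator and denominator counts the observed value as one
    draw from the null, so the result is never exactly zero.
    """
    n_extreme = sum(1 for value in null_values if value >= observed)
    return (n_extreme + 1) / (len(null_values) + 1)


# Simulated, right-skewed stand-in for 1000 random-gene-set distances.
random.seed(0)
null_distances = [random.lognormvariate(0.0, 0.5) for _ in range(1000)]

# Hypothetical distance between the two original gene sets.
observed_distance = 4.0

p_emp = empirical_p_value(null_distances, observed_distance)
print(f"empirical p-value: {p_emp:.4f}")
```

With 1000 null draws the smallest p-value this can report is 1/1001, so more iterations are needed if finer resolution in the tail matters.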

ADD REPLY • link 6.9 years ago by Michael 54k

0

Let me explain more clearly. As I said, I have histone H3 data for two different conditions. For each condition I obtained a gene set, and I plotted the average line plot for each gene set in its respective condition. Now I want to test whether the difference I see between the average lines of the two conditions is statistically significant.

To test this, I did the following:

1) I randomly generated two gene sets, one from each condition, and calculated the average for each. I then calculated the Euclidean distance between the two averages. Repeating this 1000 times gives a distribution of distances, and from that distribution I get a p-value for the distance between my original gene sets using a z-score. The problem is that the distribution obtained from the random gene sets is skewed, while a p-value taken from the z-table assumes that the data are normally distributed. The skewness and kurtosis of my distribution are 1.166323 and 4.91863, respectively. So I wonder whether the way I am calculating the p-value is OK, or whether I should use another distribution to get the p-value for a skewed distribution.

Hope this explains it well!

Thanks.

ADD REPLY • link 6.9 years ago by Chirag Parsania ★ 2.0k

0

You should edit your question to add this information.

ADD REPLY • link 6.9 years ago by Nicolas Rosewick 10k

0

You should post your question to Cross Validated: https://stats.stackexchange.com/

ADD REPLY • link 6.9 years ago by Nicolas Rosewick 10k

0

I have edited and re-opened the question. Please see my edits to demonstrate how to convey enough bioinformatics context, so that the question can be answered.

ADD REPLY • link 6.9 years ago by Michael 54k

0

Thanks a lot. I really appreciate it :)

Cheers, Chirag.

ADD REPLY • link 6.9 years ago by Chirag Parsania ★ 2.0k

score 0 · Answer 1 · 2017-05-17

0

6.9 years ago

Jean-Karim Heriche 27k

Your distribution looks like a log-normal so you can log-transform the data to get a normal distribution.
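
As a rough illustration of that suggestion, the sketch below simulates right-skewed distances, log-transforms them, compares the skewness before and after, and then computes a z-score based upper-tail p-value on the log scale. It assumes the values are strictly positive and not already logged; the simulated data, the helper `skewness` and the value of `observed` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    mean = fmean(values)
    sd = pstdev(values)
    return fmean([((value - mean) / sd) ** 3 for value in values])


# Simulated, right-skewed stand-in for the random-gene-set distances.
random.seed(1)
distances = [random.lognormvariate(0.0, 0.5) for _ in range(1000)]
log_distances = [math.log(d) for d in distances]

skew_raw = skewness(distances)
skew_log = skewness(log_distances)
print(f"skewness raw: {skew_raw:.3f}, after log: {skew_log:.3f}")

# z-score of a hypothetical observed distance, computed on the log scale,
# and the corresponding upper-tail p-value from the standard normal.
observed = 4.0
z = (math.log(observed) - fmean(log_distances)) / pstdev(log_distances)
p_upper = 1.0 - NormalDist().cdf(z)
print(f"z = {z:.2f}, upper-tail p-value = {p_upper:.4g}")
```

If the skewness is still far from zero after the transform, the normal approximation is doubtful and a distribution-free approach is probably the safer choice.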

ADD COMMENT • link 6.9 years ago by Jean-Karim Heriche 27k

0

The distribution is of log values. What is the difference between log-normal and log-transform?

ADD REPLY • link 6.9 years ago by Chirag Parsania ★ 2.0k

0

A random variable X follows a lognormal distribution if log(X) is normally distributed. If you want to do a null hypothesis test, you could also use non-parametric tests or a permutation test.
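
A minimal sketch of what such a permutation test could look like is given below. It assumes each gene is represented by a signal profile (a list of values over bins), uses the Euclidean distance between mean profiles as the test statistic, and shuffles the condition labels of the pooled genes. The data are simulated and all names (`mean_profile`, `profile_distance`, `permutation_p_value`, `toy_profiles`) are hypothetical.

```python
import math
import random


def mean_profile(profiles):
    """Bin-wise average over a list of equal-length profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def profile_distance(group_a, group_b):
    """Euclidean distance between the mean profiles of two groups."""
    return math.dist(mean_profile(group_a), mean_profile(group_b))


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Shuffle group labels and count distances >= the observed one."""
    rng = random.Random(seed)
    observed = profile_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if profile_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            n_extreme += 1
    # +1 counts the observed labelling itself as one permutation.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Toy data: 30 genes x 10 bins per condition, second condition shifted.
toy_rng = random.Random(42)


def toy_profiles(n_genes, n_bins, shift):
    return [[toy_rng.gauss(shift, 1.0) for _ in range(n_bins)]
            for _ in range(n_genes)]


cond_a = toy_profiles(30, 10, 0.0)
cond_b = toy_profiles(30, 10, 1.0)

observed, p_perm = permutation_p_value(cond_a, cond_b)
print(f"observed distance: {observed:.3f}, permutation p-value: {p_perm:.4f}")
```

No normality assumption is needed here, because the p-value comes directly from the permuted distances rather than from a z-table.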

ADD REPLY • link 6.9 years ago by Jean-Karim Heriche 27k