Biostar Beta. Not for public use.
statistics data, observed vs expected values
20 months ago

I have a data.frame with the observed and expected frequencies of a k-mer in multiple genomes, and I would like to obtain a threshold value to classify the genomes according to their observed and expected values. I have been trying the chi-square test and the G-test, but I'm not sure these tests are the right ones.

I have also tried plotting log(observed-expected)^2/expected as a function of log(observed/expected).

Could you recommend a statistical test for this task?

table

R • 429 views

Can you post an excerpt of your data.frame? Or all of it if it is not too big, anonymizing any confidential information in it if needed.


This is a sample of the data; I have approximately 3500 genomes.

https://i0000.clarodrive.com/s/oxfB4puIAmmrCrf


What do you mean by "to classify the genomes according to their observed and expected values"? The chi-squared test evaluates the difference between observed and expected frequencies under the model that generated the expected frequencies, so it is the correct test to use if that is what you intend to test.
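
If that is the goal, a minimal sketch in R of a per-genome version could look like the following. The data and the column names (`genome`, `observed`, `expected`) are made up for illustration, since the real table was not posted inline. It also assumes each observed count is roughly Poisson with mean equal to the expected count, so that (O - E)^2 / E is approximately chi-squared with 1 degree of freedom; that approximation is only reasonable when the expected counts are not small. The 0.05 cutoff is likewise just a placeholder.

```r
# Hypothetical example data; column names and values are assumptions.
kmers <- data.frame(
  genome   = c("g1", "g2", "g3", "g4"),
  observed = c(120, 95, 300, 40),
  expected = c(100, 100, 100, 100)
)

# Per-genome chi-squared component: (O - E)^2 / E
kmers$chisq <- (kmers$observed - kmers$expected)^2 / kmers$expected

# Upper-tail p-value, treating each component as chi-squared with 1 df
# (this is the Poisson approximation mentioned above, not an exact test)
kmers$p <- pchisq(kmers$chisq, df = 1, lower.tail = FALSE)

# Correct for testing many genomes at once (Benjamini-Hochberg)
kmers$p_adj <- p.adjust(kmers$p, method = "BH")

# The log ratio gives the direction of the deviation
kmers$log2_ratio <- log2(kmers$observed / kmers$expected)

# Classify: not significant, or significantly over/under-represented
kmers$class <- ifelse(
  kmers$p_adj >= 0.05, "as expected",
  ifelse(kmers$log2_ratio > 0, "over-represented", "under-represented")
)

print(kmers)
```

With about 3500 genomes the multiple-testing correction matters, which is why the threshold is applied to the adjusted p-values rather than to the raw statistic.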
