20 months ago

ulises.rodriguez

I have a data.frame with the observed and expected frequencies of a k-mer in multiple genomes, and I would like to obtain a threshold value to classify the genomes according to their observed and expected values. I have tried the chi-squared test and the G-test, but I'm not sure these are the right tests.
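A minimal sketch of both statistics for a single genome follows. It assumes the observed and expected values are raw k-mer counts (both tests need counts, not relative frequencies) and that the observed count is roughly Poisson with mean equal to the expected count. Under that assumption, Pearson's statistic (O - E)^2/E and the G statistic (here the Poisson deviance 2[O ln(O/E) - (O - E)]) are each approximately chi-squared with 1 degree of freedom when E is large. The function names are hypothetical.

```python
import math

def pearson_chi2(observed: float, expected: float) -> float:
    """Pearson chi-squared contribution of one count: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

def g_statistic(observed: float, expected: float) -> float:
    """G statistic for one Poisson count (the Poisson deviance).

    Computes 2 * [O * ln(O / E) - (O - E)], taking O * ln(O / E) as 0 when O == 0.
    """
    log_term = observed * math.log(observed / expected) if observed > 0 else 0.0
    return 2.0 * (log_term - (observed - expected))

def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical example: one genome with 130 observed vs 100 expected occurrences.
obs, exp_ = 130, 100
x2 = pearson_chi2(obs, exp_)
g = g_statistic(obs, exp_)
print(f"X2 = {x2:.2f}, p = {chi2_1df_pvalue(x2):.4f}")
print(f"G  = {g:.2f}, p = {chi2_1df_pvalue(g):.4f}")
```

With large expected counts the two statistics are close, so the choice between the chi-squared test and the G-test matters less than whether the Poisson model for the expected counts is reasonable.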

I have also tried plotting **log((observed - expected)^2 / expected)** as a function of **log(observed / expected)**.
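The two plotted quantities can be computed per genome as in the sketch below, which assumes parallel lists of observed and expected counts (the example values are hypothetical). Both axes depend on the same ratio r = O/E, because (O - E)^2/E = E(r - 1)^2, so y = log(E) + 2 log|r - 1|. Genomes with similar expected counts therefore fall on a single curve, and the plot largely re-expresses the log ratio rather than adding independent information.

```python
import math

def plot_coordinates(observed, expected):
    """Return (x, y) pairs with x = log(O/E) and y = log((O - E)^2 / E).

    Genomes with O == 0 (x undefined) or O == E (y undefined) are skipped.
    """
    points = []
    for o, e in zip(observed, expected):
        if o <= 0 or o == e:
            continue
        x = math.log(o / e)
        y = math.log((o - e) ** 2 / e)
        points.append((x, y))
    return points

# Hypothetical counts for four genomes; the third is skipped because O == E.
obs = [130, 80, 100, 250]
exp_ = [100, 100, 100, 200]
for x, y in plot_coordinates(obs, exp_):
    print(f"log(O/E) = {x:+.3f}   log((O-E)^2/E) = {y:.3f}")
```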

Could you recommend a statistical test for this task?

Can you post an excerpt of your data.frame? Or all of it, if it is not too big, anonymizing any confidential information in it.

This is a sample of the data; I have approximately 3,500 genomes in total:

https://i0000.clarodrive.com/s/oxfB4puIAmmrCrf

What do you mean by "to classify the genomes according to their observed and expected values"? The chi-squared test evaluates the difference between observed and expected frequencies under the model that generated the expected frequencies, so it is the correct test to use if that is what you intend to test.
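If the goal is to label each genome as over-represented, under-represented, or consistent with its expected value, one option (a swapped-in technique, not something prescribed in this thread) is to compute a per-genome 1-df chi-squared p-value and then apply the Benjamini-Hochberg false-discovery-rate procedure across all ~3,500 genomes, so the threshold is set on the adjusted significance level rather than on the raw statistic. This sketch assumes raw counts and an approximately Poisson observed count; the function name `classify_genomes`, the `fdr=0.05` default, and the example counts are hypothetical.

```python
import math

def classify_genomes(observed, expected, fdr=0.05):
    """Label each genome 'over', 'under' or 'expected'.

    Assumes raw counts, with each observed count approximately Poisson(expected).
    Uses the Pearson statistic (O - E)^2 / E with 1 degree of freedom and the
    Benjamini-Hochberg procedure to control the false discovery rate at `fdr`.
    """
    n = len(observed)
    # Upper-tail chi-squared (1 df) p-value: erfc(sqrt(X2 / 2)).
    pvalues = [
        math.erfc(math.sqrt((o - e) ** 2 / e / 2.0))
        for o, e in zip(observed, expected)
    ]
    # Benjamini-Hochberg: find the largest rank k with p_(k) <= k / n * fdr;
    # every test with rank <= k is declared significant.
    order = sorted(range(n), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / n * fdr:
            cutoff_rank = rank
    significant = set(order[:cutoff_rank])

    # Significant genomes get a direction; the rest are consistent with E.
    labels = []
    for i, (o, e) in enumerate(zip(observed, expected)):
        if i not in significant:
            labels.append("expected")
        elif o > e:
            labels.append("over")
        else:
            labels.append("under")
    return labels, pvalues

obs = [130, 70, 101, 250, 95]
exp_ = [100, 100, 100, 200, 100]
labels, pvals = classify_genomes(obs, exp_)
for o, e, lab, p in zip(obs, exp_, labels, pvals):
    print(f"O={o:4d} E={e:4d} p={p:.2e} -> {lab}")
```

The Poisson assumption ignores extra variation between genomes (overdispersion). With thousands of genomes, it may be worth checking whether the spread of (O - E)/sqrt(E) is much wider than a standard normal before relying on these p-values.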