I have several data sets (already transformed to log-ratios), and I want to normalize them the way an article describes: "A 2-component Gaussian mixture model-based normalization algorithm was used to achieve this normalization. The two Gaussians (μ1, σ1) and (μ2, σ2) for a sample i were fitted and used in the normalization process as follows: the mode mi of the log-ratio distribution was determined for each sample using kernel density estimation with a Gaussian kernel and Shafer-Jones bandwidth. A two-component Gaussian mixture model was then fit with the mean of both Gaussians constrained to be mi, i.e., μ1i = μ2i = mi. The Gaussian with the smaller estimated standard deviation σi = min(σ̂1i, σ̂2i) was used to normalize the sample. The sample was standardized using N(mi, σi) by subtracting the mean mi from each gene and dividing by the standard deviation σi. Constrained fitting of mixture models was implemented using the mixtools R package."
I'm not good at statistics, so could anyone show me how to write the R code that performs the normalization described in the article? Thanks in advance.
Doesn't the paper come with a code snippet that you can use? It clearly describes how the mixture modeling is incorporated, and the package is a good place to start. Take a look at the package, play with its examples, and look at the mixture plots and how the fitted model is derived. You can also see how mclust is used for the same purpose, fitting a mixture model for clustering, here. This blog post also has a clear explanation you can look at, and alternatively this link has a nice code snippet and an explanation of when and why to use mixtools
and apply it for clustering.

Thanks. There is no code snippet in the article. I just want to do the normalization with the 2-component Gaussian mixture model-based algorithm that the article describes, so how do I finish the following code?

    library(mixtools)
    a <- rnorm(1000)
    mi <- density(a)  # kernel density estimation with a Gaussian kernel and Shafer-Jones bandwidth ???
    out <- mvnormalmixEM(a, k = 2, lambda = NULL, mu = ?, sigma = ?)

The article says "μ1i = μ2i = mi", so should 'mu' be set to two equal values? But mi is a vector of length 1000, so how do I set 'mu' and 'sigma'? In the end, I hope to get two components (μ1i = μ2i = mi, sigma1 ≠ sigma2), so that I can choose the smaller sigma to normalize the sample.
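One way the steps quoted from the article could be sketched for a single sample is shown below. It is not the article's own code, and it rests on a few assumptions: the "Shafer-Jones" bandwidth is taken to be the Sheather-Jones selector that base R exposes as bw = "SJ"; the mode mi is taken as the grid point where the kernel density estimate is highest (density() returns a density object, not a single number); and the univariate normalmixEM() with its mean.constr argument is used in place of mvnormalmixEM(), which is meant for multivariate data. The simulated vector x and the starting values for sigma are purely illustrative.

```r
library(mixtools)

set.seed(1)
# Illustrative log-ratios for one sample: a narrow core plus a wider component
x <- c(rnorm(800, mean = 0.2, sd = 0.5), rnorm(200, mean = 0.2, sd = 2))

# 1. Mode of the log-ratio distribution from a Gaussian kernel density estimate.
#    bw = "SJ" is the Sheather-Jones selector (assumed to be what the article means).
d  <- density(x, kernel = "gaussian", bw = "SJ")
mi <- d$x[which.max(d$y)]          # a single number: location of the density peak

# 2. Two-component Gaussian mixture with both means fixed at mi.
#    Numeric entries in mean.constr fix the corresponding mean at that value.
#    The sigma argument only supplies arbitrary starting values for the EM.
fit <- normalmixEM(x, k = 2,
                   mean.constr = c(mi, mi),
                   sigma = sd(x) * c(0.5, 2))

# 3. Keep the component with the smaller estimated standard deviation
si <- min(fit$sigma)

# 4. Standardize the sample with N(mi, si)
x_norm <- (x - mi) / si
```

For a matrix with one sample per column, the same four steps could be wrapped in a function and run on each column, for example with apply(mat, 2, your_function), where mat and your_function are placeholder names.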