Elastic net (glmnet) predictive modeling: signal lost in noise
2.5 years ago
lordoftheowl • 10


I am doing some predictive modeling of gene expression from SNP genotypes. I have about 500 expression values (centered and scaled) and about 3000 SNPs (a matrix of 0, 1, 2). When I run my elastic net (cv.glmnet, alpha = 0.5, 10-fold CV), the model "fails" to identify any predictive SNPs, i.e., it assigns a weight of 0 to every SNP.

However, I also have a smaller subset of ~40 SNPs that I have prior reason to believe are good predictors of expression for this gene. When I run elastic net on just these predictors, I have no problem getting out a decent model that includes most of these SNPs.

So it seems to me that I have a true signal that I can't detect once enough other SNPs are added.

Ultimately, the goal would be to detect the best predictive eQTL SNPs in an unbiased way. Are there ways to optimize my input or algorithm to avoid these false negatives?


9 months ago
EMBL Heidelberg, Germany

Try setting alpha to lower values to reduce the contribution of the lasso penalty. The value of lambda that cv.glmnet obtains by cross-validation (alpha itself is held fixed at whatever you pass in) is a compromise between variable selection and prediction, so it may not be optimal when your main concern is variable selection. You may be interested in reading this blog post on when the lasso fails and this paper evaluating elastic net for GWAS.
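
As a rough sketch of the suggestion above, one could scan a few alpha values while reusing the same fold assignments, then compare the cross-validated error and the number of SNPs kept at each setting. Everything below is illustrative: the data are simulated stand-ins (500 samples, 3000 SNPs, with the first 40 SNPs given a small made-up effect), and the alpha grid is an arbitrary choice, not a recommendation.

```r
library(glmnet)

set.seed(42)
n <- 500
p <- 3000

# Simulated genotype matrix coded 0/1/2 (hypothetical stand-in for real data)
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)

# Assumption for illustration only: the first 40 SNPs have a weak effect
beta <- c(rep(0.15, 40), rep(0, p - 40))
y <- as.numeric(scale(X %*% beta + rnorm(n)))

# Reuse one fold assignment so CV errors are comparable across alpha values
foldid <- sample(rep(1:10, length.out = n))

alphas <- c(0.05, 0.1, 0.25, 0.5, 1)  # arbitrary grid; 1 is the pure lasso

results <- do.call(rbind, lapply(alphas, function(a) {
  fit <- cv.glmnet(X, y, alpha = a, foldid = foldid)
  # Coefficients at lambda.min, with the intercept dropped
  b <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]
  data.frame(alpha      = a,
             lambda_min = fit$lambda.min,
             cv_mse     = min(fit$cvm),
             n_nonzero  = sum(b != 0))
}))

print(results)
```

If `n_nonzero` is 0 for a given alpha, the intercept-only model had the lowest cross-validated error at that setting, which matches the behaviour described in the question.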

11 months ago
aquaq • 10

You could use the caret package to tune the glmnet parameters. Here is a nice example of the process:
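
A minimal sketch of what such tuning might look like (this is not the linked example, just an illustration): caret's `train` with `method = "glmnet"` searches a grid over both alpha and lambda using cross-validation. The data here are simulated and deliberately small so the example runs quickly; the grid values and the SNP names (`snp1`, `snp2`, ...) are assumptions made for the illustration.

```r
library(caret)
library(glmnet)

set.seed(42)
n <- 200
p <- 500

# Simulated 0/1/2 genotypes (hypothetical); caret works best with named columns
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
colnames(X) <- paste0("snp", seq_len(p))

# Assumption for illustration only: the first 20 SNPs carry a small effect
beta <- c(rep(0.2, 20), rep(0, p - 20))
y <- as.numeric(scale(X %*% beta + rnorm(n)))

# Grid over both tuning parameters of the glmnet model (values are arbitrary)
grid <- expand.grid(alpha  = c(0.1, 0.5, 1),
                    lambda = 10^seq(-3, 0, length.out = 20))

ctrl <- trainControl(method = "cv", number = 10)

fit <- train(x = X, y = y,
             method    = "glmnet",
             trControl = ctrl,
             tuneGrid  = grid)

# Best (alpha, lambda) pair according to cross-validated RMSE
print(fit$bestTune)

# SNPs with nonzero coefficients in the final model at the chosen lambda
b <- as.matrix(coef(fit$finalModel, s = fit$bestTune$lambda))[-1, 1]
selected <- names(b)[b != 0]
```

Note that caret picks the pair that minimizes prediction error by default, so the earlier caveat about prediction versus variable selection still applies to the tuned model.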

