Question

Elastic net (glmnet) predictive modeling: signal lost in noise

6.6 years ago

lordoftheowl ▴ 10

Hello,

I am doing some predictive modeling of gene expression from SNP genotypes. I have about 500 expression values (centered and scaled) and about 3000 SNPs (a matrix of 0, 1, 2). When I run my elastic net (cv.glmnet, alpha = 0.5, 10-fold CV), the model "fails" to identify any predictive SNPs, i.e., it assigns a weight of 0 to every SNP.

However, I also have a smaller subset of ~40 SNPs that I have prior reason to believe are good predictors of expression for this gene. When I run elastic net on just these predictors, I have no problem getting out a decent model that includes most of these SNPs.

So it seems to me that I have a true signal that I can't detect once enough other SNPs are added.

Ultimately, the goal would be to detect the best predictive eQTL SNPs in an unbiased way. Are there ways to optimize my input or algorithm to avoid these false negatives?

~misha

R prediction machine learning • 2.3k views

updated 6.6 years ago by aquaq ▴ 40 • written 6.6 years ago by lordoftheowl ▴ 10

score 1 · Answer 1 · 2017-08-31

Try setting alpha to lower values to reduce the contribution of the lasso penalty. The value of alpha obtained by cross-validation is a compromise between variable selection and prediction, so it may not be optimal when your main concern is variable selection. You may be interested in reading this blog on when the lasso fails and this paper evaluating elastic net for GWAS.
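As a rough illustration of the suggestion above, the sketch below runs cv.glmnet over a few alpha values and reports how many SNPs keep a nonzero weight at each one. The data are simulated stand-ins, not the poster's data; the sample size, SNP count, effect sizes, and alpha grid are all assumptions chosen so the example runs quickly.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in for the real data (assumption): 200 samples, 300 SNPs
# coded 0/1/2, with the first 10 SNPs carrying a weak true effect.
n <- 200
p <- 300
x <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
y <- as.numeric(scale(x[, 1:10] %*% rep(0.3, 10) + rnorm(n)))

# Reuse one fold assignment for every alpha so the CV errors are comparable.
foldid <- sample(rep(1:10, length.out = n))
alphas <- c(0, 0.05, 0.1, 0.25, 0.5)  # illustrative grid, not a recommendation

results <- do.call(rbind, lapply(alphas, function(a) {
  fit <- cv.glmnet(x, y, alpha = a, foldid = foldid)
  # Coefficients at the lambda with the lowest CV error, intercept dropped.
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]
  data.frame(alpha = a,
             lambda_min = fit$lambda.min,
             cv_mse = min(fit$cvm),
             n_nonzero = sum(beta != 0))
}))
print(results)
```

Fixing foldid matters here: without it, each call draws its own folds and differences in CV error between alpha values partly reflect fold noise rather than the penalty itself.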

score 0 · Answer 2 · 2017-08-31

6.6 years ago

aquaq ▴ 40

You could use the caret package to tune the glmnet parameters. Here is a nice example of the process: http://rstudio-pubs-static.s3.amazonaws.com/14372_1700240153ae4c2190feb1c5ced2d1e5.html
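A minimal sketch of what that tuning might look like, assuming simulated data in place of the real genotype matrix. The grid values, the SNP names (snp1, snp2, ...), and the data dimensions are illustrative assumptions, not taken from the linked example.

```r
library(caret)
library(glmnet)

set.seed(1)
# Simulated stand-in data (assumption): 200 samples, 300 SNPs coded 0/1/2.
n <- 200
p <- 300
x <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("snp", seq_len(p))))
y <- as.numeric(scale(x[, 1:10] %*% rep(0.3, 10) + rnorm(n)))

# 10-fold CV over a joint grid of alpha and lambda (illustrative values).
ctrl <- trainControl(method = "cv", number = 10)
grid <- expand.grid(alpha = c(0.05, 0.1, 0.25, 0.5, 1),
                    lambda = 10^seq(-3, 0, length.out = 20))

tuned <- train(x = x, y = y, method = "glmnet",
               trControl = ctrl, tuneGrid = grid)
print(tuned$bestTune)

# SNPs with nonzero weight in the final model at the selected lambda.
beta <- as.matrix(coef(tuned$finalModel, s = tuned$bestTune$lambda))[-1, 1]
selected <- names(beta)[beta != 0]
print(selected)
```

By default caret picks the alpha/lambda pair with the lowest cross-validated RMSE, so this optimizes prediction; the selected SNP set should still be checked against what is expected biologically.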
