Machine learning predictor has no correlation with output variable
5.4 years ago
druggable ▴ 60

Hi Everyone,

I built a logistic regression classifier for predicting genes; it has around 75% accuracy. Predictor A, a transcription factor, is one of the top predictors. However, when I look at the scatterplot, there is no visible correlation with the output, and the positive and negative examples show the same distribution. Is this still reasonable?

Thank you everyone.

machine learning • 2.3k views
5.4 years ago

It is not surprising that a predictor is uncorrelated with the output, and on its own this tells you very little. If every predictive feature had to be highly correlated with the output, we would not need any fancy machine learning algorithm; we could just look for highly correlated features. A feature that is highly correlated with the output is a good candidate predictor, but a good predictor does not have to be correlated with the output. A correlation between two variables only says how linearly related those two variables are; it provides no information on how they relate to the other variables. You can find features that are not individually correlated with the output variable but whose combination is.
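To make this concrete, here is a minimal sketch on synthetic data (not the data from the question). All names and numbers are made up for illustration: `signal` is a hidden quantity that drives the label, `x1` is a noisy measurement of it, and `x2` is the noise alone. By construction `x2` is uncorrelated with `y`, yet a plain logistic regression with no interaction terms gives it a large weight, because `x1 - x2` recovers the signal. The logistic fit is a hand-written gradient descent in NumPy to keep the example self-contained; any standard implementation should behave similarly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical setup: a hidden signal drives the label, noise does not.
signal = rng.normal(size=n)
noise = 2.0 * rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * signal))
y = (rng.random(n) < p_true).astype(float)

x1 = signal + noise  # noisy measurement of the signal
x2 = noise           # unrelated to y on its own


def fit_logistic(X, y, steps=3000, lr=0.1):
    """Fit logistic regression (with intercept) by batch gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def accuracy(X, y, w):
    """Fraction of examples whose predicted class matches the label."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.mean((Xb @ w > 0) == (y == 1))


corr_x2_y = np.corrcoef(x2, y)[0, 1]

X_one = x1.reshape(-1, 1)
X_both = np.column_stack([x1, x2])
w_one = fit_logistic(X_one, y)
w_both = fit_logistic(X_both, y)
acc_one = accuracy(X_one, y, w_one)
acc_both = accuracy(X_both, y, w_both)

print("correlation of x2 with y:", round(corr_x2_y, 3))
print("accuracy with x1 only:   ", round(acc_one, 3))
print("accuracy with x1 and x2: ", round(acc_both, 3))
print("weights (intercept, x1, x2):", np.round(w_both, 2))
```

In this setup a scatterplot of `x2` against the class would show the same distribution for positives and negatives, while the fitted model still relies on `x2` to cancel the noise in `x1`. Whether something like this is happening with Predictor A is an assumption you would need to check, for example by refitting the model without that feature and comparing accuracy.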


Hi Jean-Karim,

Thank you for your help. I would just like to clarify further: logistic regression takes into account the individual contribution of each feature, and I did not add any interaction terms. In that case, I'm not sure I can say that there is some interaction among the features.
