Normality Assumption for Linear Regression
5.2 years ago
nbotha1994

How strict is linear regression on the assumption of normality?

Even after transforming my data with the optimal normalization method (identified using the bestNormalize package in R), my data are still not normal or even close to normal. Can I still fit a linear regression to data that are not normally distributed?

Thanks

Many kinds of data will never give normally distributed residuals because the data are inherently non-normal (counts and binary outcomes are common examples). You should not model such data with an ordinary linear regression; instead, use a generalised linear model with a family appropriate for the data. To help you choose a suitable model, we would need to know what the data are.
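As an illustration only: if the outcome were a count, a Poisson GLM with a log link is one common choice (in R this is typically `glm(y ~ x, family = poisson)`). The sketch below is a minimal Python/NumPy version that fits that model by iteratively reweighted least squares, assuming count data; the simulated data, the helper name `fit_poisson_glm` and its parameters are hypothetical, and the right family for the poster's data is still unknown.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=100, tol=1e-10):
    """Hypothetical helper: Poisson GLM with log link, fitted by IRLS.

    X: (n, p) design matrix, including a column of ones for the intercept.
    y: (n,) non-negative counts.
    Returns the estimated coefficient vector.
    """
    mu = y + 0.1                  # starting fitted values; +0.1 avoids log(0)
    eta = np.log(mu)              # linear predictor on the log scale
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu   # working response for the log link
        w = mu                    # working weights: (dmu/deta)^2 / Var(y) = mu
        XtW = X.T * w             # X^T W without building diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Simulated (hypothetical) count data with true coefficients 0.5 and 1.2
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=5000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 1.2 * x))

beta_hat = fit_poisson_glm(X, y)
print(beta_hat)  # estimates near the simulated values 0.5 and 1.2
```

For the canonical log link, IRLS is the same as Newton's method, so at convergence the score equations X^T (y - mu) = 0 hold.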

5.2 years ago

The assumption of normality relates to the residuals, not to the data itself. Personally, I would not use packages like bestNormalize to tell me how the data look. Also remember that only fabricated data will follow a perfect normal distribution. You could start by plotting a histogram of your data and reporting summary metrics such as the min, max, variance, standard deviation, mean, median and interquartile range.
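A minimal sketch of why the residuals, not the raw data, are what matter, written in Python/NumPy rather than R and using simulated, hypothetical data: the outcome depends on a two-level predictor, so the raw values are bimodal and clearly non-normal, yet the residuals from the fitted linear model are close to normal. The helper names `summarize` and `excess_kurtosis` are illustrative; a normal distribution has excess kurtosis near 0.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: a binary predictor (two groups) shifts the outcome by 5 units
n = 1000
group = rng.integers(0, 2, size=n)  # 0 or 1
y = 2.0 + 5.0 * group + rng.normal(0.0, 1.0, size=n)


def summarize(v):
    """Descriptive statistics of the kind suggested in the answer."""
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    stats = {
        "min": v.min(), "max": v.max(), "mean": v.mean(), "median": median,
        "variance": v.var(ddof=1), "sd": v.std(ddof=1), "iqr": q3 - q1,
    }
    return {name: round(float(value), 3) for name, value in stats.items()}


def excess_kurtosis(v):
    """Sample excess kurtosis: about 0 for normal data, strongly negative for bimodal data."""
    d = v - v.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0


# Ordinary least squares fit of y on an intercept and the group indicator
X = np.column_stack([np.ones(n), group])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("raw y:    ", summarize(y), "excess kurtosis:", round(excess_kurtosis(y), 2))
print("residuals:", summarize(residuals), "excess kurtosis:", round(excess_kurtosis(residuals), 2))
```

A histogram of `y` here would show two humps, while a histogram of `residuals` would look roughly bell-shaped; checking the raw outcome alone would wrongly suggest that linear regression is inappropriate.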


Hi, thank you for your response.

I don't think I understand what residuals are. Are they the data after transformation, or the data after the linear regression has been performed?


The residual at a given point is the difference between the actual (observed) value and the value estimated by the model: e_i = y_i - ŷ_i. So residuals only exist after the regression has been fitted.
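A small worked example of that definition, sketched in Python/NumPy with made-up (hypothetical) values: fit a straight line by least squares, compute the fitted values ŷ_i, and subtract them from the observed y_i.

```python
import numpy as np

# Small illustrative dataset (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y = b0 + b1 * x by least squares; np.polyfit returns the highest degree first
b1, b0 = np.polyfit(x, y, deg=1)

y_hat = b0 + b1 * x        # values estimated by the model
residuals = y - y_hat      # observed minus fitted: e_i = y_i - y_hat_i

print(residuals)           # approximately [0.06, -0.13, 0.18, -0.21, 0.10]
```

Here the fitted line is ŷ = 0.05 + 1.99x. Because the model includes an intercept, the least-squares residuals sum to zero; it is the distribution of these values, not of `y`, that the normality assumption refers to.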
