Question: Injecting Outliers in the dataset using R

Hello, I am working on generating a dataset with n = 20 for a linear regression y = b0 + b1x + e (I am not sure whether I should include the error term in my code).

  • x and y are normally distributed with mean 0 and standard deviation 1.
  • the error term e is also said to be normally distributed with mean 0 and sd 1, BUT with 10% identical outliers in the y direction

My code starts with this:

n11 <- 20
m1  <- 0
sd1 <- 1
b0  <- 0
b1  <- 1
x   <- rnorm(n11, m1, sd1)

# e11 must be defined before it is used to build y
e11 <- rnorm(n11, m1, sd1)

y <- b0 + b1 * x + e11

data11 <- data.frame(y, x, e11, b0, b1)

model1 <- lm(y ~ x, data = data11)

I don't know how and where in the code I should put the 10% identical outliers in the y direction. I need help. Thank you so much.

12 months ago sabbmontes • 0 • updated 12 months ago Jean-Karim Heriche 19k

By definition, outliers are points not generated by the distribution under consideration, so just produce values from another distribution. If the outliers have to be in the error term, produce error values using any distribution other than N(0,1), e.g. N(0,100) or anything that fits your model of where outliers should come from. 10% of 20 = 2, so you only need two outliers. Depending on what should be identical (the error outliers or the y outliers), either produce one outlier error and use it to produce two y values, or use it to produce one y value and duplicate it.
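
As a rough sketch of the first option (identical outliers in the error term), something like the following might work. The seed, the choice of sd = 10 for the contaminating distribution (i.e. reading N(0,100) as variance 100), and the names n_out, out_idx, e_out, dat and fit are all assumptions made for illustration, not part of the original question.

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible

n  <- 20
b0 <- 0
b1 <- 1
x  <- rnorm(n, mean = 0, sd = 1)
e  <- rnorm(n, mean = 0, sd = 1)   # clean N(0, 1) errors

# 10% of n = 2 observations will be contaminated
n_out   <- round(0.10 * n)
out_idx <- sample(n, n_out)        # positions of the outliers, chosen at random

# Assumption: one error value drawn from a much wider normal (sd = 10),
# reused for every contaminated point so the outlier errors are identical
e_out      <- rnorm(1, mean = 0, sd = 10)
e[out_idx] <- e_out

# Build y only after the errors are final
y <- b0 + b1 * x + e

dat <- data.frame(y, x, e, outlier = seq_len(n) %in% out_idx)
fit <- lm(y ~ x, data = dat)
```

If the y values themselves should be identical rather than the errors, one could instead overwrite them after building y, e.g. y[out_idx] <- y[out_idx[1]]. Note that a single draw from the wider distribution can still land close to 0, so it may be worth checking that e_out is actually extreme before using it.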

12 months ago Jean-Karim Heriche 19k
