Question

ROC analysis, is this ROC curve usual?


12 months ago

seta ★ 1.9k

Dear all,

I used the pROC library in R for ROC analysis. My response variable is binary and my independent variable is categorical, coded as 0, 1, 2. As the output of the analysis I obtained the ROC curve figure, but it looks a bit strange, doesn't it? Why does the curve not start from 0? Is this usual, or is something wrong?

[Figure: ROC curve produced by pROC]

Thanks for sharing your comments!

AUC curve ROC • 988 views

updated 12 months ago by Kevin Blighe 87k • written 12 months ago by seta ★ 1.9k


Do you have a low sample number?

12 months ago by Kevin Blighe 87k


Hi Kevin,

No, the sample size is about 12000, of which 35% are cases. Based on tutorials I've read, I used 80% of the samples for the training dataset and the rest for the test dataset. The proportion of cases is similar in both datasets. My independent variable is a SNP that I converted to 0, 1, 2 based on the number of effect alleles. Could that be the issue, or is it something else?

12 months ago by seta ★ 1.9k


Oh, I see. We would sometimes see this kind of plot if the dataset were very small (~3 samples), but your dataset is large, so something else is going on. The fact that you have a binary outcome and an independent variable with just 3 levels is telling. Diagnosing the problem is difficult from here without seeing the input and output of every step. Can you share the code for the ROC curve and also for the model fitting (glm())?
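To illustrate why a 3-level predictor is telling, here is a minimal sketch on simulated data. The sample size, genotype frequencies, and effect size are made-up assumptions, not the poster's data. It fits a logistic model on a single predictor coded 0/1/2 and checks how many distinct predicted probabilities come out.

```r
set.seed(1)
n <- 2000

# Simulated genotype counts (0, 1, 2 effect alleles); frequencies are assumptions
snp <- sample(0:2, n, replace = TRUE, prob = c(0.5, 0.4, 0.1))

# Simulated binary outcome with a modest, made-up per-allele effect
group <- rbinom(n, 1, plogis(-0.8 + 0.3 * snp))
sim <- data.frame(snp = snp, group = group)

fit <- glm(group ~ snp, data = sim, family = binomial)
pred <- predict(fit, newdata = sim, type = "response")

# Every sample with the same genotype gets the same predicted probability,
# so there can be at most three distinct scores
distinct_scores <- sort(unique(round(pred, 10)))
print(distinct_scores)
print(table(score = round(pred, 3), genotype = sim$snp))
```

With so few distinct scores there are only a handful of thresholds to sweep, so the resulting ROC curve is made of a few straight segments rather than a smooth-looking staircase.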

12 months ago by Kevin Blighe 87k


Here is the code I used:

library(pROC)
library(caret)

df <- read.csv("data.csv")
head(df)
#    sample rs8176740 rs4962040 rs688976 rs529565 group
# 1 sample1         0         1        0        1     0
# 2 sample2         1         0        1        1     0
# 3 sample3         0         1        0        1     0
# 4 sample4         1         0        1        1     0
# 5 sample5         1         0        1        1     0
# 6 sample6         0         1        0        1     0

# Shuffle the rows
set.seed(132)
df <- df[sample(nrow(df)), ]

# 80% training / 20% test split
train_idx <- createDataPartition(df$group, p = 0.8, list = FALSE, times = 1)
train_data <- df[train_idx, ]
test_data <- df[-train_idx, ]
y_test <- test_data$group

# Proportion of controls (0) and cases (1) in each split
prop.table(table(train_data$group))
#         0         1
# 0.6424394 0.3575606
prop.table(table(test_data$group))
#         0         1
# 0.6418721 0.3581279

# Logistic regression on a single SNP, then predicted probabilities on the test set
model <- glm(group ~ rs4962040, data = train_data, family = binomial)
y_pred <- predict(model, newdata = test_data, type = "response")

roc_data <- roc(y_test, y_pred)
# Setting levels: control = 0, case = 1
# Setting direction: controls < cases

auc_score <- auc(roc_data)
plot(roc_data, main = paste("ROC Curve (AUC = ", round(auc_score, 2), ")", sep = ""))

Is there any problem?

Thanks

12 months ago by seta ★ 1.9k


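I don't immediately see anything wrong. The unusual curve is probably just due to the fact that everything is categorical: there are only 5 levels in total across the outcome (2) and the independent variable (3). The model can therefore return at most 3 distinct predicted probabilities, which leaves the ROC curve with very few points.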
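As a rough check of that explanation, the ROC points for a 3-level predictor can be worked out by hand in base R. The counts below are hypothetical, not the poster's data, and the sketch assumes that a higher genotype code means a higher predicted probability of being a case.

```r
# Hypothetical numbers of controls and cases per genotype (not the poster's data)
counts <- matrix(c(500, 200,   # genotype 0: controls, cases
                   350, 250,   # genotype 1
                   150, 150),  # genotype 2
                 ncol = 2, byrow = TRUE,
                 dimnames = list(genotype = c("0", "1", "2"),
                                 group = c("control", "case")))
genotype <- as.numeric(rownames(counts))

# Call a sample a case when genotype >= threshold.
# Threshold 3 calls nobody a case; threshold 0 calls everybody a case.
thresholds <- 3:0
tpr <- sapply(thresholds, function(t) sum(counts[genotype >= t, "case"]) / sum(counts[, "case"]))
fpr <- sapply(thresholds, function(t) sum(counts[genotype >= t, "control"]) / sum(counts[, "control"]))

roc_points <- data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
print(roc_points)

# Area under the piecewise-linear curve (trapezoidal rule)
auc_manual <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc_manual)
```

The curve has only four points: the two corners plus one point per usable cut-off between genotype levels. Plotting them with plot(fpr, tpr, type = "b") gives three straight segments, which is the angular shape expected from this kind of predictor.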

12 months ago by Kevin Blighe 87k