glmStepAIC model is doing better than other models

by kintany, last updated September 11, 2019 16:19 PM

I am training a model on an imbalanced dataset (about 5-20% positive class) and trying out different algorithms in R using the caret package. I have 57 predictors and around 2000-3000 observations in my training dataset.

So far, I have tried several models and plotted ROC and precision-recall (PR) curves, with their AUCs, for each of them:

[Figure: ROC and PR curves for the fitted models]
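When reading ROC and PR plots side by side on imbalanced data, it helps to remember that the two have different no-skill baselines: an uninformative classifier has a ROC AUC of about 0.5, while its average precision is about equal to the positive-class prevalence. The following is a minimal base R sketch of that point, assuming simulated labels at 10% prevalence and purely random scores. The helper names `roc_auc` and `avg_precision` are hypothetical and not part of caret.

```r
set.seed(1)

# Simulated stand-in for an imbalanced problem (assumption: 10% positives)
n <- 3000
prevalence <- 0.10
y <- rbinom(n, 1, prevalence)
score <- runif(n)  # uninformative scores, unrelated to y

# ROC AUC via the rank (Mann-Whitney) formulation
roc_auc <- function(y, s) {
  r <- rank(s)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Average precision: mean of the precision values at each positive,
# with observations sorted by decreasing score
avg_precision <- function(y, s) {
  y_sorted <- y[order(s, decreasing = TRUE)]
  precision <- cumsum(y_sorted) / seq_along(y_sorted)
  sum(precision * y_sorted) / sum(y_sorted)
}

cat("Observed prevalence:", mean(y), "\n")
cat("ROC AUC (random scores):", roc_auc(y, score), "\n")
cat("Average precision (random scores):", avg_precision(y, score), "\n")
```

A PR AUC should therefore be judged against the prevalence in the evaluation data rather than against 0.5.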

I see a lot of criticism of stepwise logistic regression in R, and I do understand that it has many problems. At the same time, it is doing rather well here, and I am not sure how to interpret that. Could it be that I am doing something wrong when training the other models?

I am using repeated 5-fold cross-validation:

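library(caret)

# Repeated 5-fold CV (5 repeats), scored by ROC AUC on class probabilities
objControl <- trainControl(method = "repeatedcv",
                           number = 5,
                           repeats = 5,
                           summaryFunction = twoClassSummary,
                           classProbs = TRUE)

# method = "gbm" is assumed from the object name; the call as posted omitted it,
# in which case train() falls back to its default method ("rf")
gbm_fit <- train(training[, predictors, drop = FALSE], training[[bm_name]],
                 method = "gbm",
                 verbose = TRUE,
                 trControl = objControl,
                 metric = "ROC",
                 preProc = c("center", "scale"),
                 train.fraction = 0.5)  # forwarded to gbm: fits on the first half of the rows it receives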
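One way to rule out an unfair comparison is to fit every model on exactly the same resampling folds and compare the per-fold ROC values directly. Here is a minimal sketch of that idea, not the setup used above: the data are simulated, and the names `x`, `y`, `folds`, `ctrl`, `step_fit`, `gbm_sim_fit`, and `rs` are hypothetical. It assumes the caret, MASS, and gbm packages are installed.

```r
library(caret)

set.seed(42)

# Simulated stand-in for the real data (assumption): 6 predictors,
# two of which carry signal, with a minority positive class
n <- 600
x <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))  # columns V1..V6
p <- plogis(-2.2 + 1.0 * x$V1 - 0.8 * x$V2)
y <- factor(ifelse(runif(n) < p, "pos", "neg"), levels = c("pos", "neg"))

# Fix the resampling indices once so that every model sees identical folds
folds <- createMultiFolds(y, k = 5, times = 5)
ctrl <- trainControl(method = "repeatedcv",
                     number = 5,
                     repeats = 5,
                     index = folds,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)

# Stepwise logistic regression; trace = FALSE is passed on to MASS::stepAIC
step_fit <- train(x, y,
                  method = "glmStepAIC",
                  trace = FALSE,
                  trControl = ctrl,
                  metric = "ROC",
                  preProc = c("center", "scale"))

# Gradient boosting with caret's default tuning grid
gbm_sim_fit <- train(x, y,
                     method = "gbm",
                     verbose = FALSE,
                     trControl = ctrl,
                     metric = "ROC",
                     preProc = c("center", "scale"))

# Collect the matched resampling results and look at paired differences
rs <- resamples(list(glmStepAIC = step_fit, gbm = gbm_sim_fit))
summary(rs)
summary(diff(rs, metric = "ROC"))
```

Because the folds are shared, the differences reported by `diff()` are paired, which makes it easier to tell a genuine gap between models from resampling noise.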

Any guidance is highly appreciated.

Thank you!
