I am training a model on an imbalanced dataset (the positive class makes up about 5-20% of observations) and trying out different algorithms in R using the caret package. I have 57 predictors and around 2000-3000 observations in my training dataset.

So far, I have tried several models and plotted ROC and precision-recall (PR) curves for them.

I see a lot of criticism of stepwise logistic regression in R, and I understand that it does have many problems. At the same time, it is doing rather well here, and I am not sure how to interpret that. Could it be that I am doing something wrong when training the other models?

I am using repeated 5-fold cross-validation:

```
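library(caret)

# Repeated 5-fold CV; twoClassSummary reports ROC AUC, sensitivity and specificity
objControl <- trainControl(method = 'repeatedcv',
                           number = 5,
                           repeats = 5,
                           summaryFunction = twoClassSummary,
                           classProbs = TRUE)

# `training`, `predictors` and `bm_name` (the outcome column name) are defined elsewhere
gbm_fit <- train(training[, predictors, drop = FALSE], training[[bm_name]],
                 method = 'gbm',
                 verbose = TRUE,          # passed on to the gbm fit
                 trControl = objControl,
                 metric = "ROC",
                 preProc = c("center", "scale"),
                 train.fraction = 0.5)    # passed on to the gbm fit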
```

Any guidance is highly appreciated.

Thank you!
