
I am building a binary classification model with a gradient boosting (GB) classifier on imbalanced data with an event rate of 0.11% and a sample size of 350,000 records (split into 70% training and 30% testing).
I have tuned the hyperparameters with GridSearchCV and settled on a final model for evaluation.
Results are:
Train data:

Confusion matrix:
[[244741      2]
 [   234     23]]

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    244743
           1       0.92      0.09      0.16       257

    accuracy                           1.00    245000
   macro avg       0.96      0.54      0.58    245000
weighted avg       1.00      1.00      1.00    245000

Test data:

Confusion matrix:
[[104873      4]
 [   121      2]]

              precision    recall  f1-score   support

           0       1.00      1.00      1.00    104877
           1       0.33      0.02      0.03       123

    accuracy                           1.00    105000
   macro avg       0.67      0.51      0.52    105000
weighted avg       1.00      1.00      1.00    105000

The AUC for both class 1 and class 0 is 0.96.
I am not sure whether this is a good model to use for predicting the probability of occurrence. Please guide.

RajendraW
3 Answers


"Unbalanced" data are not a problem, unless you use unsuitable error measures... like accuracy, or precision, recall and the F1 (or any other Fbeta) score, all of which suffer from exactly the same problems as accuracy. Instead, work directly with probabilistic predictions, and assess the probabilistic predictions directly using proper scoring rules.

Do not use thresholds in evaluating your statistical model. The choice of one or more (!) thresholds is part of the decision you make based on the probabilistic classification; it is not part of the statistical model.

We have many, many, many threads on unbalanced data at CrossValidated, and we are at a bit of a loss what to do with these, because the data science community apparently sees a problem here that completely disappears once you move away from intuitive but misleading evaluation measures. We have a Meta.CV thread dedicated to this, with a number of links to other CV threads.

Stephan Kolassa

Your minority class is severely under-represented. I recommend not proceeding with this model as it stands.

My suggestion would be the following:

1. Undersample the majority class.
2. Use SMOTE to oversample the minority class.
3. Re-run the algorithm and verify the metrics.

As of now, your recall is extremely low: the model has identified less than 10% of the positive class on the training data and less than 5% on the test data. This is expected, considering the simplicity of your model.

I suggest you oversample the minority class and drop some records from the majority class; it will give you a clearer picture.
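A rough sketch of steps 1–3 using the imbalanced-learn package; names such as `X_train`, `y_train`, `X_test`, `y_test` and the sampling ratios are placeholders to adapt to your data:

```python
# Rough sketch: SMOTE oversampling plus random undersampling via
# imbalanced-learn. Sampling ratios below are illustrative only.
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report

resampled_gb = Pipeline(steps=[
    # Oversample the minority class up to 10% of the majority class
    ("smote", SMOTE(sampling_strategy=0.1, random_state=42)),
    # Then drop majority-class records until the ratio is 2:1
    ("under", RandomUnderSampler(sampling_strategy=0.5, random_state=42)),
    ("gb", GradientBoostingClassifier()),
])

# Resampling is applied only during fitting, never to the test data
resampled_gb.fit(X_train, y_train)
print(classification_report(y_test, resampled_gb.predict(X_test)))
```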

AUC only gives you the area under the ROC curve, which summarizes the trade-off between recall and the false-positive rate across thresholds. Since your data are highly imbalanced, AUC is not a good metric to rely on here.
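One way to see this concretely is to compare ROC AUC with average precision (the area under the precision-recall curve), which is far more sensitive to performance on the rare class; a rough sketch, assuming `model`, `X_test`, `y_test` are your fitted classifier and test split:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Predicted probabilities for the positive class
proba = model.predict_proba(X_test)[:, 1]

# ROC AUC can look flattering because the huge majority class dominates the
# false-positive rate; average precision focuses on the rare positive class.
print("ROC AUC:          ", roc_auc_score(y_test, proba))
print("Average precision:", average_precision_score(y_test, proba))
```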

mewbie

Your model might be alright, but certainly not with the default classification threshold. As you can see, you only detected 2 out of 123 events this way. ROC AUC tends to be over-optimistic at this level of imbalance. Studying sklearn.metrics.precision_recall_curve() could shed some light on this and perhaps help you select a decent classification threshold.
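A rough sketch of that, assuming `model`, `X_test`, `y_test` are your fitted classifier and test split; maximizing F1 is just one possible way to pick a threshold:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Predicted probabilities for the positive class
proba = model.predict_proba(X_test)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_test, proba)

# precision and recall have one more entry than thresholds, hence [:-1].
# Picking the threshold that maximizes F1 is just one possible criterion.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = np.argmax(f1)
print(f"threshold={thresholds[best]:.4f}  "
      f"precision={precision[best]:.2f}  recall={recall[best]:.2f}")
```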

Alternatively, you may try resampling your dataset first (assuming you're mostly interested in the positive class).

dx2-66