
1) Is it feasible to treat the raw probabilities obtained from XGBoost, e.g. probabilities in the range 0.4-0.5, as a true representation of an approximately 40%-50% chance of an event occurring? (assuming we have an accurate model)

2) The same question as in 1), but for other models that can output raw probabilities.


1 Answer


It depends on the definition of an accurate model, but in general the answer to your question 1) is no.

Regarding your second question (based on results in the paper by Niculescu-Mizil & Caruana linked below):

  • boosted trees and stumps - NO (boosting pushes predictions away from 0 and 1, producing a sigmoid-shaped distortion)
  • Naive Bayes - NO (it tends to push predictions towards 0 and 1 instead)
  • SVM - NO (like boosting, a max-margin method that pushes predictions away from 0 and 1)
  • bagged trees - YES
  • neural nets - YES

You can test whether this is the case for your particular model and dataset by looking at the so-called reliability plot:

  • Create N bins based on the model output (e.g. N = 10-20)
  • Create a scatter plot with the average model output for each bin along the X axis and the observed event frequency for each bin along the Y axis

Ideally, your X-Y points should lie near the diagonal Y = X; otherwise, the output of your classifier cannot be interpreted as the probability of an event.
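Here is a minimal sketch of such a plot, using scikit-learn's calibration_curve to do the binning; the labels and predictions are synthetic stand-ins for your own held-out data, deliberately miscalibrated so the curve departs from the diagonal:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Synthetic stand-ins for your data: raw model outputs and 0/1 labels.
# P(y=1 | p) = p**2 makes this "model" miscalibrated on purpose.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.0, 1.0, size=5000)
y_true = (rng.uniform(0.0, 1.0, size=5000) < y_prob**2).astype(int)

# Average model output (prob_pred) and observed event frequency
# (prob_true) in each of the N bins.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)

plt.plot(prob_pred, prob_true, "o-", label="model")
plt.plot([0, 1], [0, 1], "k--", label="perfectly calibrated (Y = X)")
plt.xlabel("average model output per bin")
plt.ylabel("observed event frequency per bin")
plt.legend()
plt.show()
```

With this synthetic setup the points bow below the diagonal, which is exactly the kind of deviation the plot is meant to expose.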

However, not all is lost: if needed, one can try to modify (calibrate) the output of the model in such a way that it better reflects the observed probabilities. To assess whether the calibration exercise was successful, one can look at a reliability plot based on the calibrated model output (instead of the raw model output).

The two most widely used techniques for calibrating classifier outputs are Platt scaling and isotonic regression; see the links below.

Note that it is not advisable to calibrate the classifier using the training dataset (you might need to reserve a separate subset of your data for the purpose of calibration).
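Both techniques are available in scikit-learn through CalibratedClassifierCV (method="sigmoid" is Platt scaling, method="isotonic" is isotonic regression). A minimal sketch, using sklearn's GradientBoostingClassifier as a stand-in for XGBoost and a synthetic dataset for illustration:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# cv=5 trains the base model on 4/5 of the training data and fits the
# calibrator on the held-out fold, so the calibration step never reuses
# the samples the base model was trained on (see the note above).
base = GradientBoostingClassifier(random_state=0)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

# Calibrated probabilities for the positive class; feed these into a
# reliability plot on the test set to check whether calibration helped.
calibrated_probs = calibrated.predict_proba(X_test)[:, 1]
```

Swapping in method="isotonic" gives isotonic regression instead; it is more flexible than Platt scaling but typically needs more calibration data to avoid overfitting.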

Some relevant links:

Predicting Good Probabilities With Supervised Learning (Niculescu-Mizil & Caruana, ICML 2005)

Calibrating Classifier Probabilities

Classifier calibration with Platt's scaling and isotonic regression
