Questions tagged [probability-calibration]

53 questions
18 votes, 4 answers

XGBoost outputs tend towards the extremes

I am currently using XGBoost for risk prediction. It seems to be doing a good job in the binary classification department, but the probability outputs are way off; i.e., changing the value of a feature in an observation by a very small amount can…
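A common remedy for over-confident boosted-tree scores is post-hoc calibration. Below is a minimal sketch using scikit-learn's CalibratedClassifierCV with sigmoid (Platt) scaling; the dataset is a synthetic stand-in, and all sizes and parameters here are assumptions, not the asker's setup:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic, imbalanced stand-in for the asker's (unseen) risk data.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Cross-validated sigmoid (Platt) calibration: a logistic curve is fitted
# to the raw scores on held-out folds, pulling extreme outputs inward.
model = CalibratedClassifierCV(XGBClassifier(), method="sigmoid", cv=5)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]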
12 votes, 1 answer

Are the raw probabilities obtained from XGBoost representative of the true underlying probabilities?

1) Is it feasible to use the raw probabilities obtained from XGBoost, e.g. probabilities in the range of 0.4-0.5, as a true representation of an approximately 40%-50% chance of an event occurring (assuming we have an accurate model)? 2)…
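This can be checked empirically with a reliability check: bin the predictions and compare each bin's mean prediction with the observed event rate. A small illustration on simulated, perfectly calibrated scores (the data here are made up):

import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p = rng.uniform(0, 1, size=10_000)   # hypothetical predicted probabilities
y = rng.binomial(1, p)               # outcomes drawn from those probabilities

frac_pos, mean_pred = calibration_curve(y, p, n_bins=10)
for m, f in zip(mean_pred, frac_pos):
    # For a calibrated model, predictions near 0.45 come true ~45% of the time.
    print(f"predicted {m:.2f} -> observed {f:.2f}")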
7 votes, 1 answer

Are calibrated probabilities always more reliable?

EDIT: Based on the answer below, I have updated the question and added more detail. I have applied Dirichlet calibration to my fast-bert sentiment classification model, and I am struggling to understand why/if it is actually more reliable.…
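Reliability can be quantified rather than eyeballed. Expected calibration error (ECE) is one standard summary; comparing it before and after calibration on held-out data shows whether the calibrated model really is more reliable. A hand-rolled sketch (equal-width binning is one common choice, not the only one):

import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Bin-weighted average gap between mean confidence and observed rate."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece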
7 votes, 2 answers

Probability calibration: the role of a hidden layer in a neural network

I am trying a simple neural network (logistic regression) to experiment with Keras. As input I have 5,000 features (the output of a simple tf-idf vectorizer), and in the output layer I just use a random uniform initialization and $\alpha = 0.0001$ for $L_{2}$…
asked by BimBimBap
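For reference, a single sigmoid unit on 5,000 tf-idf features is exactly logistic regression; inserting a hidden layer is what turns it into a network whose output confidence (and calibration) can behave differently. A hedged Keras sketch matching the excerpt's setup (initializer and L2 strength taken from the question, everything else assumed):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5000,)),       # tf-idf features
    # Uncomment to add a hidden layer and leave pure logistic regression:
    # tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_initializer="random_uniform",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")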
5 votes, 1 answer

XGBoost: how to adjust the probabilities of a binary classifier to match training data?

Training and testing data have around 1% positives, but the model predicts only around 0.1% as positives. The model is an XGBoost classifier. I’ve tried calibration, but it didn’t improve much. I also don’t want to pick thresholds, since the final…
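If the goal is only that the average predicted probability match the ~1% base rate, one hedged option is a constant shift in log-odds, found numerically. shift_to_base_rate below is a hypothetical helper written for this sketch, not a library function:

import numpy as np

def shift_to_base_rate(p, target_rate, tol=1e-9):
    """Bisect a constant logit offset b so that mean(sigmoid(logit(p) + b))
    equals target_rate. Assumes p lies strictly inside (0, 1)."""
    logit = np.log(p / (1.0 - p))
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        b = 0.5 * (lo + hi)
        if (1.0 / (1.0 + np.exp(-(logit + b)))).mean() < target_rate:
            lo = b      # mean too low -> shift probabilities upward
        else:
            hi = b
    return 1.0 / (1.0 + np.exp(-(logit + 0.5 * (lo + hi))))

# Usage (proba is a hypothetical array of predicted probabilities):
# adjusted = shift_to_base_rate(proba, target_rate=0.01)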
5 votes, 2 answers

Converting predict_proba results when class_weight is used in training

As my dataset is imbalanced (class 1: 5%, class 0: 95%), I have used the class_weight="balanced" parameter to train a random forest classification model. In this way I penalize the misclassification of rare positive cases. rf =…
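class_weight="balanced" effectively trains the model under a 50/50 prior, so the scores can be mapped back to the true 5% prior with Bayes' rule (the standard prior-shift correction). A small sketch:

import numpy as np

def undo_balanced_weighting(p_balanced, true_pos_rate=0.05):
    """Map probabilities from a class_weight='balanced' model (effective
    50/50 training prior) back to the original prior via Bayes' rule."""
    num = p_balanced * true_pos_rate
    return num / (num + (1.0 - p_balanced) * (1.0 - true_pos_rate))

# A "balanced" score of 0.5 corresponds to ~0.05 under the true 5% prior:
print(undo_balanced_weighting(np.array([0.5])))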
4 votes, 1 answer

Why does my calibration curve for Platt’s and isotonic have fewer points than my uncalibrated model?

I train a model using grid search, then use the best parameters from it to define my chosen model. model = XGBClassifier() pipeline = make_pipeline(model) kfolds = StratifiedKFold(3) clf = GridSearchCV(pipeline, parameters,…
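The likely cause: calibration_curve drops empty bins, and Platt/isotonic calibrators compress the raw scores onto a narrower range (isotonic is piecewise constant), so fewer uniform bins may be populated. Whether that shows up depends on how strongly the calibrator compresses the scores; a sketch on simulated data:

import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw = rng.uniform(0, 1, 2000)
y = rng.binomial(1, raw)
cal = IsotonicRegression(out_of_bounds="clip").fit(raw, y).predict(raw)

for name, scores in [("raw", raw), ("isotonic", cal)]:
    frac, mean = calibration_curve(y, scores, n_bins=10, strategy="uniform")
    print(name, len(mean), "plotted points")  # empty bins are dropped

Passing strategy="quantile" instead of "uniform" bins on quantiles of the scores, which keeps the point count comparable across models.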
3 votes, 1 answer

How to determine the correct target for classification probability when the observed samples are probabilities of each class?

I have data in which each event's outcome can be described by a probability of a categorical occurrence. For example, if all of the possible class outcomes are A, B, C, or D, suppose that in one event 7/10 people selected category A, 2/10 selected…
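When each observation is itself a class distribution (7/10 chose A, and so on), one standard approach is to train against those soft labels directly; categorical cross-entropy accepts probability vectors as targets. A Keras sketch with made-up features:

import numpy as np
import tensorflow as tf

# Soft targets: each row is an observed class distribution over A, B, C, D.
y_soft = np.array([[0.7, 0.2, 0.1, 0.0],
                   [0.1, 0.6, 0.2, 0.1]], dtype="float32")
X = np.random.rand(2, 8).astype("float32")     # placeholder features

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, y_soft, epochs=1, verbose=0)      # trains toward the proportions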
3 votes, 1 answer

I have 3 graphs of a binary logistic regression; I want to understand better what is happening and learn a strategy to make the model better

My problem is the following: I have a binary logistic regression model with a very imbalanced dataset that outputs the predicted probability. As can be seen in the images, as the threshold is increased there's a certain point at which it stops…
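A useful companion to those graphs is a threshold sweep: on imbalanced data few predictions land near 1.0, so metric-versus-threshold curves flatten once the threshold passes the highest predicted score. A hedged reconstruction on synthetic imbalanced data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, weights=[0.98, 0.02],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_te, proba)
# Past the largest predicted probability nothing is classified positive,
# which is where metrics-versus-threshold plots "stop" moving.
print("max predicted probability:", proba.max())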
3 votes, 1 answer

Probability calibration is worsening my model performance

I'm using RandomForest and XGBoost for binary classification, and my task is to predict probabilities for each class. Since tree-based models are bad at outputting usable probabilities, I imported CalibratedClassifierCV from sklearn.calibration,…
asked by Blenz
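Two details often explain a degradation like this: isotonic calibration can overfit small calibration folds (the two-parameter sigmoid is safer on less data), and calibrated models should be judged with a proper scoring rule rather than accuracy. A sketch of the sigmoid variant on assumed, synthetic data:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Sigmoid fits only two parameters, so it is harder to overfit than isotonic;
# cross-validated fitting keeps calibration data disjoint from training data.
cal = CalibratedClassifierCV(RandomForestClassifier(n_estimators=300),
                             method="sigmoid", cv=5).fit(X_tr, y_tr)
print(brier_score_loss(y_te, cal.predict_proba(X_te)[:, 1]))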
3 votes, 0 answers

How to explain a Calibration Plot for many models?

I have a heavily imbalanced dataset with a classification problem. I am trying to plot the calibration curve from the sklearn.calibration package. Specifically, I try the following models: rft = RandomForestClassifier(n_estimators=1000) svc = SVC(probability…
asked by Tasos
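One way to read such a plot across models is to draw all reliability curves against the diagonal on one axis: curves below the diagonal are over-confident, above it under-confident. A sketch with synthetic stand-ins for the question's rft and svc:

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svc": SVC(probability=True, random_state=0),
}
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], "k--", label="perfectly calibrated")
for name, clf in models.items():
    proba = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    frac, mean = calibration_curve(y_te, proba, n_bins=10)
    ax.plot(mean, frac, marker="o", label=name)  # below diagonal = over-confident
ax.set(xlabel="mean predicted probability", ylabel="fraction of positives")
ax.legend()
plt.show()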
3 votes, 1 answer

Which loss function (if any) optimizes the calibration graph?

The calibration graph plots predicted versus actual probability (see http://scikit-learn.org/stable/modules/generated/sklearn.calibration.calibration_curve.html). Is it possible to optimize the linearity of that curve in terms of a loss function?…
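There is no loss that targets the curve's linearity directly, but proper scoring rules (log loss, Brier score) are minimized in expectation by the true probabilities, so optimizing them pushes the reliability curve toward the diagonal. Both are available as metrics; here on simulated data:

import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 1000)     # hypothetical predicted probabilities
y = rng.binomial(1, p)          # matching simulated outcomes

# Proper scoring rules: lower is better, minimized by the true probabilities.
print("Brier:", brier_score_loss(y, p))
print("log loss:", log_loss(y, p))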
2 votes, 1 answer

Calibrating probability thresholds for multiclass classification

I have built a network for the classification of three classes. The network consists of a CNN followed by two fully-connected layers. The CNN consists of convolutional layers, followed by batch normalization, a ReLU activation, max pooling, and drop…
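For a softmax network, the lightest-weight multiclass calibrator is temperature scaling: a single scalar T fitted on held-out logits by minimizing negative log-likelihood. A self-contained numpy/scipy sketch with fake logits standing in for the network's outputs:

import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels):
    """Fit the scalar T of temperature scaling on a held-out set."""
    def nll(t):
        p = softmax(logits / t)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

rng = np.random.default_rng(0)
logits = 3.0 * rng.normal(size=(500, 3))   # over-confident fake logits
labels = rng.integers(0, 3, size=500)
T = fit_temperature(logits, labels)
calibrated = softmax(logits / T)           # T > 1 softens the probabilities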
2 votes, 0 answers

XGBoost calibration KDE plots (isotonic) not smooth

I am training my XGBoost model on an imbalanced binary classification problem. It is important to me to have well-calibrated probabilities, so I have chosen to optimize the Brier score. I then plot the KDE and reliability curve of my models, where I…
asked by Maths12
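Isotonic calibration outputs a step function, so the calibrated predictions pile up on a handful of distinct values, and a default-bandwidth KDE renders those piles as spikes. Widening the bandwidth smooths the display without touching the model; a toy illustration:

import numpy as np
from scipy.stats import gaussian_kde

# Step-like outputs, as isotonic calibration typically produces.
preds = np.array([0.10] * 50 + [0.35] * 30 + [0.80] * 20)
xs = np.linspace(0, 1, 200)

spiky = gaussian_kde(preds)(xs)                   # default bandwidth
smooth = gaussian_kde(preds, bw_method=0.3)(xs)   # wider bandwidth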
2 votes, 1 answer

How can I tell if my model is overfitting from the distribution of predicted probabilities?

All, I am training light gradient boosting and have used all of the necessary parameters to help with overfitting. I plot the distribution of predicted probabilities (i.e., probability of having cancer) from the model (after calibrating using calibrated…
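One simple check is to compare the predicted-probability distribution on the training set with the held-out set; a model that is markedly more confident in training (mass piled near 0 and 1) than in testing is a likely overfit. A sketch with a generic boosted model standing in for LightGBM:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)

# Histogram the predicted probabilities per split; a large gap in confidence
# between train and test suggests overfitting.
for name, Xs in [("train", X_tr), ("test", X_te)]:
    p = clf.predict_proba(Xs)[:, 1]
    print(name, np.histogram(p, bins=10, range=(0, 1))[0])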