I read this interesting book on conformal predictors: https://arxiv.org/abs/2107.07511. Conformal predictors are a way to choose a prediction set that is guaranteed to include the true label with some pre-chosen probability. Is there a way to get conformal predictors to output calibrated probabilities? For example, say I have a binary classification problem (dog or cat images). For difficult examples, a conformal predictor can output the set {dog, cat}. What I'm looking for instead is something like a calibrated p-value for each prediction. The sigmoid outputs of my neural net, for example, are well known not to reflect actual probabilities. Can conformal predictors do this, assuming I have a calibration dataset available? If so, can anyone point me to the procedure? I can't find it.
2 Answers
I think what you are looking for is something like crepes: https://github.com/henrikbostrom/crepes. It seems to do exactly what you are asking (providing p-values). I stumbled upon it while looking at methods to calibrate models (e.g. fitting a spline on the outputs).
The code below does this for a sklearn random forest:
from crepes import WrapClassifier
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Load the data as NumPy arrays
dataset = fetch_openml(name="qsar-biodeg", parser="auto")
X = dataset.data.values.astype(float)
y = dataset.target.values
# Split three ways: proper training set, calibration set, test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
X_prop_train, X_cal, y_prop_train, y_cal = train_test_split(X_train, y_train, test_size=0.25)
# Wrap the learner, fit it on the proper training set, then calibrate on held-out data
rf = WrapClassifier(RandomForestClassifier(n_jobs=-1))
rf.fit(X_prop_train, y_prop_train)
rf.calibrate(X_cal, y_cal)
# One p-value per class for each test example
rf.predict_p(X_test)
Note that the data must be split into three parts (proper training, calibration, and test) so that calibration is performed on data the model was not trained on. I don't know whether crepes is compatible with other frameworks (tf/keras, PyTorch); I think not. I have also found a Venn-ABERS implementation that doesn't seem to need access to the model.
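If your model lives in another framework, the p-values can also be computed by hand from predicted probabilities alone. The sketch below is a minimal illustration in plain Python, not necessarily what crepes does internally: it assumes the nonconformity score is 1 minus the predicted probability of the candidate class, and the function name and toy numbers are made up for the example. Keep in mind that a conformal p-value says how typical a candidate label looks relative to the calibration set; it is not a class probability.

```python
def conformal_p_values(cal_probs, cal_labels, test_probs):
    """Return one conformal p-value per class for a single test example.

    cal_probs:  per-class predicted probabilities for each calibration example
    cal_labels: true class index of each calibration example
    test_probs: per-class predicted probabilities for the test example
    """
    # Nonconformity of each calibration example: 1 - probability of its true class
    cal_scores = [1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels)]
    n = len(cal_scores)

    p_values = []
    for prob in test_probs:
        # Nonconformity of the test example if this class were its label
        test_score = 1.0 - prob
        # Fraction of scores (including the test one) at least as nonconforming
        n_at_least = sum(1 for s in cal_scores if s >= test_score)
        p_values.append((n_at_least + 1) / (n + 1))
    return p_values


# Toy calibration set (hypothetical numbers): columns are [P(cat), P(dog)]
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 1, 0, 1]

p_values = conformal_p_values(cal_probs, cal_labels, test_probs=[0.75, 0.25])
print(p_values)  # -> [0.6, 0.2]
```

The prediction set at significance level 0.25, say, would keep every class whose p-value exceeds 0.25, which here is only class 0.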
What you are looking for is calibrated probabilities. The conformal prediction method that provides them is called Venn-ABERS predictors.
There are several tutorials in my repo:
https://github.com/valeman/awesome-conformal-prediction
I have also written an article on Medium, "How to calibrate your classifier in an intelligent way using Machine Learning Conformal Prediction":
https://medium.com/@valeman/how-to-calibrate-your-classifier-in-an-intelligent-way-a996a2faf718
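To make the Venn-ABERS idea concrete, here is a minimal pure-Python sketch of the inductive variant for a binary classifier. It only needs the model's scores on a calibration set, not the model itself. The function names and toy numbers are illustrative and do not come from any library; a real implementation would be much faster. For a test score, isotonic regression is fitted twice, once with the test point labelled 0 and once labelled 1, which gives a probability interval [p0, p1] for class 1.

```python
def isotonic_fit(points):
    """Fit a non-decreasing step function to (score, label) pairs.

    Uses pool-adjacent-violators and returns {score: fitted value}.
    """
    # Pool tied scores so each distinct score appears once
    grouped = {}
    for score, label in points:
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, count + 1)
    xs = sorted(grouped)

    # Each block holds [sum of labels, number of points, number of distinct scores]
    blocks = []
    for x in xs:
        total, count = grouped[x]
        blocks.append([total, count, 1])
        # Merge backwards while the block means decrease
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            last_total, last_count, last_width = blocks.pop()
            blocks[-1][0] += last_total
            blocks[-1][1] += last_count
            blocks[-1][2] += last_width

    fitted = []
    for total, count, width in blocks:
        fitted.extend([total / count] * width)
    return dict(zip(xs, fitted))


def venn_abers(cal_scores, cal_labels, test_score):
    """Return (p0, p1), lower and upper probabilities that the label is 1."""
    cal = list(zip(cal_scores, cal_labels))
    p0 = isotonic_fit(cal + [(test_score, 0)])[test_score]
    p1 = isotonic_fit(cal + [(test_score, 1)])[test_score]
    return p0, p1


# Toy calibration scores (e.g. sigmoid outputs) with their true labels
cal_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
cal_labels = [0, 0, 1, 0, 1, 1]

p0, p1 = venn_abers(cal_scores, cal_labels, test_score=0.65)
# One common way to collapse the interval into a single probability
p_single = p1 / (1.0 - p0 + p1)
print(p0, p1, p_single)
```

The width of the interval reflects how much calibration data supports the estimate; with only six calibration points, as here, it is very wide.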