
I'm wondering which activation function, for a multi-class classification problem, gives true probabilities.

According to:

https://ai.stackexchange.com/questions/37889/are-softmax-outputs-of-classifiers-true-probabilities

it seems that the output probability (for softmax) doesn't represent the true probability.

Is this the same when we use the sigmoid function? Are sigmoid probabilities (one per class) more reliable than softmax?

user3668129

1 Answer


They don't represent true probabilities because you'd still have to calibrate your model.

Let's imagine you're trying to classify cats and dogs in a given set of images (binary classification problem, with 0 for cats and 1 for dogs). Now let's say you have a batch of 10 images as the input of your model. If, for all 10 samples, your model assigns a probability of 0.6 that they are the positive class, then you'd expect (for a well-calibrated model) to find 4 cat images and 6 dog images in your batch. After all, that's what a probability means.

Using a sigmoid won't fix your calibration problem, as your model won't even have a probability distribution as its output. You'd need to use an extra data split (if you can afford to use one) and use calibration curves to evaluate your model's calibration. Scikit-learn has a pretty insightful read on the matter.
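To see why independent sigmoids don't even yield a probability distribution over classes, here is a small sketch (the logit values are made up for illustration):

```python
import numpy as np

# Hypothetical 3-class logits for a single sample.
logits = np.array([2.0, 1.0, 0.1])

# Softmax: non-negative outputs that sum to 1, i.e. a distribution
# over the three classes.
softmax = np.exp(logits) / np.exp(logits).sum()

# Independent sigmoids: each output lies in (0, 1), but the vector
# need not sum to 1, so it is not a distribution over the classes.
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax.sum())  # 1.0
print(sigmoid.sum())  # > 1 for these logits
```

Either way, neither choice of output activation guarantees calibration; that has to be checked (and fixed) separately, e.g. with calibration curves on a held-out split.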

Some other references as well; these are specific to neural networks: