As the subject says. On the one hand, the answer should be yes, because label smoothing is a regularization feature, and how can you know whether it improves performance without turning it off? On the other hand, I haven't seen any authoritative source claiming that it should be turned off during validation; not even the article that introduced the technique mentions it. And as far as I can tell, the feature as implemented in Keras is not turned off during validation.
3 Answers
The way most people gain an initial understanding of label smoothing (and what most common explanations have to say on the subject) plays a large role in how one approaches this question.
At first glance, label smoothing is exactly what the name suggests: we modify the labels, or some portion of them, in order to get a better, more general, more robust model. It makes sense that we don't want the model to learn from (and later predict with) extreme confidence levels, especially when we know that some of the labels are wrong, since this hurts the model's ability to perform on unseen data. The intuitive explanation of LS mechanics, then, is that we are not feeding the model pure 1's and 0's, but less confident values instead, resulting in a more reserved decision function that doesn't extrapolate in an extreme manner. Now, we know that smoothed labels are not the true labels, so at this point the main concern pops up: once we have trained on smoothed labels, do we also use smoothed labels for validation?
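To make the label transformation concrete, here is a minimal NumPy sketch (the function name and the epsilon value are chosen here just for illustration): the true class keeps 1 - epsilon of the probability mass, and epsilon is spread uniformly over all K classes.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Soften hard 0/1 targets: the true class gets 1 - epsilon,
    and epsilon is spread uniformly over all K classes."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

# A 3-class one-hot label [0, 1, 0] with epsilon = 0.1
# becomes roughly [0.033, 0.933, 0.033].
y = np.array([[0.0, 1.0, 0.0]])
print(smooth_labels(y, epsilon=0.1))
```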
If you think about LS purely as a data manipulation technique, the answer to the question above is not obvious, as it may go both ways depending on the argument. However, one must remember that LS is almost always regarded as a regularization technique -- you mention this in the question yourself -- and there is a good reason for that. Regularization, by definition, extends the loss function with an additional penalty term. In LS, this penalty term is responsible for punishing high-confidence predictions. Even though it may not appear as such, LS, once applied, becomes an essential part of the loss function, which should persist between training and validation if we aim to take advantage of the technique.

When we apply LS during training, we are effectively trying to minimize a loss function with the regularization term added in. Throwing it out during validation would defeat the very purpose of including it in the first place: if we decide not to apply LS to the validation set as well, we are making the mistake of expecting extreme confidence (1/0 labels) from a learner that was just a moment ago specifically trained against making overly confident predictions on previously unseen data. The correct thing to do is to validate that the predictions are of moderate confidence, just as desired. This is why the validation set should also have the regularizer present in the loss function, i.e. have smoothed labels as well.
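To make the "penalty term" reading concrete, here is a small sketch (NumPy, with hypothetical example values) showing that cross-entropy against smoothed labels decomposes into the original cross-entropy plus a term that punishes predictions far from the uniform distribution, i.e. overly confident ones:

```python
import numpy as np

def cross_entropy(targets, probs):
    """Categorical cross-entropy for a single example."""
    return -np.sum(targets * np.log(probs))

# Hypothetical values, chosen only for illustration.
epsilon = 0.1
y_true = np.array([0.0, 1.0, 0.0])       # hard one-hot label
probs = np.array([0.05, 0.9, 0.05])      # model prediction
num_classes = len(y_true)
uniform = np.full(num_classes, 1.0 / num_classes)

y_smooth = y_true * (1 - epsilon) + epsilon / num_classes

# CE(smoothed labels, p) == (1 - eps) * CE(true labels, p)
#                           + eps * CE(uniform, p)
# The second term grows when p is far from uniform,
# so it acts as a confidence penalty.
lhs = cross_entropy(y_smooth, probs)
rhs = (1 - epsilon) * cross_entropy(y_true, probs) \
      + epsilon * cross_entropy(uniform, probs)
print(np.isclose(lhs, rhs))  # True
```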
AFAIK, label smoothing comes into the picture when calculating the loss during training. There is no loss computation during validation.
Label smoothing is a regularization technique applied to the target values so that the model can learn the data well without overfitting. There is no need to do label smoothing for validation, but even if you do, it won't be a problem.
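For reference, a minimal sketch of the Keras behavior the question mentions (tf.keras, with an illustrative smoothing value of 0.1): the smoothing is a property of the loss object, so it is applied wherever that loss is evaluated, including the validation loss reported by fit(); there is no separate switch for validation.

```python
import tensorflow as tf

# The same loss object is used for both the training loss and the
# validation loss, so the smoothing persists during validation.
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = tf.constant([[0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.05, 0.9, 0.05]])
print(float(loss_fn(y_true, y_pred)))  # loss against smoothed targets
```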