I want to stop training my CNN when a custom logged metric (one not implemented in Keras) stops improving, with a patience of 5. I chose the macro F1 score, and here is what I did.
I created a callback that logs the macro F1 score at the end of each epoch, plus an early-stopping callback:
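from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_macro_f1', patience=5, restore_best_weights=True)
# MacroF1Callback is my own callback class; it logs 'val_macro_f1' at the end of each epoch
macro_f1_callback = MacroF1Callback(x_valid_combined_tfidf, y_valid)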
And here is the (simplified) output of the fitting:
Epoch 1/20
Validation Macro F1 Score: 0.3983
Epoch 2/20
Validation Macro F1 Score: 0.3369
Epoch 3/20
Validation Macro F1 Score: 0.4057
Epoch 4/20
Validation Macro F1 Score: 0.3947
Epoch 5/20
Validation Macro F1 Score: 0.3761
Epoch 6/20
Validation Macro F1 Score: 0.3918
Epoch 7/20
Validation Macro F1 Score: 0.4147
<keras.src.callbacks.History at 0x4cae76210>
After predicting on the validation data again, it looks like early stopping restored the weights from epoch 2 as the "best" ones:
F1 score: 0.33687923314086654
This doesn't make sense to me. Yes, the metric dropped at epoch 2 (it is lower than at epoch 1), and training stopped at epoch 7, but epoch 7 has a score that is even better than epoch 1.
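To make the puzzle concrete, here is a small pure-Python simulation of patience-based stopping applied to the scores logged above. It is only my own sketch of the general technique, not Keras's actual code: the function name `simulate_early_stopping` and its `mode` argument are made up for illustration. It shows that the behavior I see (stop at epoch 7, best weights from epoch 2) is exactly what you would get if the metric were being treated as something to minimize, although I have not confirmed that this is what Keras is doing.

```python
# Hypothetical helper: a sketch of patience-based early stopping, not the Keras implementation.
def simulate_early_stopping(scores, patience, mode):
    """Return (stopped_epoch, best_epoch), both 1-based.

    stopped_epoch is None if training would run through every epoch in `scores`.
    """
    if mode == "min":
        is_better = lambda new, best: new < best
    else:
        is_better = lambda new, best: new > best

    best_score, best_epoch, wait = None, None, 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or is_better(score, best_score):
            # New best value: remember it and reset the patience counter.
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Validation macro F1 scores from the training log above (epochs 1 to 7).
scores = [0.3983, 0.3369, 0.4057, 0.3947, 0.3761, 0.3918, 0.4147]

print(simulate_early_stopping(scores, patience=5, mode="min"))  # (7, 2)
print(simulate_early_stopping(scores, patience=5, mode="max"))  # (None, 7)
```

With "min", the lowest score (epoch 2) counts as the best and five non-improving epochs later training stops at epoch 7, which matches what I observe. With "max", training would not have stopped within these 7 epochs and epoch 7 would be the best. So my guess is that the direction of the comparison (the `mode` argument of `EarlyStopping`) is involved, but I would appreciate confirmation.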
Can anyone help me with this? Maybe I'm doing something wrong. Please also tell me if you want me to paste more code here.
Thanks!