These days, training data aren't fed to gradient descent all at once. Rather, they are fed in batch after batch: gradient descent takes one step per batch, and when all batches have been traversed, one epoch is done. However, gradient descent usually hasn't minimized the loss function yet; the loss is still on a slope. The current loss and weights serve as the starting point for the next epoch. Weights are initialized only once, at the beginning of the first batch of the first epoch.
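That loop structure can be sketched in plain NumPy. This is an illustrative toy (linear regression with a single weight; the data, learning rate, and batch size are all made up for the example), not code from any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise (illustrative only)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

w = 0.0            # weight initialized once, before the first batch of the first epoch
lr = 0.1
batch_size = 20

def mse(w):
    return np.mean((X[:, 0] * w - y) ** 2)

losses = [mse(w)]
for epoch in range(5):                              # several epochs
    for start in range(0, len(X), batch_size):      # one gradient step per batch
        xb = X[start:start + batch_size, 0]
        yb = y[start:start + batch_size]
        grad = 2 * np.mean((xb * w - yb) * xb)      # dMSE/dw on this batch only
        w -= lr * grad
    losses.append(mse(w))   # loss at epoch end; w is carried into the next epoch
```

Note that `w` is never re-initialized between epochs; each epoch just continues the descent from wherever the previous epoch left off.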

During each epoch, the validation set is used to compute the error (evaluated with the current trained weights). If that error is still dropping, finish this epoch and go on to the next one. Chances are that when the validation error is minimized, the current epoch hasn't been fully traversed yet, so the remaining batches of training data would be left unused.

If the above is right, it seems that hyperparameters such as the number of ConvNet filters, the size of ConvNet filters, and the number of layers won't be trained or tuned?

feynman

1 Answer


Chances are, when the validation error is minimized, the current epoch isn't traversed yet, the remaining batches of training data are left unused.

Typically, validation is done only at the end of each epoch, so no batch should remain unused. For example, this is the behavior of Keras: https://keras.io/models/model/#fit

validation_freq: Only relevant if validation data is provided. Integer or list/tuple/set. If an integer, specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs. If a list, tuple, or set, specifies the epochs on which to run validation, e.g. validation_freq=[1, 2, 10] runs validation at the end of the 1st, 2nd, and 10th epochs.
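The end-of-epoch validation pattern (combined with simple early stopping) can be sketched framework-free. This is an illustrative NumPy toy, not the actual Keras implementation; the split, patience value, and data are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=120)

# Hold out a validation set (names here are illustrative, not a library API)
X_tr, y_tr = X[:100, 0], y[:100]
X_val, y_val = X[100:, 0], y[100:]

w, lr, batch_size = 0.0, 0.1, 20
best_val, patience_left = np.inf, 2

for epoch in range(50):
    for start in range(0, len(X_tr), batch_size):    # every batch is used ...
        xb = X_tr[start:start + batch_size]
        yb = y_tr[start:start + batch_size]
        w -= lr * 2 * np.mean((xb * w - yb) * xb)
    val_err = np.mean((X_val * w - y_val) ** 2)      # ... validation only at epoch end
    if val_err < best_val:
        best_val, patience_left = val_err, 2
    else:
        patience_left -= 1
        if patience_left == 0:                       # early stopping between epochs
            break
```

The key point is that the validation check sits outside the batch loop: training never stops mid-epoch, only between epochs.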

hyperparameters: number of ConvNet filters, size of ConvNet filters, number of layers won't be trained or tuned?

Yes, such hyperparameters are set when the model is defined; they are not modified during training.

However, systems like Google AutoML are trying to change this: they attempt to learn model hyperparameters as part of the training process.
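Short of full AutoML, such hyperparameters are usually tuned by an outer search loop wrapped around training, selecting whichever setting gives the lowest validation error. A toy sketch, using polynomial degree as a stand-in for an architecture hyperparameter like filter count or layer depth (the data and candidate grid are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=200)
x_tr, y_tr = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

# Capacity hyperparameters (here: polynomial degree, standing in for
# "number of filters / layers") are chosen by an outer search loop,
# not by gradient descent itself.
results = {}
for degree in (1, 3, 5, 7):
    coeffs = np.polyfit(x_tr, y_tr, degree)          # train with this setting fixed
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    results[degree] = val_err

best_degree = min(results, key=results.get)
```

Each candidate model is trained from scratch with its hyperparameter held fixed; only the comparison across trained models happens on validation error.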

Shamit Verma