I'm trying to understand how the tolerance (convergence) check is done in Mini-Batch Gradient Descent. Here are a few possible orderings of the steps, but I'm not sure which one is the most common approach:
1)
- Begin the epoch
- Shuffle the dataset
- For each batch in the dataset:
  - Make predictions using the current weights
  - Compute the gradients
  - Update the weights
  - Make predictions using the updated weights
  - Compute the loss for the current batch and store it
  - Move on to the next batch until the end of the dataset
- Average all stored loss values
- Check for tolerance
- Go on to the next epoch
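To make sure I understand method 1, here's how I'd sketch it in Python/NumPy. This is just my own toy setup (a linear model with squared-error loss, tolerance checked as the change in average loss between epochs); the names and parameters are mine, not from any library:

```python
import numpy as np

def mbgd_method1(X, y, lr=0.05, batch_size=32, tol=1e-6, max_epochs=1000):
    """Method 1: the batch loss is computed AFTER the weight update,
    which costs an extra forward pass per batch."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    prev_avg = np.inf
    for epoch in range(max_epochs):
        idx = rng.permutation(len(X))                 # shuffle dataset
        losses = []
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            preds = X[b] @ w                          # predictions with current weights
            grad = 2 * X[b].T @ (preds - y[b]) / len(b)   # MSE gradient
            w -= lr * grad                            # update weights
            new_preds = X[b] @ w                      # predictions with UPDATED weights
            losses.append(np.mean((new_preds - y[b]) ** 2))  # store batch loss
        avg = np.mean(losses)                         # average of stored batch losses
        if abs(prev_avg - avg) < tol:                 # tolerance check, once per epoch
            return w, epoch
        prev_avg = avg
    return w, max_epochs
```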
2)
- Begin the epoch
- Shuffle the dataset
- For each batch in the dataset:
  - Make predictions using the current weights
  - Compute the loss for the current batch and store it
  - Compute the gradients
  - Update the weights
  - Move on to the next batch until the end of the dataset
- Average all stored loss values
- Check for tolerance
- Go on to the next epoch
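And here's how I'd sketch method 2, with the same toy setup as above (linear model, squared-error loss, tolerance on the change in average loss); the only difference is that the stored loss reuses the pre-update predictions, so there's no extra forward pass:

```python
import numpy as np

def mbgd_method2(X, y, lr=0.05, batch_size=32, tol=1e-6, max_epochs=1000):
    """Method 2: the batch loss is computed from the SAME predictions
    used for the gradient, i.e. BEFORE the weight update."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    prev_avg = np.inf
    for epoch in range(max_epochs):
        idx = rng.permutation(len(X))                    # shuffle dataset
        losses = []
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            preds = X[b] @ w                             # predictions with current weights
            losses.append(np.mean((preds - y[b]) ** 2))  # store loss BEFORE the update
            grad = 2 * X[b].T @ (preds - y[b]) / len(b)  # MSE gradient
            w -= lr * grad                               # update weights
        avg = np.mean(losses)                            # average of stored batch losses
        if abs(prev_avg - avg) < tol:                    # tolerance check, once per epoch
            return w, epoch
        prev_avg = avg
    return w, max_epochs
```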
3)
- Begin the epoch
- Shuffle the dataset
- For each batch in the dataset:
  - Make predictions using the current weights
  - Compute the gradients
  - Update the weights
  - Move on to the next batch until the end of the dataset
- Make predictions for the whole dataset and compute the loss
- Check for tolerance
- Go on to the next epoch
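Finally, my sketch of method 3 (same toy linear-model setup as before): no per-batch loss bookkeeping at all, just one full-dataset loss evaluation at the end of each epoch:

```python
import numpy as np

def mbgd_method3(X, y, lr=0.05, batch_size=32, tol=1e-6, max_epochs=1000):
    """Method 3: batches only update the weights; the loss for the
    tolerance check is computed once per epoch over the WHOLE dataset."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    prev_loss = np.inf
    for epoch in range(max_epochs):
        idx = rng.permutation(len(X))                    # shuffle dataset
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            preds = X[b] @ w                             # predictions with current weights
            grad = 2 * X[b].T @ (preds - y[b]) / len(b)  # MSE gradient
            w -= lr * grad                               # update only; no loss stored
        loss = np.mean((X @ w - y) ** 2)                 # full-dataset loss, updated weights
        if abs(prev_loss - loss) < tol:                  # tolerance check, once per epoch
            return w, epoch
        prev_loss = loss
    return w, max_epochs
```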
Any help is really appreciated :)