I'm exploring several ML models for an in-sample forecasting task, and I'm wondering whether there is a straightforward way to identify/detect good vs. bad learning.
The classic approach for deep learning models is to plot the loss curves, i.e. the history of the loss function on the training and validation/test sets over epochs. What about classic ML models?
Ideally, something would tell us (e.g., print a short verdict) whether learning/generalization was good or bad:
- good fit on the training data and good generalization => good fit
- good fit on the training data but bad generalization => overfitting
- bad fit even on the training data => underfitting
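
Concretely, I imagine a heuristic along these lines (a rough sketch of my own, not an existing API; the R² thresholds gap_tol/low_tol are arbitrary and data-dependent, and Ridge/make_regression are just placeholders for my real pipeline and data):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

def fit_verdict(estimator, X, y, cv=5, gap_tol=0.1, low_tol=0.5):
    # R^2 is used so that "high" and "low" scores have an absolute meaning.
    res = cross_validate(estimator, X, y, cv=cv, scoring="r2",
                         return_train_score=True)
    train, test = res["train_score"].mean(), res["test_score"].mean()
    if train < low_tol:
        verdict = "underfitting (bad fit even on the training folds)"
    elif train - test > gap_tol:
        verdict = "overfitting (good fit, bad generalization)"
    else:
        verdict = "good fit / good generalization"
    print(f"train R2={train:.2f}, CV R2={test:.2f} => {verdict}")

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
fit_verdict(Ridge(alpha=1.0), X, y)
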
To the best of my knowledge, the current best practice / state of the art is:
- as with deep learning models, plot a learning curve for the ML model, but over the number of samples in the training set instead of epochs (see the sketch after this list), or
- check for potential over-/under-fitting using a plot based on scoring="neg_mean_squared_error":
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import LearningCurveDisplay

# Evaluate the model with 10-fold cross-validation (pipeline, X, y defined elsewhere)
scores = cross_val_score(pipeline, X, y, scoring="neg_mean_squared_error", cv=10)
- I'm not sure, but it seems this approach has nothing to do with diagnosing whether learning went wrong; it is only for model evaluation, i.e., comparing models or plotting the true vs. estimated coefficients.
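
For the learning-curve route, here is roughly what I have in mind (a sketch only; LearningCurveDisplay requires scikit-learn >= 1.2, and the Ridge pipeline and make_regression data are placeholders for my actual setup):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import LearningCurveDisplay
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=20, noise=15.0, random_state=0)
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Train and cross-validated MSE as a function of the number of training samples:
# converging curves with a small gap suggest a good fit, a persistent large gap
# suggests overfitting, and both curves plateauing at a high error suggest underfitting.
LearningCurveDisplay.from_estimator(
    pipeline, X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=10,
    scoring="neg_mean_squared_error",
    negate_score=True,   # plot MSE instead of its negative
    score_type="both",   # show both the train and the validation curve
)
plt.show()
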
I would be happy if someone could share a Pythonic solution, such as a new package, library, wrapper, or recent workaround, that gives this kind of insight into an ML model's learning.
Side note: this question is about best practice for identifying the quality of learning only; I'm not looking for ways to diagnose and treat bad generalization, e.g., fine-tuning hyper-parameters with GridSearchCV!