As far as I know, the usual approach in machine learning is to split the data into training and test subsets. The first is used to fit the model (with the support of a validation subset), and the second to compute the model's performance under different metrics (normally a loss function to be minimized). This makes sense because we want to be sure there is no overfitting, so the model should perform well on the test set, not just on the training set. We may also divide the data into k folds, holding out a different fold as the test set each time, repeat the procedure above (fit on the training subset, compute performance on the test subset) k times, and finally average the results.
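To make the k-fold procedure concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than something taken from a specific library or source: the synthetic data, the simple linear model, the mean squared error metric, and the helper names `fit_line`, `mse` and `k_fold_mse`.

```python
import random

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def mse(a, b, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_mse(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, average over folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx],
                        [ys[i] for i in train_idx])
        scores.append(mse(a, b,
                          [xs[i] for i in test_idx],
                          [ys[i] for i in test_idx]))
    return sum(scores) / k, scores

# Synthetic data (assumed for illustration): y = 1 + 2x + unit Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

mean_mse, fold_mses = k_fold_mse(xs, ys, k=5)
print("per-fold MSE:", [round(s, 3) for s in fold_mses])
print("mean MSE:", round(mean_mse, 3))
```

Each observation is used for testing exactly once, and the model never sees its held-out fold during fitting.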
A year ago I decided to get into the Bayesian approach, and model validation became confusing for me. From what I have studied so far, a test set is never used in the Bayesian approach. Instead, information criteria (BIC, AIC, WAIC, etc.) are used to estimate the "deviance" on the test set.
I don't understand why so much effort goes into estimating the deviance on future data instead of holding out an out-of-sample subset and computing the deviance there. On the other hand, I don't feel comfortable using these information criteria (BIC, AIC, WAIC, etc.) because they are computed on the training set, so I always try to compute the deviance on test data. But I'm not comfortable with that either, since nobody in the community does it (everybody computes information criteria on the training data set).
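The comparison in question can be sketched with a small simulation. This is only an illustration under stated assumptions, not a method taken from any of the sources: a Gaussian linear model fitted by maximum likelihood, deviance defined as -2 times the log-likelihood, AIC computed as training deviance plus 2p with p = 3 parameters (intercept, slope, variance), and hypothetical helper names `simulate`, `fit_gaussian_line` and `deviance`. The training and test sets have the same size so that the deviances are on the same scale.

```python
import math
import random

def simulate(n, rng):
    """Draw n points from an assumed process y = 1 + 2x + N(0, 1)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_gaussian_line(xs, ys):
    """Maximum-likelihood estimates of intercept, slope and noise variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sigma2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, sigma2

def deviance(params, xs, ys):
    """-2 * Gaussian log-likelihood of the data under the fitted parameters."""
    a, b, sigma2 = params
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return len(xs) * math.log(2 * math.pi * sigma2) + rss / sigma2

rng = random.Random(1)
n, reps, p = 50, 1000, 3  # p counts intercept, slope and variance
train_sum = test_sum = 0.0
for _ in range(reps):
    x_tr, y_tr = simulate(n, rng)
    x_te, y_te = simulate(n, rng)   # fresh data the model never saw
    params = fit_gaussian_line(x_tr, y_tr)
    train_sum += deviance(params, x_tr, y_tr)
    test_sum += deviance(params, x_te, y_te)

mean_train = train_sum / reps
mean_test = test_sum / reps
mean_aic = mean_train + 2 * p       # AIC = training deviance + 2p

print("mean training deviance:", round(mean_train, 2))
print("mean AIC              :", round(mean_aic, 2))
print("mean test deviance    :", round(mean_test, 2))
```

Averaged over many simulated data sets, the training deviance is optimistic, and the AIC penalty moves it toward the average held-out deviance. On a single real data set there is only one split, so the held-out deviance is itself a noisy estimate and costs part of the data that could have been used for fitting.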
I would like someone to enlighten me and clear up my doubts.
Any reading sources would be welcome. Any suggestions for extending the question would also be welcome, because I have been stuck on this topic for a long time and have not found a way to move forward.
Thank you very much in advance.
PS: When I say "everybody" or "nobody", I mean it figuratively.
https://xcelab.net/rm/statistical-rethinking/
or take a look at this master's thesis:
https://brage.bibsys.no/xmlui/bitstream/handle/11250/2352708/13619_FULLTEXT.pdf?sequence=1&isAllowed=y
Anyway, almost no study using the Bayesian approach uses a test set; they always rely on information-criterion error estimates.
– Sergio Marrero Marrero Dec 28 '18 at 12:45