
I know that we should use the validation set for hyperparameter tuning, and that the test set is no longer really a test set once it has been used for tuning. But is this a problem if I want to compare the performance of two algorithms (e.g., Random Forest and XGBoost) across 10 different datasets, where each time I use the test data for tuning? I believe that if both are trained and tested under the same conditions, the final performance analysis should still be a true representation of which algorithm performs better on these datasets. Or am I mistaken?

John B

1 Answer


Seems like there is something flawed in the procedure here. If you use the test data set for tuning, then what do you use for testing performance?

In general, the models should not receive any information from the test set. If models are exposed to the test set during tuning, you will tend to conclude that the more flexible model performs better, when in fact it may be overfitting the training data and would underperform simpler models if the test set were kept isolated from the model-fitting procedure.
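As a minimal sketch of the proper workflow (assuming scikit-learn, a synthetic dataset, and an illustrative parameter grid), you can tune with cross-validation on the training portion only and score the held-out test set exactly once per model; the same pattern applies to XGBoost or any other learner:

```python
# Keep the test set isolated: tune hyperparameters with cross-validation
# on the training portion only, then evaluate the tuned model once on the
# held-out test set. (Synthetic data and the small grid are illustrative.)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameter search sees only the training data (internal CV splits).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=5,
)
search.fit(X_train, y_train)

# The test set is touched once, after all tuning decisions are final.
print("Held-out test accuracy:", search.score(X_test, y_test))
```

Repeating this per dataset (or nesting the whole search inside an outer cross-validation loop) gives you an unbiased basis for comparing the two algorithms.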

noNameTed