I need to compare two multi-class classifiers. To assess whether the difference between them is statistically significant, I have taken the following steps:

  1. obtain predictions on the test data using model 1
  2. obtain predictions on the test data using model 2
  3. construct a confusion matrix (contingency table) of the predictions from model 1 against the predictions from model 2
  4. use the Stuart-Maxwell test to test marginal homogeneity of that table, and in this way check whether the difference between the 2 classifiers is significant
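
For concreteness, steps 3 and 4 can be sketched as below. This is only a minimal sketch under my own assumptions: the function name `stuart_maxwell` and the simulated labels are made up for illustration, class labels are assumed to be integers in `0..n_classes-1`, and the statistic is the standard chi-square form of the test with `n_classes - 1` degrees of freedom. Note that it tests whether the two models assign labels with the same marginal frequencies.

```python
import numpy as np
from scipy.stats import chi2


def stuart_maxwell(pred1, pred2, n_classes):
    """Stuart-Maxwell test of marginal homogeneity for paired labels.

    pred1, pred2: integer labels in 0..n_classes-1 from the two models
    (hypothetical inputs; any paired categorical ratings would do).
    Returns (statistic, degrees_of_freedom, p_value).
    """
    pred1 = np.asarray(pred1)
    pred2 = np.asarray(pred2)

    # Step 3: square table, rows = model 1 labels, columns = model 2 labels.
    table = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(table, (pred1, pred2), 1)

    row = table.sum(axis=1)
    col = table.sum(axis=0)

    # Differences of the marginals; the last class is dropped because
    # the differences sum to zero and are therefore redundant.
    d = (row - col)[:-1]

    # Covariance of d under the null hypothesis:
    #   S[i, i] = n_i. + n_.i - 2 * n_ii
    #   S[i, j] = -(n_ij + n_ji)  for i != j
    S = -(table + table.T)
    np.fill_diagonal(S, row + col - 2 * np.diag(table))
    S = S[:-1, :-1]

    # Step 4: chi-square statistic d' S^{-1} d.
    # Assumes S is invertible, i.e. the models disagree often enough
    # across classes; identical predictions would make S singular.
    stat = float(d @ np.linalg.solve(S, d))
    df = n_classes - 1
    return stat, df, float(chi2.sf(stat, df))


# Simulated predictions, for illustration only (not my real data).
rng = np.random.default_rng(0)
model1_pred = rng.integers(0, 10, size=5000)
flip = rng.random(5000) < 0.3
model2_pred = np.where(flip, rng.integers(0, 10, size=5000), model1_pred)

stat, df, p_value = stuart_maxwell(model1_pred, model2_pred, n_classes=10)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

With only 2 classes this statistic reduces to McNemar's statistic without continuity correction, `(b - c)^2 / (b + c)`, which is a convenient sanity check.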

Is this a correct way to approach the task?

I have chosen this approach because the dataset I am using is large (~1 million records) and my target variable has 10 classes. The dataset has been split into train/test/validation sets. In his 1998 paper, Thomas Dietterich recommended McNemar's test for cases where cross-validation is expensive or impractical. Since the Stuart-Maxwell test extends McNemar's test to more than 2 outcomes, I have chosen it to compare my models.

I would really appreciate any opinions or advice on this!

Thank you!

ILR