Comparing two multi-class machine learning classifiers using the Stuart-Maxwell test

Question

I need to compare two multi-class classifiers. To assess whether the difference between them is statistically significant, I have taken the following steps:

1. Obtain predictions on the test data using model 1.
2. Obtain predictions on the test data using model 2.
3. Construct a contingency table (confusion matrix) that cross-tabulates the predictions from model 1 against the predictions from model 2.
4. Use the Stuart-Maxwell test to test marginal homogeneity and, in this way, check whether the difference between the two classifiers is significant.
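
For concreteness, the steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: labels are integer-coded from 0 to K-1, and the helper names `agreement_table` and `stuart_maxwell` and the synthetic predictions are made up for this example, not taken from any library. Note that the test, as set up here, checks whether the two models assign the classes with the same overall frequencies.

```python
import numpy as np
from scipy.stats import chi2


def agreement_table(pred_a, pred_b, n_classes):
    """Cross-tabulate two label vectors: rows = model 1, columns = model 2."""
    table = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(table, (np.asarray(pred_a), np.asarray(pred_b)), 1)
    return table


def stuart_maxwell(table):
    """Return (statistic, degrees of freedom, p-value) for a square K x K table."""
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    row, col = table.sum(axis=1), table.sum(axis=0)
    # Differences of the marginal totals; the last class is dropped because
    # the differences sum to zero and the full system would be singular.
    d = (row - col)[:-1]
    # Estimated covariance of d under the null of marginal homogeneity.
    s = -(table + table.T)
    np.fill_diagonal(s, row + col - 2.0 * np.diag(table))
    s = s[:-1, :-1]
    # Raises LinAlgError if s is singular (e.g. a class neither model predicts).
    stat = float(d @ np.linalg.solve(s, d))
    df = k - 1
    return stat, df, float(chi2.sf(stat, df))


# Synthetic stand-ins for the two models' test-set predictions (assumption).
rng = np.random.default_rng(0)
pred_1 = rng.integers(0, 3, size=1000)
changed = rng.random(1000) < 0.1
pred_2 = np.where(changed, rng.integers(0, 3, size=1000), pred_1)

table = agreement_table(pred_1, pred_2, n_classes=3)
stat, df, p_value = stuart_maxwell(table)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

If statsmodels is available, `statsmodels.stats.contingency_tables.SquareTable(table).homogeneity(method="stuart_maxwell")` should give the same kind of result and may be preferable to a hand-rolled version.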

Is this a correct way to tackle this task?

I have chosen this approach because the dataset I am using is large (~1 million records) and my target variable has 10 classes. The dataset has been split into train/test/validation sets. In his 1998 paper, Thomas Dietterich recommended McNemar's test for cases where cross-validation is expensive or impractical. Since the Stuart-Maxwell test is an alternative to McNemar's test for more than 2 outcomes, I have chosen it to test my models.
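
To spell out the relationship I am relying on (the notation is my own: n_{ij} is the number of test records that model 1 assigns to class i and model 2 assigns to class j, with K classes), the Stuart-Maxwell statistic as I understand it is:

```latex
d_i = n_{i\cdot} - n_{\cdot i}, \qquad i = 1, \dots, K-1

S_{ii} = n_{i\cdot} + n_{\cdot i} - 2\,n_{ii}, \qquad
S_{ij} = -\,(n_{ij} + n_{ji}) \quad (i \neq j)

\chi^2_{SM} = d^{\top} S^{-1} d \;\sim\; \chi^2_{K-1}
\quad \text{under marginal homogeneity}

% For K = 2 this reduces to McNemar's statistic without continuity correction:
\chi^2_{SM} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}
```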

I would really appreciate any opinion/advice on this!

Thank you!
