
I ran several experiments on an imbalanced dataset and got these results:

  • Model 1 = train/test/validation split + cross-validation (cv=10) --> f1 'micro' 0.95

  • Model 2 = train/test split + SMOTE for the imbalanced data, no cross-validation --> f1 'micro' 0.97

  • Model 3 = train/test/validation split + SMOTE --> f1 'micro' 0.97

  • Model 4 = train/test split + SMOTE --> f1 'micro' 0.98

I used f1 micro as the metric. Can I compare these models with f1 micro, or should I use another one such as f1 macro, or just accuracy_score?

martin

1 Answer


Selecting the correct scoring metric depends on the business problem you are trying to solve. I would look into the differences between f1 micro and f1 macro and determine which one better reflects performance on your task. For example: do you just want to maximize the f1 score across all samples, or do you care about the individual f1 of each class? Answering this will help you choose between macro and micro.
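As a minimal sketch (with made-up labels, not your data) of how the two averages can diverge on an imbalanced problem: when a model only ever predicts the majority class, micro-f1 still looks decent while macro-f1 collapses.

```python
from sklearn.metrics import f1_score

# Ten samples: class 0 is the majority, class 1 the minority.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # classifier only predicts the majority class

# Micro averaging pools all samples, so the majority class dominates.
print(f1_score(y_true, y_pred, average='micro', zero_division=0))  # ~0.80
# Macro averaging gives each class equal weight; the minority class f1 of 0 drags it down.
print(f1_score(y_true, y_pred, average='macro', zero_division=0))  # ~0.44
```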

As for the core of your question, it's difficult to say without a code sample. In several of these cases you are using SMOTE: how are you using it? Are you up-sampling before or after you split your data and run cross-validation? Up-sampling before splitting/cross-validation can lead to data leakage and will artificially inflate your scoring metrics.
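One common way to avoid that leakage is to keep SMOTE inside a pipeline, so it is fit only on the training folds. Here is a sketch, assuming imbalanced-learn is installed; `X`, `y` and the RandomForest classifier are placeholders, not your actual setup.

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# The sampler lives inside the pipeline, so SMOTE is applied only when the
# pipeline is fit, i.e. only to the training fold of each CV split.
pipeline = Pipeline([
    ('smote', SMOTE(random_state=42)),
    ('clf', RandomForestClassifier(random_state=42)),
])

# Each validation fold is scored on real, untouched samples, so the metric
# is not inflated by synthetic data.
scores = cross_val_score(pipeline, X, y, cv=10, scoring='f1_macro')
print(scores.mean())
```

Because the held-out fold never contains synthetic samples, the resulting scores from different models are actually comparable.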

My advice:

  1. Pick a single appropriate scoring metric
  2. Ensure you are using SMOTE in a way that doesn't cause data leakage
  3. Perform experiments & pick the model that maximizes your scoring metric
Oliver Foster