Questions tagged [model-evaluations]

This tag is for questions about evaluating model performance, not only with standard metrics but also in the context of real-world applications. What counts as a good model can depend on many factors that need to be taken into account before a data science application becomes genuinely useful.

367 questions
295
votes
8 answers

Micro Average vs Macro average Performance in a Multiclass classification setting

I am trying out a multiclass classification setting with 3 classes. The class distribution is skewed with most of the data falling in 1 of the 3 classes. (class labels being 1,2,3, with 67.28% of the data falling in class label 1, 11.99% data in…
SHASHANK GUPTA
  • 3,855
  • 4
  • 20
  • 26
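A minimal sketch of the distinction asked about above, using scikit-learn's f1_score with its averaging options; the labels and predictions are invented purely for illustration.

```python
# Micro- vs macro-averaged F1 on a skewed 3-class problem.
# The labels and predictions below are made up for illustration only.
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 1, 1, 1, 2, 2, 3, 3]   # most samples belong to class 1
y_pred = [1, 1, 1, 1, 2, 1, 2, 3, 3, 3]

# Micro-average: pool all samples before computing the metric,
# so the majority class dominates the score.
print(f1_score(y_true, y_pred, average="micro"))

# Macro-average: compute F1 per class, then take the unweighted mean,
# so each class counts equally regardless of its frequency.
print(f1_score(y_true, y_pred, average="macro"))
```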
59
votes
4 answers

What is the difference between bootstrapping and cross-validation?

I used to apply K-fold cross-validation for robust evaluation of my machine learning models. But I'm aware of the existence of the bootstrapping method for this purpose as well. However, I cannot see the main difference between them in terms of…
Fredrik
  • 1,047
  • 3
  • 10
  • 12
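A rough sketch of how the two schemes differ mechanically, assuming scikit-learn; the dataset and classifier are placeholders. K-fold CV partitions the data so every sample is held out exactly once, while the bootstrap draws rows with replacement and evaluates on the out-of-bag rows.

```python
# K-fold cross-validation vs. bootstrap resampling for model evaluation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.utils import resample

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000)

# K-fold CV: partition the data; every sample lands in the test fold exactly once.
cv_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf.fit(X[train_idx], y[train_idx])
    cv_scores.append(clf.score(X[test_idx], y[test_idx]))

# Bootstrap: sample n rows with replacement; evaluate on the out-of-bag rows.
boot_scores = []
for b in range(5):
    train_idx = resample(np.arange(len(X)), replace=True, random_state=b)
    oob_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    clf.fit(X[train_idx], y[train_idx])
    boot_scores.append(clf.score(X[oob_idx], y[oob_idx]))

print(np.mean(cv_scores), np.mean(boot_scores))
```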
46
votes
10 answers

When is precision more important than recall?

Can anyone give me some examples where precision is important and some examples where recall is important?
Rajat
  • 1,167
  • 2
  • 10
  • 10
22
votes
4 answers

Train/Test Split after performing SMOTE

I am dealing with a highly unbalanced dataset, so I used SMOTE to resample it. After SMOTE resampling, I split the resampled dataset into training/test sets, using the training set to build a model and the test set to evaluate it. However, I am…
Edamame
  • 2,785
  • 5
  • 25
  • 34
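One commonly recommended ordering, sketched here assuming the imbalanced-learn package provides the SMOTE implementation: split first, then oversample only the training portion, so no synthetic points leak into the test set. The dataset is a placeholder.

```python
# Split first, then apply SMOTE to the training data only, so the test set
# keeps the original class distribution and contains no synthetic samples.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)  # train only

clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
print(clf.score(X_test, y_test))  # evaluated on the untouched, imbalanced test set
```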
20
votes
4 answers

Macro- or micro-average for imbalanced class problems

The question of whether to use macro- or micro-averages when the data is imbalanced comes up all the time. Some googling shows that many bloggers tend to say that micro-average is the preferred way to go, e.g.: Micro-average is preferable if there…
Krrr
  • 303
  • 1
  • 2
  • 6
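For reference, the usual definitions behind the debate, written here for precision over classes c = 1, …, C (recall and F1 are averaged analogously):

```latex
% Macro- vs micro-averaged precision over classes c = 1..C
\[
P_{\text{macro}} = \frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c + FP_c},
\qquad
P_{\text{micro}} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} \left(TP_c + FP_c\right)}
\]
```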
16
votes
1 answer

How many features to sample using Random Forests

The Wikipedia page which quotes "The Elements of Statistical Learning" says: Typically, for a classification problem with $p$ features, $\lfloor \sqrt{p}\rfloor$ features are used in each split. I understand that this is a fairly good educated…
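A small sketch of how that heuristic is typically exposed in practice, assuming scikit-learn's RandomForestClassifier, where max_features="sqrt" considers roughly ⌊√p⌋ candidate features at each split; the dataset is a placeholder.

```python
# The sqrt(p) heuristic as exposed by scikit-learn's max_features parameter.
# Placeholder dataset with p = 36 features, so about sqrt(36) = 6 features
# are considered as split candidates at each node.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=36, random_state=0)
clf = RandomForestClassifier(max_features="sqrt", random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy, just to show the forest was fit
```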
16
votes
1 answer

How to define a custom performance metric in Keras?

I tried to define a custom metric function (F1-Score) in Keras (TensorFlow backend) according to the following: def f1_score(tags, predicted): tags = set(tags) predicted = set(predicted) tp = len(tags & predicted) fp =…
Hendrik
  • 8,767
  • 17
  • 43
  • 55
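A hedged sketch of one way to express such a metric on tensors rather than Python sets, assuming a binary classification model under tf.keras; note that Keras evaluates metrics per batch, so this only approximates the F1 computed over the whole dataset.

```python
# A custom F1 metric for tf.keras, written with backend ops on y_true / y_pred
# tensors (binary classification assumed). Metrics run per batch in Keras,
# so this is only an approximation of the dataset-level F1.
from tensorflow.keras import backend as K

def f1_metric(y_true, y_pred):
    y_pred = K.round(K.clip(y_pred, 0, 1))          # threshold probabilities at 0.5
    tp = K.sum(y_true * y_pred)
    precision = tp / (K.sum(y_pred) + K.epsilon())
    recall = tp / (K.sum(y_true) + K.epsilon())
    return 2 * precision * recall / (precision + recall + K.epsilon())

# model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[f1_metric])
```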
15
votes
1 answer

Irregular Precision-Recall Curve

I'd expect that for a precision-recall curve, precision decreases while recall increases monotonically. I have a plot that is not smooth and looks funny. I used scikit-learn to compute the values for plotting the curve. Is the curve below abnormal? If yes, why…
Anderlecht
  • 271
  • 2
  • 7
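A short sketch of how such a curve is typically produced with scikit-learn; the labels and scores are toy values. Because precision is not constrained to decrease monotonically as the decision threshold is lowered, a jagged, sawtooth-shaped curve is normal, especially with few positive samples.

```python
# Plotting a precision-recall curve from predicted scores with scikit-learn.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true  = [0, 0, 1, 0, 1, 1, 0, 1]                     # toy labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.5, 0.3]   # toy predicted scores

precision, recall, _ = precision_recall_curve(y_true, y_score)
plt.step(recall, precision, where="post")  # step plot matches how the curve is defined
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()
```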
15
votes
3 answers

What are the disadvantages of accuracy?

I have been reading about evaluating a model with accuracy only, and I have found some disadvantages. Among them, I read that it treats all errors as equal. How could this problem be solved? Maybe by assigning costs to each type of failure? Thank you very much…
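One way to stop treating all errors as equal, sketched here with a made-up cost matrix applied to the confusion matrix; the labels and costs are purely illustrative.

```python
# Weighting errors by a (made-up) cost matrix instead of plain accuracy.
# Rows are true classes, columns are predicted classes; correct predictions cost 0.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)   # counts of each (true, predicted) pair
cost = np.array([[0, 1],                # a false positive costs 1
                 [5, 0]])               # a false negative costs 5
print((cm * cost).sum())                # total misclassification cost
```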
13
votes
3 answers

Why is the F-measure preferred for classification tasks?

Why is the F-measure usually used for (supervised) classification tasks, whereas the G-measure (or Fowlkes–Mallows index) is generally used for (unsupervised) clustering tasks? The F-measure is the harmonic mean of the precision and recall. The…
Bruno Lubascher
  • 3,618
  • 1
  • 14
  • 36
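For reference, the two means in question, for precision P and recall R:

```latex
% F-measure: harmonic mean of precision and recall.
\[
F_1 = \frac{2PR}{P + R}
\]
% Fowlkes–Mallows index (G-measure): geometric mean of precision and recall.
\[
FM = \sqrt{P \cdot R}
\]
```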
12
votes
2 answers

Neural Networks - Loss and Accuracy correlation

I'm a bit confused by the coexistence of Loss and Accuracy metrics in Neural Networks. Both are supposed to render the "exactness" of the comparison of $y$ and $\hat{y}$, aren't they? So isn't the application of the two redundant in the training…
Hendrik
  • 8,767
  • 17
  • 43
  • 55
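A tiny numeric illustration of why the two metrics are tracked separately: both prediction sets below are fully correct at a 0.5 threshold (same accuracy), but the second is far less confident, so its cross-entropy loss is higher. The values are made up.

```python
# Same accuracy, different loss: accuracy only checks the thresholded decision,
# while cross-entropy (log) loss also penalizes low-confidence predictions.
from sklearn.metrics import accuracy_score, log_loss

y_true    = [1, 0, 1, 1]
confident = [0.95, 0.05, 0.90, 0.99]
hesitant  = [0.55, 0.45, 0.60, 0.51]

for probs in (confident, hesitant):
    preds = [int(p >= 0.5) for p in probs]
    print(accuracy_score(y_true, preds), log_loss(y_true, probs))
```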
10
votes
3 answers

Chi-square as an evaluation metric for nonlinear machine learning regression models

I am using machine learning models to predict an ordinal variable (values: 1,2,3,4, and 5) using 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous variables. So an evaluation box plot looks…
Alex
  • 201
  • 1
  • 3
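One literal (and heavily caveated) reading of the idea, sketched with scipy on invented data: round the continuous predictions to the ordinal classes and compare the resulting class counts to the true class counts with a chi-square goodness-of-fit test. Note that this only compares marginal distributions, not per-sample agreement.

```python
# Chi-square goodness of fit between the class counts of rounded predictions
# and the true class counts (ordinal labels 1..5). Data is invented.
import numpy as np
from scipy.stats import chisquare

y_true = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3, 3])
y_pred = np.array([1.2, 2.4, 2.8, 4.3, 4.9, 1.6, 1.9, 3.4, 3.6, 4.7, 3.1, 2.6])

classes = np.arange(1, 6)
pred_rounded = np.clip(np.rint(y_pred), 1, 5).astype(int)

obs = np.array([(pred_rounded == c).sum() for c in classes])
exp = np.array([(y_true == c).sum() for c in classes])

print(chisquare(f_obs=obs, f_exp=exp))
```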
9
votes
1 answer

When do I have to use aucPR instead of auROC? (and vice versa)

I'm wondering whether, to validate a model, it is sometimes better to use aucPR instead of auROC. Do these cases only depend on "domain & business understanding"? In particular, I'm thinking about the "unbalanced class problem" where, it seems…
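A small sketch contrasting the two areas on an imbalanced toy problem, assuming scikit-learn, where the PR-curve area is available as average_precision_score.

```python
# ROC AUC vs. area under the precision-recall curve (average precision) on an
# imbalanced toy problem. With very few positives, ROC AUC can look flattering
# while average precision stays low, which is one reason PR-based metrics are
# often preferred for rare-positive problems.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("auROC:", roc_auc_score(y_te, scores))
print("aucPR (average precision):", average_precision_score(y_te, scores))
```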
9
votes
2 answers

Difference between using RMSE and nDCG to evaluate Recommender Systems

What kind of error measures do RMSE and nDCG give while evaluating a recommender system, and how do I know when to use one over the other? If you could give an example of when to use each, that would be great as well!
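For reference, the two metrics measure different things: RMSE scores the accuracy of predicted ratings, while nDCG scores the quality of a ranked list. Standard formulations (using one common DCG variant among several):

```latex
% RMSE over predicted ratings \hat{r}_{ui} vs. true ratings r_{ui} on a test set T:
\[
\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui} - \hat{r}_{ui}\right)^{2}}
\]
% DCG/nDCG over a ranked list, with rel_i the relevance of the item at rank i
% and IDCG_k the DCG of the ideal ordering:
\[
\mathrm{DCG}_k = \sum_{i=1}^{k}\frac{2^{rel_i}-1}{\log_2(i+1)},
\qquad
\mathrm{nDCG}_k = \frac{\mathrm{DCG}_k}{\mathrm{IDCG}_k}
\]
```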
9
votes
3 answers

How do you evaluate an ML model already deployed in production?

So, to be more clear, let's consider the problem of loan default prediction. Let's say I have trained and tested multiple classifiers offline and ensembled them. Then I deployed this model to production. But because people change, data and many other…
tomtom
  • 247
  • 3
  • 5