Highest Voted 'class-imbalance' Questions - Data Science Stack Exchange

59

votes

6 answers

Should I go for a 'balanced' dataset or a 'representative' dataset?

My 'machine learning' task is to separate benign Internet traffic from malicious traffic. In a real-world scenario, most (say 90% or more) of Internet traffic is benign. Thus I felt that I should choose a similar data setup for training my…

machine-learning dataset class-imbalance

asked Jul 22 '14 at 12:29

pnp

693
1
6
10

42

votes

6 answers

Unbalanced multiclass data with XGBoost

I have 3 classes with this distribution: Class 0: 0.1169, Class 1: 0.7668, Class 2: 0.1163. I am using xgboost for classification. I know that there is a parameter called scale_pos_weight, but how is it handled for the 'multiclass' case, and how can…

classification xgboost multiclass-classification class-imbalance

asked Jan 16 '17 at 12:53

shda

585
1
5
10
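A common workaround for the multiclass situation described in this question is to weight individual samples rather than rely on scale_pos_weight, which targets the binary case. The following is a minimal sketch, assuming the "balanced" heuristic n_samples / (n_classes * class_count). The helper name `balanced_sample_weights` and the toy label counts are hypothetical, and passing the result to the model as per-sample weights is not shown here.

```python
from collections import Counter

def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to its class frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # "Balanced" heuristic (an assumption): every class gets the same total weight.
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[y] for y in labels]

# Hypothetical toy labels, roughly matching the 12% / 76% / 12% split in the question.
labels = [0] * 12 + [1] * 76 + [2] * 12
weights = balanced_sample_weights(labels)
```

With these weights, each class contributes the same total weight to the loss, so the majority class no longer dominates purely by count.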

35

votes

4 answers

Quick guide into training highly imbalanced data sets

I have a classification problem with approximately 1000 positive and 10000 negative samples in the training set. So this data set is quite unbalanced. A plain random forest simply marks all test samples as the majority class. Some good answers…

machine-learning classification dataset class-imbalance

asked Sep 12 '14 at 15:20

IgorS

5,474
11
34
43

31

votes

4 answers

macro average and weighted average meaning in classification_report

I use "classification_report" (from sklearn.metrics import classification_report) to evaluate an imbalanced binary classification. Classification Report : precision recall f1-score support 0 1.00…

classification accuracy class-imbalance

asked Jan 04 '20 at 10:38

user10296606

1,906
6
18
33
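For the two averages this question asks about: in scikit-learn's classification_report, "macro avg" is the unweighted mean of the per-class scores, while "weighted avg" weights each class score by its support. A small sketch of that arithmetic follows. The function name `macro_and_weighted` and the per-class numbers are hypothetical and are not taken from the truncated report above.

```python
def macro_and_weighted(scores, support):
    """scores: per-class metric values; support: per-class sample counts."""
    # Macro: every class counts equally, regardless of size.
    macro = sum(scores) / len(scores)
    # Weighted: each class counts in proportion to its number of samples.
    weighted = sum(s * n for s, n in zip(scores, support)) / sum(support)
    return macro, weighted

# Hypothetical per-class recall for a 990-vs-10 binary problem.
macro, weighted = macro_and_weighted([1.00, 0.50], [990, 10])
```

Here the macro average is 0.75 while the weighted average is 0.995, which shows how the weighted figure can hide poor performance on a small class.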

25

votes

3 answers

How do you apply SMOTE on text classification?

Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique used in imbalanced dataset problems. So far I have an idea of how to apply it to generic, structured data. But is it possible to apply it to a text classification problem?…

class-imbalance text smote

asked Feb 10 '18 at 11:18

catris25

369
1
3
5

22

votes

4 answers

Train/Test Split after performing SMOTE

I am dealing with a highly unbalanced dataset so I used SMOTE to resample it. After SMOTE resampling, I split the resampled dataset into training/test sets using the training set to build a model and the test set to evaluate it. However, I am…

machine-learning model-evaluations class-imbalance smote

asked Dec 09 '16 at 00:19

Edamame

2,785
5
25
34
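The ordering issue raised in this question (resampling before splitting) can let copies or derivatives of the same original row land in both the training and test sets. A minimal sketch of the opposite order follows: split first, then oversample only the training portion. It uses plain random duplication rather than SMOTE, and the function name `split_then_oversample` and the toy data are hypothetical.

```python
import random

def split_then_oversample(rows, labels, test_fraction=0.25, seed=0):
    """Hold out a test set first, then balance only the training set."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    train = [(rows[i], labels[i]) for i in train_idx]
    test = [(rows[i], labels[i]) for i in test_idx]

    # Group training rows by class.
    by_class = {}
    for row, y in train:
        by_class.setdefault(y, []).append((row, y))

    # Random oversampling (not SMOTE): duplicate rows of smaller classes
    # until every class matches the largest class in the training set.
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced, test

# Hypothetical toy data: 10 minority rows out of 100.
rows = list(range(100))
labels = [1 if i < 10 else 0 for i in rows]
train_balanced, test = split_then_oversample(rows, labels)
```

The test set keeps the original class distribution and shares no rows with the training set, so evaluation reflects the data the model would actually see.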

20

votes

4 answers

Macro- or micro-average for imbalanced class problems

The question of whether to use macro- or micro-averages when the data is imbalanced comes up all the time. Some googling shows that many bloggers tend to say that micro-average is the preferred way to go, e.g.: Micro-average is preferable if there…

machine-learning model-evaluations class-imbalance

asked Aug 13 '18 at 09:57

Krrr

303
1
2
6
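The difference between the two averages in this question comes down to where the averaging happens: micro-averaging pools true positives, false positives, and false negatives across classes before computing the score, while macro-averaging computes the score per class and then takes the plain mean. A small sketch for F1 follows, with hypothetical counts and helper names.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per class."""
    # Macro: average the per-class F1 values.
    macro = sum(f1(*c) for c in per_class) / len(per_class)
    # Micro: pool the counts over all classes, then compute a single F1.
    tp, fp, fn = (sum(col) for col in zip(*per_class))
    micro = f1(tp, fp, fn)
    return micro, macro

# Hypothetical counts: a large class predicted well, a small class predicted poorly.
micro, macro = micro_macro_f1([(950, 8, 2), (2, 2, 8)])
```

In this toy case the micro-average is dominated by the large class and stays close to 1, whereas the macro-average is pulled down sharply by the small class.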

19

votes

2 answers

Why does data science see class imbalance as a problem for supervised learning when statistics does not?

Why does data science see class imbalance as a problem in supervised learning when statistics says it is not? Data science seems to see class imbalance as problematic and as needing special techniques to remedy it. For instance, this DS.SE…

machine-learning classification statistics class-imbalance supervised-learning

asked Jan 09 '24 at 18:34

Dave

4,542
1
10
35

18

votes

3 answers

When should we consider a dataset as imbalanced?

I'm facing a situation where the numbers of positive and negative examples in a dataset are imbalanced. My question is, are there any rules of thumb that tell us when we should subsample the large category in order to force some kind of balancing in…

classification dataset sampling class-imbalance

asked May 16 '16 at 11:36

Rami

604
2
6
16

16

votes

4 answers

What are the implications for training a Tree Ensemble with highly biased datasets?

I have a highly biased binary dataset - I have 1000x more examples of the negative class than the positive class. I would like to train a Tree Ensemble (like Extra Random Trees or a Random Forest) on this data but it's difficult to create training…

machine-learning feature-selection class-imbalance

asked Jun 18 '14 at 15:48

gallamine

428
3
8

16

votes

4 answers

Why SMOTE is not used in prize-winning Kaggle solutions?

Synthetic Minority Over-sampling Technique (SMOTE) is a well-known method for tackling imbalanced datasets. There are many highly cited papers out there claiming that it is used to boost accuracy in unbalanced data scenarios. But then, when I…

machine-learning class-imbalance kaggle smote

asked Dec 27 '21 at 16:50

Carlos Mougan

6,430
2
20
51

15

votes

3 answers

How can I perform stratified sampling for multi-label multi-class classification?

I am asking this question for a few reasons: The dataset in hand is imbalanced. I used the code below: x = dataset[['Message']] y = dataset[['Label1', 'Label2']] train_data, test_data = train_test_split(x, test_size = 0.1, stratify=y, random_state =…

classification data-mining data-cleaning class-imbalance

asked Jun 13 '18 at 11:18

Divyanshu Shekhar

587
1
5
15

15

votes

2 answers

Why do we need to handle data imbalance?

I would like to know why we need to deal with data imbalance. I know how to deal with it and different methods to solve the issue - by up sampling or down sampling or by using SMOTE. For example, if I have a rare disease 1 percent out of 100, and…

classification dataset sampling class-imbalance

asked Nov 06 '17 at 06:15

sara

481
7
15

14

votes

1 answer

Why doesn't class weight resolve the imbalanced classification problem?

I know that in imbalanced classification, the classifier tends to predict the larger class label for all test samples, but if we use class weights in the loss function, it would be reasonable to expect the problem to be solved. So why do we need some…

classification class-imbalance weighted-data

asked Jan 29 '19 at 07:21

user137927

389
1
3
11

13

votes

3 answers

Unbalanced classes -- How to minimize false negatives?

I have a dataset that has a binary class attribute. There are 623 instances with class +1 (cancer positive) and 101,671 instances with class -1 (cancer negative). I've tried various algorithms (Naive Bayes, Random Forest, AODE, C4.5) and all of them…

classification random-forest decision-trees class-imbalance

asked Nov 12 '15 at 16:09

user798275

293
2
3
5
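One way to trade false positives for fewer false negatives, as this question asks, is to lower the decision threshold on the classifier's scores until a target recall is reached. The sketch below assumes the model outputs a score per sample and that higher means "more likely positive". The function name `threshold_for_recall` and the toy scores are hypothetical.

```python
import math

def threshold_for_recall(scores, labels, target_recall):
    """Return the highest threshold whose recall on the positives is >= target_recall.

    A sample is predicted positive when its score is >= the returned threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no positive samples")
    # Number of positives that must be captured, rounded up, at least one.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]

# Hypothetical validation scores: four positives followed by four negatives.
scores = [0.9, 0.8, 0.4, 0.2, 0.7, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold = threshold_for_recall(scores, labels, target_recall=0.75)
```

The threshold should be chosen on held-out validation data, since lowering it always increases the number of false positives as well.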
