Imbalanced dataset - Positive majority class

Question

My dataset consists of 150 patients, of whom 50 are healthy controls (negative) and 100 are sick (positive). If I want my model to have high sensitivity at high specificity, in other words to have a low false positive rate, should I correct my model by applying weights to it? Usually the positive class is the minority class, and I see why you need to correct for that, but should I in my case?

Thanks

methus · Answer 1 · 2020-02-10T09:54:09.797

There are two things that might help you:

1) If you use a classifier $f$ that returns a score $f(x)\in[0,1]$ instead of a direct class assignment, then you can choose a threshold $\theta\in[0,1]$ and define the predicted class $\hat{y}(x)$ as:

$\hat{y}(x)=\begin{cases} 1, & \text{if } f(x)\geq\theta\\ 0, & \text{else} \end{cases}$

The higher the value of $\theta$, the more confident the classifier must be before it labels somebody as positive. This means fewer false positives (higher specificity), but at the same time lower sensitivity (recall), because there will be more false negatives.

2) If your classifier always predicts the majority class, you could try oversampling the minority class (in your case the healthy controls), using SMOTE or another oversampling technique, to balance the training set.
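
To make both points concrete, here is a minimal sketch in plain Python. The function names `rates_at_threshold` and `random_oversample` and the six scores are made up for illustration; they do not come from any library or from your data. Note that the second function does plain random oversampling (duplicating minority rows), not SMOTE, and the first assumes both classes are present in `labels`.

```python
import random
from collections import Counter


def rates_at_threshold(scores, labels, theta):
    """Return (sensitivity, specificity) when predicting 1 iff score >= theta."""
    pred = [1 if s >= theta else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    return tp / (tp + fn), tn / (tn + fp)


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of each smaller class until all classes
    are as large as the largest one. Original rows are kept first."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = rng.choices(pool, k=target - n)  # empty when n == target
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical scores: 4 sick patients (1) and 2 healthy controls (0).
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.2]
labels = [1, 1, 1, 1, 0, 0]

for theta in (0.3, 0.5, 0.75):
    sens, spec = rates_at_threshold(scores, labels, theta)
    print(f"theta={theta}: sensitivity={sens:.2f}, specificity={spec:.2f}")

balanced_rows, balanced_labels = random_oversample(scores, labels)
print(Counter(balanced_labels))
```

On this toy data, raising $\theta$ from 0.3 to 0.75 moves specificity from 0.50 to 1.00 while sensitivity drops from 1.00 to 0.50. If you do oversample, apply it to the training split only, so that duplicated patients never end up in the data you evaluate on.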
