Questions tagged [binary-classification]

145 questions
8
votes
1 answer

How to choose the right threshold for binary classification?

I am currently working on the titanic dataset from Kaggle. The data set is imbalanced with almost 61.5 % negative and 38.5 positive class. I divided my training dataset into 85% train and 15% validation set. I chose a support vector classifier as…
7
votes
2 answers

Binary classification on small dataset

beginner of ML here. Can anyone tell me if it is advisable to apply ML models, specifically binary classification and using Pycaret on a dataset with 69 columns and 226 rows? it has columns for different features with measurements, 1 column that…
4
votes
2 answers

The accuracy of a random forest algorithm is nearly 1, how do I solve this problem? (with updates)

I have a problem with a random forest algorithm, I'm firstly explaining the situation and then I'll ask questions. I have a dataset of 10000 raws x 40 columns, 39 of them are features and 1 contains the labels. The 10000 raws represent different…
4
votes
1 answer

Does a random classifier have a diagonal ROC (received operator characteristic) curve even when the data is biased toward negatives?

About 9% of the US population have a diabetes diagnosis. So a binary random classifier that just guesses 50% positive and 50% negative would likely be incorrect when it guesses positive (leading to more False Positives (FP) than True Positives…
joseville
  • 143
  • 3
4
votes
2 answers

ROC curve for a perfect model, why is AUC 1.0?

According to this article, The ROC curve for a perfect model would go straight up the TPR axis on the left and then across the FPR axis at the top. Since the plot area for the curve measures 1x1, the area under this perfect curve would be 1.0 Is…
T. Webster
  • 141
  • 1
4
votes
3 answers

Timing of applying random oversampling on the dataset

I tried to learn classification using machine learning algorithms. I went through Breast Cancer - EDA, Balancing and ML the notebook. In this notebook Random Oversampling had been implemented. However, when the person did the oversampling he did it…
4
votes
2 answers

Meaningfully compare target vs observed TPR & FPR

Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as: $$ \widehat{y} = \begin{cases} 1, & f(x) \geq t \\ 0, & f(x) < t \end{cases} $$ I then compute the $TPR$…
3
votes
2 answers

Much higher scoring metrics with classification_report than cross_validate

I'm training a classifier on the DAIGT dataset. The objective is to differentiate human from AI text and so this is a binary classification problem. As a baseline before I move onto an LLM classifier, I am using a pipeline of a TF-IDF vectorizer and…
3
votes
1 answer

Using keras metrics BinaryCrossentropy for a binary model

I'm trying to implement a binary classification model using tensorflow keras and stumbled over problem that I cannot grasp. My model shall classify images of houses in the two classes of "old/antique" and "new/modern". I used transfer learning using…
Ada
  • 33
  • 5
3
votes
2 answers

Is it better to train a CNN model with bad images or good images when the testing data will be in bad quality?

This is a very general question so lets take a very general example: imagine a CNN model that distinguishes between dogs and cats facial features images. we have two kinds of training data set: one with full features and one with defective. by…
hamflow
  • 133
  • 5
3
votes
1 answer

What does precision-recall curve and ROC curve tell us abouth threshold invariance

Consider a binary classification problem. Intuitively, a value for the area under the curve (for both curves) very close to 1, shows that the curve is almost L-shaped. Thus, this means that the value on y axis stays rather consistent despite changes…
3
votes
1 answer

How to combine binary classification with patient stratification?

I am working on a binary classification model (healthy/diseased) based on gene expression data of different patients. As a second task, I would like to stratify these patients and find subgroups. I expect that the summary pattern of different genes…
vhio
  • 31
  • 2
3
votes
3 answers

How are scores calculated for each class of binary classification

The formula for Precision is TP / TP + FP, but how to apply it individually for each class of a binary classification problem, For example here the precision, recall and f1 scores are calculated for class 0 and class 1 individually, I am not able…
3
votes
1 answer

How do you add negative class sample for binary classification?

How do you prepare the negative dataset for binary classification? Let us say that I am building a classifier that has to classify whether the input image is of a car or not. I already have a dataset that consists of thousands of cars. But what…
2
votes
1 answer

Level of confidence for binary classification

I’m relatively new to PyTorch and deep learning. I was able to create a model and analyze a data set for both a training and test set in a binary classification problem. Everything is working well. Now I’m setting up a program where a user can enter…
1
2 3
9 10