Below I outline informal definitions of bias and variance used for assessing ML models (based on this book).
Compute the error on both the training set and the validation set. The training-set error can be interpreted as the model's bias. The variance is how much higher the validation error is than the training error.
$\text{bias} := \text{train set error}$
$\text{variance} := \text{val set error} - \text{bias}$
We don't expect a model to perform better on the validation set than on the training set, thus $\text{variance} \geq 0$ (a validation error curve will not cross or go below the training error curve).
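As a minimal sketch, the two definitions in code (the error values below are hypothetical, just to show the arithmetic):
```
def bias_variance(train_error, val_error):
    # bias: the training-set error itself
    # variance: how much worse the validation error is than the training error
    bias = train_error
    variance = val_error - bias
    return bias, variance

# Hypothetical error rates, for illustration only
print(bias_variance(train_error=0.05, val_error=0.12))  # -> (0.05, 0.07)
```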
Using these definitions, the various scenarios are:
Overfitting: low bias, high variance
- Often because the model is simply memorising the training set (low bias), and consequently failing to generalise to the validation set (high variance).
Underfitting: high bias
- Often this is because the model is too simple (high bias, low variance). It could also be a complex model of the wrong type (high bias, high variance).
Adequately fitted (neither underfitting nor overfitting): low bias, low variance
- This is the desired operating point. The model is complex enough to model the data well (low bias), and in a manner that generalises to new data (low variance).
By "low" and "high", I mean relative to your target error rate. Having $variance>>bias$ might seem like overfitting because of the large train-validation gap, but I wouldn't call it overfitting if the validation score is nonetheless good and within spec. In other words, I am judging error rates relative to the desired error rate, rather than on simply the gap between the train and validation scores.
Example 1: using a single trained model (no CV)
Fitting a logistic regression model on a binary classification problem.
Results:
train loss: 0.173 | val loss: 0.176
bias: 0.173 | variance: 0.003 | variance:bias ratio is 0.017
The model is high-bias (train error rate of 17%) and low-variance (0.3%), characteristic of an underfitting model.
```
import numpy as np

# Data for testing
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=10_000, random_state=0)

# Split data. Just a train-validation set for this demo.
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0, test_size=0.25, stratify=y)

# Fit model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=np.random.RandomState(0)).fit(X_train, y_train)

# Compute losses/error rates and report
use_brier_score = False
if use_brier_score:
    from sklearn.metrics import brier_score_loss
    train_proba = model.predict_proba(X_train)[:, 1]
    val_proba = model.predict_proba(X_val)[:, 1]
    train_loss = brier_score_loss(y_train, train_proba)
    val_loss = brier_score_loss(y_val, val_proba)
else:
    # Use accuracy, and calculate the error rate
    train_loss = 1 - model.score(X_train, y_train)
    val_loss = 1 - model.score(X_val, y_val)

# To bias and variance
bias = train_loss
variance = val_loss - bias

print('train loss: %.3f' % train_loss, '| val loss: %.3f' % val_loss)
print('bias: %.3f' % bias, '| variance: %.3f' % variance, end=' ')
print('| variance:bias ratio is %.3f' % (variance / bias))
```
Example 2: CV
Running 5-fold stratified CV: rather than evaluating a single fitted model, we fit a model on each of 5 different splits and average the results.
Accuracy (%)
trn: 82.7
val: 82.6
Error rate (%)
trn: 17.28
val: 17.40
bias: 17.28 | variance: 0.12 | variance:bias ratio=0.007
```
import numpy as np

# Data for testing
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=10_000, random_state=0)

# Split off a test set (not used here)
from sklearn.model_selection import train_test_split
X_cv, X_test, y_cv, y_test = train_test_split(X, y, random_state=0, test_size=0.25, stratify=y)

# Run cross-validation on the remaining data
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

# Uses 5-fold stratified CV and "accuracy" (the defaults for binary/multiclass y)
np.random.seed(0)
cv_results = cross_validate(
    LogisticRegression(), X_cv, y_cv,
    return_train_score=True,
    # scoring='accuracy',
    # cv=5,
)
train_acc = cv_results['train_score'].mean() * 100
val_acc = cv_results['test_score'].mean() * 100
# Could also extract std, confidence intervals, median/IQR, etc.

# Compute error rates and report
train_error = 100 - train_acc
val_error = 100 - val_acc

# To bias and variance
bias = train_error
variance = val_error - bias

print('Accuracy (%)')
print(' trn: %.1f' % train_acc, '\n val: %.1f' % val_acc)
print('\nError rate (%)')
print(' trn: %.2f' % train_error, '\n val: %.2f' % val_error)
print(
    '\nbias: %.2f' % bias, '| variance: %.2f' % variance,
    '| variance:bias ratio=%.3f' % (variance / bias)
)
```
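Since cross_validate returns the per-fold scores (as noted in the comment above), reporting the fold-to-fold spread is a one-liner, e.g. continuing from the snippet above:
```
print('val accuracy per fold (%):', np.round(cv_results['test_score'] * 100, 2))
print('val accuracy std (%): %.2f' % (cv_results['test_score'].std() * 100))
```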