Model Calibration with Python

Isotonic Regression for Calibration

Isotonic regression is a non-parametric calibration method that transforms the predicted probabilities of a classifier into calibrated probabilities by learning a monotonically increasing function. Unlike Platt scaling, which fits a logistic regression model (a specific S-shaped curve) to map uncalibrated scores to probabilities, isotonic regression does not assume any particular functional form. Instead, it fits a piecewise constant function that only requires the mapping to be non-decreasing. This flexibility allows isotonic regression to adapt to a wider range of calibration issues, making it especially useful when the relationship between predicted scores and true probabilities is not well described by a logistic curve.

The main advantage of isotonic regression over Platt scaling is its ability to capture more complex, non-linear relationships between the model's raw outputs and the true likelihood of an event. This is particularly beneficial when the classifier's output deviates from the logistic shape, as isotonic regression can adjust to local patterns and irregularities in the data without being restricted by a fixed equation. However, this increased flexibility also means that isotonic regression may be more sensitive to noise, especially when the calibration set is small.
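To make the idea of a piecewise constant, non-decreasing mapping concrete, the short sketch below fits scikit-learn's IsotonicRegression directly to synthetic scores and outcomes. The data here is invented purely for illustration (scores drawn uniformly, outcomes generated so the raw scores are systematically overconfident); the point is only to show that the learned calibration map is monotone and steps through a small number of distinct levels rather than following a smooth curve.

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Hypothetical uncalibrated scores in [0, 1] (illustration only)
scores = np.sort(rng.uniform(0, 1, 500))

# Outcomes generated so the true positive rate is scores**2,
# i.e. the raw scores are overconfident
outcomes = (rng.uniform(0, 1, 500) < scores**2).astype(int)

# Fit the isotonic (monotone, piecewise constant) calibration map
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrated = iso.fit_transform(scores, outcomes)

# The fitted values never decrease and use far fewer distinct levels
# than there are input scores
print(np.all(np.diff(calibrated) >= 0))   # True: monotone
print(np.unique(calibrated).size, "distinct levels for", scores.size, "scores")

The full example below applies the same idea through CalibratedClassifierCV, which handles the cross-validated fitting of the isotonic map for you.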

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# Generate synthetic data
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=10,
    class_sep=1.0,
    flip_y=0.05,
    random_state=42
)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Base model (typically overconfident)
gb = GradientBoostingClassifier(
    random_state=42,
    learning_rate=0.2,
    n_estimators=30,
    max_depth=3
)
gb.fit(X_train, y_train)

# Calibrated model with isotonic regression (internal CV on train)
calibrated_gb = CalibratedClassifierCV(
    estimator=gb, method="isotonic", cv=3
)
calibrated_gb.fit(X_train, y_train)

# Predicted probabilities on test set
prob_pos_uncal = gb.predict_proba(X_test)[:, 1]
prob_pos_iso = calibrated_gb.predict_proba(X_test)[:, 1]

# Compute calibration curves
frac_pos_uncal, mean_pred_uncal = calibration_curve(
    y_test, prob_pos_uncal, n_bins=10
)
frac_pos_iso, mean_pred_iso = calibration_curve(
    y_test, prob_pos_iso, n_bins=10
)

# Plot calibration curves
plt.figure(figsize=(8, 6))
plt.plot(mean_pred_uncal, frac_pos_uncal, "s-", label="Uncalibrated")
plt.plot(mean_pred_iso, frac_pos_iso, "s-", label="Isotonic Regression")
plt.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
plt.xlabel("Mean predicted value")
plt.ylabel("Fraction of positives")
plt.title("Calibration Curve: Isotonic Regression vs. Uncalibrated")
plt.legend()
plt.tight_layout()
plt.show()

When you apply isotonic regression to calibrate a classifier, you often see a noticeable change to the shape of the calibration curve. Because isotonic regression fits a piecewise constant, monotonically increasing function, it can bend and adjust the curve at multiple points, correcting for both overconfidence and underconfidence in various regions of predicted probability. The resulting calibration curve may have steps or flat regions, especially if there are few samples in some probability intervals. This flexibility allows the calibration curve to closely follow the empirical relationship between predicted probabilities and observed frequencies, often resulting in a curve that hugs the diagonal more tightly than Platt scaling when the underlying relationship is not logistic.
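Beyond eyeballing the curves, you can quantify the improvement with a proper scoring rule such as the Brier score. The snippet below is a small follow-up to the listing above and assumes y_test, prob_pos_uncal, and prob_pos_iso are still in scope; with this synthetic setup the calibrated probabilities typically achieve a lower (better) Brier score, though the exact numbers depend on the random seed and model settings.

from sklearn.metrics import brier_score_loss

# Mean squared error between predicted probabilities and observed outcomes (lower is better)
print("Brier score, uncalibrated:", brier_score_loss(y_test, prob_pos_uncal))
print("Brier score, isotonic:    ", brier_score_loss(y_test, prob_pos_iso))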

1. What makes isotonic regression more flexible than Platt scaling?

2. When might isotonic regression overfit?


