Model Calibration with Python
Applied Calibration Workflows

Combining Calibration with Threshold Tuning

When you use a machine learning model for binary classification, the model often produces a probability score for each instance. However, these probabilities are not always well-calibrated, meaning they might not reflect the true likelihood of an event. Calibration methods like Platt scaling or isotonic regression adjust these outputs so that, for example, a prediction of 0.8 really means there is an 80% chance of the positive class. Once you have calibrated probabilities, you can more confidently choose a threshold to convert those probabilities into class predictions. This is because the threshold now corresponds to an interpretable probability, making it easier to align decisions with business needs or optimize for specific metrics.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import numpy as np

# Generate synthetic data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Train a random forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Calibrate classifier using Platt scaling (sigmoid)
calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv=3)
calibrated_clf.fit(X_train, y_train)

# Get calibrated probabilities on test set
y_proba_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]

# Tune threshold to maximize F1-score
thresholds = np.linspace(0.0, 1.0, 101)
f1_scores = []
for thresh in thresholds:
    y_pred = (y_proba_calibrated >= thresh).astype(int)
    f1_scores.append(f1_score(y_test, y_pred))

best_idx = np.argmax(f1_scores)
best_threshold = thresholds[best_idx]
best_f1 = f1_scores[best_idx]

print(f"Best threshold: {best_threshold:.2f}")
print(f"Best F1-score: {best_f1:.3f}")
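
Before relying on the calibrated probabilities for threshold selection, it is worth checking how well the calibration actually worked. The snippet below is a minimal sketch that reuses clf, y_test, and y_proba_calibrated from the example above; it compares Brier scores for the raw and calibrated outputs and prints a small reliability table built with calibration_curve.

from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

# Run after the example above: reuses clf, y_test, y_proba_calibrated

# Uncalibrated probabilities from the original forest, for comparison
y_proba_raw = clf.predict_proba(X_test)[:, 1]

# Brier score: mean squared error between probabilities and outcomes (lower is better)
print(f"Brier score (uncalibrated): {brier_score_loss(y_test, y_proba_raw):.4f}")
print(f"Brier score (calibrated):   {brier_score_loss(y_test, y_proba_calibrated):.4f}")

# Reliability table: mean predicted probability vs. observed positive rate per bin
prob_true, prob_pred = calibration_curve(y_test, y_proba_calibrated, n_bins=10)
for pred, true in zip(prob_pred, prob_true):
    print(f"predicted ~{pred:.2f} -> observed {true:.2f}")

# Isotonic regression is the other option mentioned above; it only requires
# swapping the argument: CalibratedClassifierCV(clf, method='isotonic', cv=3)

If calibration worked, the Brier score should not get worse and each bin's observed positive rate should sit close to its mean predicted probability.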

Using calibrated probabilities for threshold selection has several advantages. When the probabilities are well calibrated, the threshold you select corresponds directly to a true likelihood: setting a threshold of 0.7, for example, means you only predict the positive class when the model is at least 70% confident, and that confidence is meaningful. This makes decision-making more consistent and interpretable, and it lets you tune thresholds to optimize metrics such as F1-score, precision, or recall in a way that reflects the underlying data distribution. Calibration is especially useful when false positives and false negatives carry different costs, because you can set the threshold to match your risk preferences and trust that it behaves as intended, as the sketch below illustrates.
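
To make the cost argument concrete, suppose a false negative is assumed to be five times as costly as a false positive (hypothetical costs chosen only for illustration). With calibrated probabilities you can derive the cost-minimizing threshold analytically, and an empirical sweep over thresholds should land close to that value. This sketch reuses y_proba_calibrated and y_test from the example above.

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical costs for illustration: a false negative hurts 5x more than a false positive
COST_FP, COST_FN = 1.0, 5.0

# With calibrated probabilities, expected cost is minimized by predicting positive
# whenever p >= COST_FP / (COST_FP + COST_FN)
analytic_threshold = COST_FP / (COST_FP + COST_FN)

# Empirical check: sweep thresholds and measure total cost on the test set
thresholds = np.linspace(0.0, 1.0, 101)
total_costs = []
for thresh in thresholds:
    y_pred = (y_proba_calibrated >= thresh).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    total_costs.append(COST_FP * fp + COST_FN * fn)

empirical_threshold = thresholds[np.argmin(total_costs)]
print(f"Analytic cost-optimal threshold:  {analytic_threshold:.2f}")
print(f"Empirical cost-optimal threshold: {empirical_threshold:.2f}")

If the probabilities were poorly calibrated, the analytic and empirical thresholds would typically disagree, which is exactly the risk raised in the second question below.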

1. Why is threshold tuning more reliable after probability calibration?

2. What could go wrong if you tune decision thresholds on uncalibrated probabilities?



Section 3, Chapter 4