Combining Calibration with Threshold Tuning
When you use a machine learning model for binary classification, the model often produces a probability score for each instance. However, these scores are not always well-calibrated, meaning they may not reflect the true likelihood of an event. Calibration methods like Platt scaling or isotonic regression adjust the outputs so that a prediction of 0.8 really behaves like an 80% chance of the positive class: among instances scored around 0.8, roughly 80% actually turn out to be positive. Once you have calibrated probabilities, you can choose a threshold to convert them into class predictions with more confidence, because the threshold now corresponds to an interpretable probability, making it easier to align decisions with business needs or to optimize for specific metrics.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import numpy as np

# Generate synthetic data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Train a random forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Calibrate classifier using Platt scaling (sigmoid)
calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv=3)
calibrated_clf.fit(X_train, y_train)

# Get calibrated probabilities on test set
y_proba_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]

# Tune threshold to maximize F1-score
thresholds = np.linspace(0.0, 1.0, 101)
f1_scores = []
for thresh in thresholds:
    y_pred = (y_proba_calibrated >= thresh).astype(int)
    f1_scores.append(f1_score(y_test, y_pred))

best_idx = np.argmax(f1_scores)
best_threshold = thresholds[best_idx]
best_f1 = f1_scores[best_idx]

print(f"Best threshold: {best_threshold:.2f}")
print(f"Best F1-score: {best_f1:.3f}")
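You can also check how well the calibration worked, and compare Platt scaling against isotonic regression. The sketch below is illustrative and assumes the variables from the snippet above (clf, X_train, y_train, X_test, y_test, y_proba_calibrated) are still in scope; it reports Brier scores (lower is better) and a few points of the reliability curve for each method.

# Minimal sketch: compare sigmoid (Platt) and isotonic calibration on the same split.
# Assumes clf, X_train, y_train, X_test, y_test, y_proba_calibrated from the code above.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss

# Fit an isotonic-calibrated variant for comparison
iso_clf = CalibratedClassifierCV(clf, method='isotonic', cv=3)
iso_clf.fit(X_train, y_train)
y_proba_isotonic = iso_clf.predict_proba(X_test)[:, 1]

for name, proba in [("sigmoid", y_proba_calibrated), ("isotonic", y_proba_isotonic)]:
    # Brier score: mean squared error of the probabilities (lower is better)
    print(f"{name} Brier score: {brier_score_loss(y_test, proba):.4f}")
    # Reliability curve: mean predicted probability per bin vs. observed positive rate
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)
    for p_pred, p_true in zip(prob_pred, prob_true):
        print(f"  predicted ~{p_pred:.2f} -> observed {p_true:.2f}")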
Using calibrated probabilities for threshold selection has several advantages. When the probabilities are well-calibrated, the threshold you select corresponds directly to a real likelihood: setting a threshold of 0.7, for example, means you only predict the positive class when the model is at least 70% confident, and that confidence is meaningful. This makes decision-making more consistent and interpretable, and it lets you tune thresholds to optimize metrics like F1-score, precision, or recall on probabilities that actually reflect how often the positive class occurs. Calibration also helps when the costs of false positives and false negatives differ, because you can derive the threshold directly from those costs rather than guessing, as sketched below.
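For example, suppose a false negative is assumed to cost five times as much as a false positive (illustrative numbers, not part of the example above). With calibrated probabilities, the expected-cost-minimizing rule is to predict positive whenever p * cost_fn > (1 - p) * cost_fp, which gives the threshold cost_fp / (cost_fp + cost_fn). A minimal sketch, reusing y_proba_calibrated and y_test from the code above:

# Hedged sketch of cost-based threshold selection.
# cost_fp and cost_fn are illustrative assumptions, not values from the example above.
import numpy as np

cost_fp = 1.0   # assumed cost of a false positive
cost_fn = 5.0   # assumed cost of a false negative

# Predict positive when p > cost_fp / (cost_fp + cost_fn)
cost_threshold = cost_fp / (cost_fp + cost_fn)
y_pred_cost = (y_proba_calibrated >= cost_threshold).astype(int)

# Empirical average cost per example on the test set
fp = np.sum((y_pred_cost == 1) & (y_test == 0))
fn = np.sum((y_pred_cost == 0) & (y_test == 1))
avg_cost = (cost_fp * fp + cost_fn * fn) / len(y_test)

print(f"Cost-based threshold: {cost_threshold:.3f}")
print(f"Average cost per example: {avg_cost:.3f}")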
1. Why is threshold tuning more reliable after probability calibration?
2. What could go wrong if you tune decision thresholds on uncalibrated probabilities?