Combining Calibration with Threshold Tuning
When you use a machine learning model for binary classification, the model often produces a probability score for each instance. However, these scores are not always well-calibrated, meaning they may not reflect the true likelihood of an event. Calibration methods like Platt scaling or isotonic regression adjust the outputs so that a prediction of 0.8 really behaves like an 80% chance of the positive class: among instances scored around 0.8, roughly 80% actually turn out to be positive. Once you have calibrated probabilities, you can choose a threshold to convert them into class predictions with more confidence, because the threshold now corresponds to an interpretable probability, making it easier to align decisions with business needs or to optimize for specific metrics.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import numpy as np

# Generate synthetic data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Train a random forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Calibrate classifier using Platt scaling (sigmoid)
calibrated_clf = CalibratedClassifierCV(clf, method='sigmoid', cv=3)
calibrated_clf.fit(X_train, y_train)

# Get calibrated probabilities on test set
y_proba_calibrated = calibrated_clf.predict_proba(X_test)[:, 1]

# Tune threshold to maximize F1-score
thresholds = np.linspace(0.0, 1.0, 101)
f1_scores = []
for thresh in thresholds:
    y_pred = (y_proba_calibrated >= thresh).astype(int)
    f1_scores.append(f1_score(y_test, y_pred))

best_idx = np.argmax(f1_scores)
best_threshold = thresholds[best_idx]
best_f1 = f1_scores[best_idx]

print(f"Best threshold: {best_threshold:.2f}")
print(f"Best F1-score: {best_f1:.3f}")
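You can also check how well the calibration worked, and compare Platt scaling against isotonic regression. The sketch below is illustrative and assumes the variables from the snippet above (clf, X_train, y_train, X_test, y_test, y_proba_calibrated) are still in scope; it reports Brier scores (lower is better) and a few points of the reliability curve for each method.

# Minimal sketch: compare sigmoid (Platt) and isotonic calibration on the same split.
# Assumes clf, X_train, y_train, X_test, y_test, y_proba_calibrated from the code above.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss

# Fit an isotonic-calibrated variant for comparison
iso_clf = CalibratedClassifierCV(clf, method='isotonic', cv=3)
iso_clf.fit(X_train, y_train)
y_proba_isotonic = iso_clf.predict_proba(X_test)[:, 1]

for name, proba in [("sigmoid", y_proba_calibrated), ("isotonic", y_proba_isotonic)]:
    # Brier score: mean squared error of the probabilities (lower is better)
    print(f"{name} Brier score: {brier_score_loss(y_test, proba):.4f}")
    # Reliability curve: mean predicted probability per bin vs. observed positive rate
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)
    for p_pred, p_true in zip(prob_pred, prob_true):
        print(f"  predicted ~{p_pred:.2f} -> observed {p_true:.2f}")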
Using calibrated probabilities for threshold selection has several advantages. When the probabilities are well-calibrated, the threshold you select corresponds directly to a real likelihood: setting a threshold of 0.7, for example, means you only predict the positive class when the model is at least 70% confident, and that confidence is meaningful. This makes decision-making more consistent and interpretable, and it lets you tune thresholds to optimize metrics like F1-score, precision, or recall on probabilities that actually reflect how often the positive class occurs. Calibration also helps when the costs of false positives and false negatives differ, because you can derive the threshold directly from those costs rather than guessing, as sketched below.
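For example, suppose a false negative is assumed to cost five times as much as a false positive (illustrative numbers, not part of the example above). With calibrated probabilities, the expected-cost-minimizing rule is to predict positive whenever p * cost_fn > (1 - p) * cost_fp, which gives the threshold cost_fp / (cost_fp + cost_fn). A minimal sketch, reusing y_proba_calibrated and y_test from the code above:

# Hedged sketch of cost-based threshold selection.
# cost_fp and cost_fn are illustrative assumptions, not values from the example above.
import numpy as np

cost_fp = 1.0   # assumed cost of a false positive
cost_fn = 5.0   # assumed cost of a false negative

# Predict positive when p > cost_fp / (cost_fp + cost_fn)
cost_threshold = cost_fp / (cost_fp + cost_fn)
y_pred_cost = (y_proba_calibrated >= cost_threshold).astype(int)

# Empirical average cost per example on the test set
fp = np.sum((y_pred_cost == 1) & (y_test == 0))
fn = np.sum((y_pred_cost == 0) & (y_test == 1))
avg_cost = (cost_fp * fp + cost_fn * fn) / len(y_test)

print(f"Cost-based threshold: {cost_threshold:.3f}")
print(f"Average cost per example: {avg_cost:.3f}")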
1. Why is threshold tuning more reliable after probability calibration?
2. What could go wrong if you tune decision thresholds on uncalibrated probabilities?