Calibration Curves and Reliability Diagrams
A calibration curve, also known as a reliability diagram, is a visual tool that helps you assess how well a probabilistic classifier's predicted probabilities match the true likelihood of outcomes. To construct a calibration curve, you first split the predicted probabilities from your model into bins (for example, 0.0–0.1, 0.1–0.2, etc.). For each bin, you calculate the average predicted probability and the actual fraction of positive cases. You then plot these values: the x-axis shows the average predicted probability in each bin, and the y-axis shows the observed frequency of positive outcomes. If your model is perfectly calibrated, the points will fall along the diagonal line y = x, meaning that when your model predicts a probability of 0.7, about 70% of those cases are actually positive.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Create synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=4, random_state=42)

# Fit a simple classifier
clf = LogisticRegression()
clf.fit(X, y)
probs = clf.predict_proba(X)[:, 1]

# Compute calibration curve
fraction_of_positives, mean_predicted_value = calibration_curve(y, probs, n_bins=10, strategy='uniform')

# Plot reliability diagram
plt.figure(figsize=(6, 6))
plt.plot(mean_predicted_value, fraction_of_positives, "o-", label="Model output")
plt.plot([0, 1], [0, 1], "--", color="gray", label="Perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.title("Reliability Diagram (Calibration Curve)")
plt.legend()
plt.grid()
plt.show()
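To make the binning arithmetic concrete, here is a minimal sketch of roughly what calibration_curve computes with strategy='uniform', written out with plain NumPy. It reuses the y and probs arrays from the code above; the exact loop structure is illustrative, not scikit-learn's internal implementation.

# Assign each predicted probability to one of 10 equal-width bins
bin_edges = np.linspace(0.0, 1.0, 11)
bin_ids = np.digitize(probs, bin_edges[1:-1])  # bin index 0..9 for each sample

for b in range(10):
    mask = bin_ids == b
    if mask.sum() == 0:
        continue  # empty bins produce no point, as in calibration_curve
    mean_pred = probs[mask].mean()  # x-coordinate: average predicted probability
    frac_pos = y[mask].mean()       # y-coordinate: observed fraction of positives
    print(f"bin {b}: n={mask.sum():3d}, mean predicted={mean_pred:.3f}, fraction positive={frac_pos:.3f}")

Each printed (mean predicted, fraction positive) pair corresponds to one point on the reliability diagram.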
When you look at the plotted reliability diagram, the diagonal line represents perfect calibration: every predicted probability matches the actual observed frequency. If your model's curve closely follows this diagonal, your probability estimates are reliable. However, deviations from the diagonal reveal issues. If the curve is above the diagonal, your model is underconfident: it predicts lower probabilities than the actual frequency. If the curve is below the diagonal, your model is overconfident: it predicts higher probabilities than the true outcome rate. The shape and direction of these deviations help you understand whether your model's probabilities can be trusted or if calibration techniques are needed.
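You can read the direction of miscalibration directly from the arrays returned earlier. This short sketch, reusing fraction_of_positives and mean_predicted_value from the code above, prints the signed gap in each bin: a positive gap means the curve sits above the diagonal (underconfident), a negative gap means it sits below (overconfident).

# Signed per-bin gap: observed frequency minus average predicted probability
gaps = fraction_of_positives - mean_predicted_value
for pred, gap in zip(mean_predicted_value, gaps):
    verdict = "underconfident" if gap > 0 else "overconfident" if gap < 0 else "well calibrated"
    print(f"predicted={pred:.2f}: gap={gap:+.3f} ({verdict})")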
One common pitfall when reading reliability diagrams is to overlook the number of samples in each bin. If some bins have very few samples, the observed frequency can be noisy and misleading. Always check the sample distribution across bins or use confidence intervals if possible.
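One quick way to check that distribution, assuming the same probs array and 10 uniform bins as above, is a histogram of the predicted probabilities; the cutoff of 30 samples below is only a rough rule of thumb, not a standard threshold.

# Count how many samples fall into each probability bin
counts, edges = np.histogram(probs, bins=10, range=(0.0, 1.0))
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    flag = "  <- few samples, treat this bin with caution" if n < 30 else ""  # 30 is an assumed rule of thumb
    print(f"[{lo:.1f}, {hi:.1f}): {n} samples{flag}")

Bins flagged here correspond to points on the reliability diagram whose observed frequency is estimated from little data and may swing widely.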
1. What would a perfectly calibrated model's reliability diagram look like?
2. What does it mean if the reliability diagram curve is consistently below the diagonal?