Model Calibration with Python

Calibration Curves and Reliability Diagrams

A calibration curve, also known as a reliability diagram, is a visual tool that helps you assess how well a probabilistic classifier's predicted probabilities match the true likelihood of outcomes. To construct a calibration curve, you first split the predicted probabilities from your model into bins (for example, 0.0–0.1, 0.1–0.2, and so on). For each bin, you calculate the average predicted probability and the actual fraction of positive cases. You then plot these values: the x-axis shows the average predicted probability in each bin, and the y-axis shows the observed frequency of positive outcomes. If your model is perfectly calibrated, the points will fall along the diagonal line y = x, meaning that when your model predicts a probability of 0.7, about 70% of those cases are actually positive.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Create synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=4, random_state=42)

# Fit a simple classifier
clf = LogisticRegression()
clf.fit(X, y)
probs = clf.predict_proba(X)[:, 1]

# Compute calibration curve
fraction_of_positives, mean_predicted_value = calibration_curve(
    y, probs, n_bins=10, strategy='uniform'
)

# Plot reliability diagram
plt.figure(figsize=(6, 6))
plt.plot(mean_predicted_value, fraction_of_positives, "o-", label="Model output")
plt.plot([0, 1], [0, 1], "--", color="gray", label="Perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.title("Reliability Diagram (Calibration Curve)")
plt.legend()
plt.grid()
plt.show()

When you look at the plotted reliability diagram, the diagonal line represents perfect calibration: every predicted probability matches the actual observed frequency. If your model's curve closely follows this diagonal, your probability estimates are reliable. However, deviations from the diagonal reveal issues. If the curve is above the diagonal, your model is underconfident: it predicts lower probabilities than the actual frequency of positives. If the curve is below the diagonal, your model is overconfident: it predicts higher probabilities than the true outcome rate. The shape and direction of these deviations help you understand whether your model's probabilities can be trusted or whether calibration techniques are needed.
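To make this interpretation concrete, the short sketch below reuses fraction_of_positives and mean_predicted_value from the code above and prints the signed gap between observed frequency and mean predicted probability for each bin. Positive gaps correspond to points above the diagonal (underconfidence), negative gaps to points below it (overconfidence); the bin-by-bin loop is purely illustrative, not part of scikit-learn's API.

# Illustrative sketch: quantify how far each bin sits from the diagonal.
# Reuses fraction_of_positives and mean_predicted_value from the block above.
deviation = fraction_of_positives - mean_predicted_value

for pred, frac, gap in zip(mean_predicted_value, fraction_of_positives, deviation):
    if gap > 0:
        verdict = "underconfident (curve above diagonal)"
    elif gap < 0:
        verdict = "overconfident (curve below diagonal)"
    else:
        verdict = "well calibrated"
    print(f"mean predicted {pred:.2f} | observed {frac:.2f} | gap {gap:+.2f} | {verdict}")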

Note

One common pitfall when reading reliability diagrams is to overlook the number of samples in each bin. If some bins have very few samples, the observed frequency can be noisy and misleading. Always check the sample distribution across bins or use confidence intervals if possible.
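As a quick check, the sketch below counts how many predictions land in each of the 10 uniform-width bins used above. It assumes probs comes from the classifier fitted in the earlier code block and that the bin edges match the calibration_curve call.

# Illustrative sketch: count samples per probability bin before trusting
# the observed frequencies. Assumes `probs` from the classifier above and
# 10 uniform-width bins, matching the calibration_curve call.
import numpy as np

bin_edges = np.linspace(0.0, 1.0, 11)  # edges for 10 uniform-width bins
counts, _ = np.histogram(probs, bins=bin_edges)

for lo, hi, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"bin [{lo:.1f}, {hi:.1f}): {n} samples")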

1. What would a perfectly calibrated model's reliability diagram look like?

2. What does it mean if the reliability diagram curve is consistently below the diagonal?
