
Calibration of Tree-Based Models and SVMs

Tree-based models such as random forests, along with support vector machines (SVMs), are widely used for classification, but their probability outputs are often poorly calibrated: the predicted probabilities do not reliably reflect the true likelihood of an event. For instance, a random forest might output a probability of 0.9 for a class when, in reality, only 70% of such predictions are correct. SVMs with default settings do not produce probability estimates at all; they output decision scores that must be converted into probabilities. These characteristics create challenges when you need well-calibrated probabilities for downstream tasks such as risk assessment or decision making. Calibrating these models is essential if you want your model's confidence to be trustworthy and actionable.
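To see the SVM point concretely, here is a minimal sketch (the dataset and variable names are illustrative, not part of the lesson): a default SVC exposes only unbounded decision scores via decision_function, and predict_proba becomes available only when the model is constructed with probability=True, which fits an internal Platt-style sigmoid on those scores.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Default SVC: only raw decision scores, no probabilities
svc_raw = SVC().fit(X, y)
scores = svc_raw.decision_function(X[:3])      # unbounded real values
# svc_raw.predict_proba(X[:3])                 # would raise AttributeError

# probability=True enables predict_proba via an internal sigmoid fit
svc_prob = SVC(probability=True, random_state=0).fit(X, y)
probs = svc_prob.predict_proba(X[:3])[:, 1]    # values in [0, 1]

The full example below fits both a random forest and an SVC, calibrates each with Platt scaling, and compares reliability diagrams before and after calibration.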

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

# Fit uncalibrated models
rf = RandomForestClassifier(n_estimators=100, random_state=42)
svc = SVC(probability=True, random_state=42)
rf.fit(X_train, y_train)
svc.fit(X_train, y_train)

# Calibrate with sigmoid (Platt scaling)
rf_calibrated = CalibratedClassifierCV(rf, method='sigmoid', cv=5)
svc_calibrated = CalibratedClassifierCV(svc, method='sigmoid', cv=5)
rf_calibrated.fit(X_train, y_train)
svc_calibrated.fit(X_train, y_train)

# Get predicted probabilities
probs_rf = rf.predict_proba(X_test)[:, 1]
probs_rf_cal = rf_calibrated.predict_proba(X_test)[:, 1]
probs_svc = svc.predict_proba(X_test)[:, 1]
probs_svc_cal = svc_calibrated.predict_proba(X_test)[:, 1]

# Reliability diagrams
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, probs, probs_cal, name in zip(
    axes,
    [probs_rf, probs_svc],
    [probs_rf_cal, probs_svc_cal],
    ["Random Forest", "SVC"]
):
    frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)
    frac_pos_cal, mean_pred_cal = calibration_curve(y_test, probs_cal, n_bins=10)
    ax.plot(mean_pred, frac_pos, "s-", label="Uncalibrated")
    ax.plot(mean_pred_cal, frac_pos_cal, "o-", label="Calibrated")
    ax.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
    ax.set_title(f"{name} Reliability Diagram")
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Fraction of positives")
    ax.legend()
plt.tight_layout()
plt.show()

Tree-based models, such as random forests, often produce distorted probability estimates: individual trees tend to be overconfident, but averaging the votes of many trees pushes the forest's predictions away from 0 and 1, so random forests are typically under-confident near the extremes. SVMs, on the other hand, are not inherently probabilistic and require an additional step to map their decision scores to probabilities. When you calibrate these models with Platt scaling (sigmoid) or isotonic regression, you typically see a marked improvement in the alignment between predicted probabilities and observed outcomes, although the size of the gain depends on the data and the model's complexity. In practice, SVMs almost always require calibration to yield meaningful probability outputs, and tree-based models often benefit as well, especially in risk-sensitive applications.
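If you want to compare the two calibration methods quantitatively rather than visually, one simple option (an illustrative sketch, reusing the synthetic data setup from the example above) is to score each calibrated model with the Brier score, where lower is better:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit each calibration method and score it on the held-out set
for method in ("sigmoid", "isotonic"):
    calibrated = CalibratedClassifierCV(rf, method=method, cv=5)
    calibrated.fit(X_train, y_train)
    probs = calibrated.predict_proba(X_test)[:, 1]
    print(method, round(brier_score_loss(y_test, probs), 4))

Isotonic regression is more flexible but can overfit when the calibration set is small, so sigmoid is often the safer default with limited data.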

1. Why do SVMs often produce uncalibrated probabilities?

2. Which calibration method is commonly used for tree-based models?


