Bias–Variance Trade-Off and Ensembles
In machine learning, prediction error is composed of three main components: bias, variance, and irreducible error. Bias measures how far, on average, your model's predictions are from the actual values because of simplifying assumptions. High bias means the model is too simple to capture the underlying patterns in the data, causing underfitting. Variance describes how much your model's predictions would change if you used a different training set. High variance means the model is too sensitive to the training data, which results in overfitting and poor generalization to new data.
Bias is an error from erroneous assumptions in the learning algorithm. High bias can cause underfitting.
Variance is an error from sensitivity to small fluctuations in the training set. High variance can cause overfitting.
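A concrete way to see both effects is to refit the same model on many training sets drawn from the same process and look at how its predictions move around the true function. The sketch below does this for decision trees of different depths on a noisy sine wave; the depths, sample size, and noise level are illustrative choices rather than part of any fixed recipe. A very shallow tree should show high bias and low variance, while a fully grown tree should show the opposite.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(x)

# Fixed evaluation points and noise level (illustrative choices)
x_eval = np.linspace(-3, 3, 200).reshape(-1, 1)
noise_std = 0.25
n_datasets = 200

def bias_variance(max_depth):
    """Estimate bias^2 and variance of a decision tree of a given depth
    by refitting it on many independently drawn training sets."""
    preds = np.empty((n_datasets, len(x_eval)))
    for i in range(n_datasets):
        X = rng.uniform(-3, 3, size=(80, 1))
        y = true_fn(X).ravel() + rng.normal(0, noise_std, X.shape[0])
        model = DecisionTreeRegressor(max_depth=max_depth).fit(X, y)
        preds[i] = model.predict(x_eval)
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_fn(x_eval).ravel()) ** 2)
    variance = preds.var(axis=0).mean()
    return bias_sq, variance

for depth in (1, 3, None):  # None lets the tree grow fully
    b2, var = bias_variance(depth)
    print(f"max_depth={depth}: bias^2={b2:.4f}, variance={var:.4f}")
```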
Mathematically, the bias–variance trade-off can be described by the following decomposition for the expected squared error at a new input value $x$:
$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big(\hat{f}(x)\big)^2 + \mathrm{Var}\big(\hat{f}(x)\big) + \text{Irreducible Error}$$

- $\mathrm{Bias}(\hat{f}(x))$: the difference between the average prediction of your model and the actual value you are trying to predict;
- $\mathrm{Var}(\hat{f}(x))$: the variability of the model's prediction at a given point $x$ across different training sets;
- Irreducible Error: the noise inherent in the data, which cannot be reduced by any model.
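One way to see that these three terms really do add up is a small simulation: refit a model on many training sets drawn from the same process, then compare bias² + variance + noise variance with the average squared error at a fixed point. The sketch below does this for a depth-3 decision tree at the single point $x = 1$; the model, the evaluation point, and the noise level are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
noise_std = 0.25          # known noise level -> irreducible error = noise_std**2
x0 = np.array([[1.0]])    # single evaluation point x
n_datasets = 2000

preds, errors = [], []
for _ in range(n_datasets):
    # Draw a fresh training set from the same process each time
    X = rng.uniform(-3, 3, size=(80, 1))
    y = np.sin(X).ravel() + rng.normal(0, noise_std, X.shape[0])
    model = DecisionTreeRegressor(max_depth=3).fit(X, y)
    pred = model.predict(x0)[0]
    preds.append(pred)
    # A fresh noisy observation of y at x0
    y0 = np.sin(x0[0, 0]) + rng.normal(0, noise_std)
    errors.append((y0 - pred) ** 2)

preds = np.array(preds)
bias_sq = (preds.mean() - np.sin(x0[0, 0])) ** 2
variance = preds.var()
irreducible = noise_std ** 2

print(f"bias^2 + variance + noise = {bias_sq + variance + irreducible:.4f}")
print(f"expected squared error    = {np.mean(errors):.4f}")
```

With enough simulated training sets, the two printed numbers should agree closely, which is exactly what the decomposition predicts.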
The trade-off arises because decreasing bias (by making the model more flexible) usually increases variance, and vice versa. The goal is to find a model with the right balance that minimizes total prediction error. Ensemble methods are widely used to tackle this trade-off: the example below fits a single decision tree and a bagging ensemble of 30 such trees on a noisy sine wave, showing how averaging many trees smooths the predictions and reduces variance.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

# Create a noisy sine wave dataset
np.random.seed(42)
X = np.sort(np.random.rand(80, 1) * 6 - 3, axis=0)  # X in [-3, 3]
y = np.sin(X).ravel() + np.random.normal(0, 0.25, X.shape[0])

# Fit a single decision tree
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Fit a bagging ensemble of decision trees
bagging = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    n_estimators=30,
    random_state=0
)
bagging.fit(X, y)

# Generate test data for predictions
X_test = np.linspace(-3, 3, 500).reshape(-1, 1)
y_true = np.sin(X_test).ravel()
y_tree = tree.predict(X_test)
y_bagging = bagging.predict(X_test)

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(X_test, y_true, label="True Function (sin)", color="green", linewidth=2)
plt.scatter(X, y, label="Training Data", color="gray", alpha=0.5)
plt.plot(X_test, y_tree, label="Single Decision Tree", color="red", linestyle="--")
plt.plot(X_test, y_bagging, label="Bagging Ensemble", color="blue")
plt.title("Variance Reduction with Bagging Ensemble")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()
```
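To attach numbers to the variance reduction visible in the plot, one option is to compare the two models with cross-validation. The sketch below recreates the same dataset and hyperparameters and reports 5-fold cross-validated mean squared error; the fold count and the shuffled split are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Recreate the same noisy sine dataset as in the example above
np.random.seed(42)
X = np.sort(np.random.rand(80, 1) * 6 - 3, axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.25, X.shape[0])

# Shuffle the folds because X is sorted; scikit-learn returns negative MSE
cv = KFold(n_splits=5, shuffle=True, random_state=0)

tree_mse = -cross_val_score(
    DecisionTreeRegressor(max_depth=3, random_state=0),
    X, y, cv=cv, scoring="neg_mean_squared_error",
).mean()

bagging_mse = -cross_val_score(
    BaggingRegressor(
        estimator=DecisionTreeRegressor(max_depth=3),
        n_estimators=30,
        random_state=0,
    ),
    X, y, cv=cv, scoring="neg_mean_squared_error",
).mean()

print(f"Single tree CV MSE:      {tree_mse:.4f}")
print(f"Bagging ensemble CV MSE: {bagging_mse:.4f}")
```

On this dataset the ensemble typically shows a lower cross-validated error than the single tree, which is the quantitative counterpart of the smoother blue curve in the plot.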