
Gradient Boosting for Time Series

Gradient boosting is an ensemble learning technique that builds models sequentially, with each new model correcting the errors of the previous ones. Unlike a single decision tree, which can be prone to overfitting or underfitting, gradient boosting combines many weak learners—typically shallow trees—into a strong predictive model. This approach is especially powerful for time series forecasting, where capturing subtle temporal patterns and non-linear relationships is crucial. Gradient boosting methods, such as scikit-learn's HistGradientBoostingRegressor, are robust to outliers and can handle complex feature interactions, making them well-suited for forecasting tasks when compared to single trees.
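To make the idea of each new model correcting the errors of the previous ones concrete, here is a minimal hand-rolled sketch of a boosting loop on toy data. It is only a sketch under stated assumptions, not what HistGradientBoostingRegressor does internally (the real estimator adds histogram binning, shrinkage schedules, and early stopping); the learning rate, tree depth, and number of rounds are illustrative choices.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (names X_toy/y_toy chosen to avoid clashing with the
# forecasting example below)
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.2, size=200)

learning_rate = 0.1                          # illustrative, not tuned
prediction = np.full_like(y_toy, y_toy.mean())  # start from the mean
trees = []

for _ in range(100):
    residuals = y_toy - prediction               # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)    # weak learner: a shallow tree
    tree.fit(X_toy, residuals)                   # fit the tree to the residuals
    prediction += learning_rate * tree.predict(X_toy)  # add its scaled prediction
    trees.append(tree)

print("Training MSE after boosting:", np.mean((y_toy - prediction) ** 2))

The full forecasting example below applies HistGradientBoostingRegressor to lagged and rolling-mean features and evaluates it with TimeSeriesSplit: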

import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Create a synthetic time series (sinusoidal with noise)
np.random.seed(42)
n = 300
t = np.arange(n)
y = np.sin(0.04 * t) + np.random.normal(scale=0.3, size=n)
df = pd.DataFrame({'y': y})

# Create lagged features and rolling means
df['lag1'] = df['y'].shift(1)
df['lag2'] = df['y'].shift(2)
df['roll3'] = df['y'].rolling(window=3).mean().shift(1)
df['roll7'] = df['y'].rolling(window=7).mean().shift(1)
df = df.dropna()

X = df[['lag1', 'lag2', 'roll3', 'roll7']]
y = df['y']

tscv = TimeSeriesSplit(n_splits=5)
mse_scores = []

# For plotting last fold
last_train_idx, last_test_idx = None, None
last_predictions = None

for train_idx, test_idx in tscv.split(X):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

    model = HistGradientBoostingRegressor(max_iter=100)
    model.fit(X_train, y_train)

    preds = model.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    mse_scores.append(mse)

    last_train_idx, last_test_idx = train_idx, test_idx
    last_predictions = preds

print("Mean MSE across folds:", np.mean(mse_scores))
# Visualize
plt.figure(figsize=(14, 6))

# Actual values (full series)
plt.plot(y.index, y.values, label="Actual", color="black")

# Predictions on final fold
plt.plot(y.index[last_test_idx], last_predictions,
         label="Predictions (Last Fold)", color="orange", linewidth=2)

# Train-test split line
plt.axvline(y.index[last_test_idx][0], linestyle="--", color="gray",
            label="Train/Test Split")

plt.title("HistGradientBoosting TS Forecast - Last Fold Visualization")
plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

When comparing random forests and gradient boosting for time series forecasting, several differences stand out:

  • Random forests average predictions from many independent trees;
  • This averaging reduces variance and improves robustness;
  • Some bias may remain if the underlying patterns are complex.

Gradient boosting takes a different approach:

  • Trees are built sequentially, with each tree correcting the errors of the previous one;
  • This process often leads to lower bias and better performance on challenging forecasting tasks;
  • Boosting models can be more sensitive to noise and overfitting if not tuned carefully (a comparison sketch on the same folds follows this list).
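A minimal sketch of this comparison, reusing X, y, and tscv from the example above and fitting both ensembles on the same folds (the hyperparameters shown are illustrative, not tuned values):

import numpy as np
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Two ensembles evaluated on identical time-ordered folds
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "gradient_boosting": HistGradientBoostingRegressor(max_iter=100, random_state=42),
}

for name, model in models.items():
    fold_mse = []
    for train_idx, test_idx in tscv.split(X):
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        preds = model.predict(X.iloc[test_idx])
        fold_mse.append(mean_squared_error(y.iloc[test_idx], preds))
    print(f"{name}: mean MSE = {np.mean(fold_mse):.4f}")

Which ensemble wins depends on the series and on tuning; on a smooth synthetic signal like this one the two often land close together, while boosting tends to pull ahead when the lag features interact in more strongly non-linear ways.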

Interpretability also differs:

  • Random forests are typically easier to analyze, as feature importances are more stable and the ensemble is less sensitive to small data changes;
  • Boosting models, while potentially more accurate, may require more careful examination to understand their predictions and feature dependencies (see the permutation-importance sketch below).
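One hedged way to inspect feature dependencies for both model types is permutation importance, which is model-agnostic and also works for HistGradientBoostingRegressor, which (unlike random forests) does not expose a feature_importances_ attribute. The sketch below assumes the models dict from the previous sketch and X_train, X_test, y_train, y_test from the last fold of the main example:

from sklearn.inspection import permutation_importance

for name, model in models.items():
    # Refit on the last fold's training data, then measure how much the test
    # score degrades when each feature is randomly shuffled
    model.fit(X_train, y_train)
    result = permutation_importance(model, X_test, y_test,
                                    n_repeats=10, random_state=42)
    ranked = sorted(zip(X.columns, result.importances_mean),
                    key=lambda pair: pair[1], reverse=True)
    print(name, ranked)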

1. What is a key advantage of gradient boosting over random forests for time series forecasting?

2. Which features are typically most important for boosting models in time series?
