Direct Forecasting | Multi-Step Forecasting Strategies

When you want to predict multiple future values in a time series, the direct strategy offers a clear and structured approach. With this method, you train a separate supervised model for each forecast step you care about. For example, if you want to predict the next two time steps, you would build one model to forecast the value at time t+1 and a completely separate model to predict the value at time t+2. Each model is trained independently using the available lagged features, so the t+1 model learns to predict one step ahead, while the t+2 model learns to predict two steps ahead, directly from the current and past data.


import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Generate a simple time series
np.random.seed(42)
n = 100
data = pd.DataFrame({
    "y": np.cumsum(np.random.randn(n)) + 10
})

# Create lagged features
data["lag1"] = data["y"].shift(1)
data["lag2"] = data["y"].shift(2)

# Create targets for t+1 and t+2 forecasts
data["target_t1"] = data["y"].shift(-1)
data["target_t2"] = data["y"].shift(-2)

# Drop rows with NaN values due to shifting
train = data.dropna()

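# Features for both models: the current value y_t plus two lags.
# Without "y", the most recent observation would be ignored and the
# "t+1" model would really be forecasting two steps past its newest input.
features = ["y", "lag1", "lag2"]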

# Model for 1-step ahead (t+1)
X_t1 = train[features]
y_t1 = train["target_t1"]
model_t1 = RandomForestRegressor(random_state=0)
model_t1.fit(X_t1, y_t1)

# Model for 2-step ahead (t+2)
X_t2 = train[features]
y_t2 = train["target_t2"]
model_t2 = RandomForestRegressor(random_state=0)
model_t2.fit(X_t2, y_t2)

# Example: Predict next value(s) using the last available data point
last_row = data.iloc[[-1]][features]
pred_t1 = model_t1.predict(last_row)
pred_t2 = model_t2.predict(last_row)

print(f"1-step ahead prediction: {pred_t1[0]:.2f}")
print(f"2-step ahead prediction: {pred_t2[0]:.2f}")
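The two-model example above extends naturally to any set of horizons. The following is a minimal sketch, not a definitive implementation: the helper name `fit_direct_models`, the `lag0`/`lag1`/... column names, and the choice of three lags and 50 trees are all assumptions made for illustration. It fits one model per horizon in a loop and stores them in a dictionary keyed by horizon.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def fit_direct_models(y, horizons, n_lags=3):
    """Fit one model per horizon h, each mapping [y_t, y_{t-1}, ...] to y_{t+h}.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # "lag0" is the current value y_t; "lagk" is y_{t-k}
    X = pd.DataFrame({f"lag{k}": y.shift(k) for k in range(n_lags)})
    models = {}
    for h in horizons:
        # The target for horizon h is the series shifted h steps into the future
        target = y.shift(-h).rename("target")
        # Drop rows where either a lag or the target is missing
        frame = pd.concat([X, target], axis=1).dropna()
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(frame[X.columns], frame["target"])
        models[h] = model
    return models, X


# Same kind of synthetic random-walk series as in the example above
np.random.seed(42)
y = pd.Series(np.cumsum(np.random.randn(100)) + 10)

models, X = fit_direct_models(y, horizons=[1, 2, 3])

# Every horizon is predicted from the same, fully observed, latest feature row
latest = X.iloc[[-1]]
forecasts = {h: float(m.predict(latest)[0]) for h, m in models.items()}
for h, value in forecasts.items():
    print(f"{h}-step ahead prediction: {value:.2f}")
```

Each horizon gets its own training frame because shifting the target by h leaves h rows at the end of the series without a label, so longer horizons train on slightly fewer rows.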

The direct and recursive strategies differ in how predictions are generated and in how errors propagate. The direct approach, as shown above, trains a distinct model for each forecast horizon, so the prediction at t+2 does not depend on the prediction at t+1. This gives you flexibility, because each model can be tuned for its own horizon. It also adds complexity, because you must train and manage one model per forecasted step.

In contrast, the recursive strategy uses a single one-step model: it predicts the next step, then feeds that prediction back in as an input to forecast further into the future. This is simpler to implement, but errors can accumulate, since a mistake made at an early step becomes part of the input for every later step. The direct strategy avoids this particular form of error propagation, at the cost of more computation (training scales with the number of horizons) and somewhat less usable training data at longer horizons, because a model for horizon h has no target for the last h rows of the series.
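For comparison, the recursive strategy described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not part of the original lesson code: the function name `recursive_forecast`, the two-value input window, and the `lag0`/`lag1` column names are hypothetical. A single one-step model is trained, and each prediction is pushed back into the input window to produce the next one.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic random-walk series, as in the direct example
np.random.seed(42)
y = pd.Series(np.cumsum(np.random.randn(100)) + 10)

# One-step-ahead training set: predict y_{t+1} from [y_t, y_{t-1}]
frame = pd.DataFrame(
    {"lag0": y, "lag1": y.shift(1), "target": y.shift(-1)}
).dropna()
one_step = RandomForestRegressor(n_estimators=50, random_state=0)
one_step.fit(frame[["lag0", "lag1"]], frame["target"])


def recursive_forecast(model, history, steps):
    """Roll a one-step model forward, feeding each prediction back as input.

    Hypothetical helper for illustration only.
    """
    window = [history.iloc[-1], history.iloc[-2]]  # [y_t, y_{t-1}]
    preds = []
    for _ in range(steps):
        x = pd.DataFrame([window], columns=["lag0", "lag1"])
        next_value = float(model.predict(x)[0])
        preds.append(next_value)
        # The prediction becomes the newest "observation" in the window,
        # so any error in it is carried into the following step
        window = [next_value, window[0]]
    return preds


recursive_preds = recursive_forecast(one_step, y, steps=2)
for step, value in enumerate(recursive_preds, start=1):
    print(f"{step}-step ahead (recursive): {value:.2f}")
```

Note that the second forecast here is computed from a predicted value rather than an observed one, which is exactly where accumulated error enters; in the direct strategy, both forecasts use only observed inputs.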

1. What is a key advantage of the direct approach for multi-step forecasting?

2. What is a potential downside of training separate models for each horizon?


Section 3. Chapter 2
