Windowing and Target Construction | Foundations of ML-Based Time Series Forecasting


When you want to apply supervised machine learning to time series forecasting, you first need to convert the original sequence into a table of input features and targets, which is the format standard ML algorithms expect. This is where windowing comes in. Windowing transforms a univariate time series into a supervised learning dataset by creating lagged features (past values) and targets (future values you want to predict). Each row in the resulting dataset is a snapshot of a stretch of the past, paired with the future value to be predicted as its target.
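To make the idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical toy series of six values, a window of 3 past values, and a target equal to the value immediately after each window.

```python
# Hypothetical toy series of six consecutive observations
series = [10, 12, 13, 15, 14, 16]
window_size = 3  # number of past values used as features in each row

rows = []
for t in range(window_size, len(series)):
    features = series[t - window_size:t]  # the window_size values before position t
    target = series[t]                    # the next value, which the model should predict
    rows.append((features, target))

for features, target in rows:
    print(features, "->", target)
```

Six observations with a window of 3 yield three rows, because the first three values can only ever appear as features.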

Suppose you have a time series $[y_1, y_2, y_3, \ldots, y_n]$. To build a supervised dataset, you choose a window size $w$ (the number of past observations to use as features) and a forecast horizon $h$ (how many steps after the most recent observation you want to predict). For each time step $t$, the previous $w$ values $(y_{t-w}, \ldots, y_{t-1})$ become the features, and $y_{t+h-1}$, the value $h$ steps after the last feature, becomes the target. Sliding this window one step at a time along the series produces many overlapping windows and targets: a series of length $n$ yields $n - w - h + 1$ samples.
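The sliding procedure can be wrapped in a small function. This is a minimal sketch, not a library API: `make_supervised` is a hypothetical helper name, and it assumes the convention that the target lies `forecast_horizon` steps after the last value of its window.

```python
import numpy as np

def make_supervised(series, window_size, forecast_horizon):
    """Hypothetical helper: turn a 1-D series into (X, y) arrays.

    Row i uses series[i : i + window_size] as features and
    series[i + window_size + forecast_horizon - 1] as the target,
    i.e. the value forecast_horizon steps after the last feature.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window_size - forecast_horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window size and horizon")
    # One overlapping window per sample, shifted by one step each time
    X = np.array([series[i:i + window_size] for i in range(n_samples)])
    # Targets start right after the first window, offset by the horizon
    y = series[window_size + forecast_horizon - 1:]
    return X, y

# Usage: 7 values, window of 3, predicting 2 steps ahead -> 7 - 3 - 2 + 1 = 3 samples
X, y = make_supervised([1, 2, 3, 4, 5, 6, 7], window_size=3, forecast_horizon=2)
print(X)
print(y)
```

Here the first row's features are 1, 2, 3 and its target is 5, two steps after the last feature.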


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a synthetic time series
np.random.seed(42)
data = np.cumsum(np.random.randn(20))  # 20 values, random walk

# Convert to DataFrame
df = pd.DataFrame({'y': data})

# Define window size (number of lags) and forecast horizon
window_size = 3
forecast_horizon = 1

# Create lagged features: lag_k at row t holds y[t - k]
for lag in range(1, window_size + 1):
    df[f'lag_{lag}'] = df['y'].shift(lag)

# Create the target: the value forecast_horizon steps after lag_1 (= y[t - 1]),
# i.e. y[t + forecast_horizon - 1]; with forecast_horizon = 1 this is y[t] itself
df['target'] = df['y'].shift(-(forecast_horizon - 1))

# Drop the raw 'y' column (it duplicates the target when forecast_horizon = 1)
# and the rows with NaN values introduced by shifting
supervised_df = df.drop(columns='y').dropna().reset_index(drop=True)
print(supervised_df.head())

# Plot the lagged features and target for the first few samples
plt.figure(figsize=(10, 6))
sample_idx = range(3)  # Show 3 samples

# x-coords for lags (-3, -2, -1); the target sits at forecast_horizon - 1
x = list(range(-window_size, 0))

for i in sample_idx:
    # Lags ordered oldest to newest (lag_3, lag_2, lag_1) to match x
    lags = supervised_df.loc[i, [f'lag_{j}' for j in range(window_size, 0, -1)]].values
    target = supervised_df.loc[i, 'target']

    # plot lag lines, one labeled line per sample
    plt.plot(x, lags, marker='o', label=f'Sample {i + 1} lags')

    # target
    plt.scatter([forecast_horizon - 1], [target],
                color='red', marker='x',
                label='Target' if i == 0 else "")

    # dashed line from the most recent lag to the target
    plt.plot([-1, forecast_horizon - 1],
             [lags[-1], target],
             linestyle='dashed', color='gray', alpha=0.5)

plt.xlabel('Time Offset (relative to prediction)')
plt.ylabel('Value')
plt.title('Windowing: Lagged Features and Target for First 3 Samples')
plt.xticks(list(range(-window_size, forecast_horizon)))  # ticks from oldest lag to target
plt.legend()
plt.grid(True)
plt.show()
```
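A quick way to confirm that features and target line up as intended is to rebuild the frame on a series whose values equal their own time index, so every lag and target can be read off directly. This sketch assumes the same `shift`-based construction, with a hypothetical horizon of 2 to make the gap visible.

```python
import numpy as np
import pandas as pd

# A series whose value equals its time index makes misalignment easy to spot
window_size = 3
forecast_horizon = 2
df = pd.DataFrame({'y': np.arange(10, dtype=float)})

for lag in range(1, window_size + 1):
    df[f'lag_{lag}'] = df['y'].shift(lag)
# Target sits forecast_horizon steps after lag_1, the most recent feature
df['target'] = df['y'].shift(-(forecast_horizon - 1))

supervised_df = df.drop(columns='y').dropna().reset_index(drop=True)
print(supervised_df)

# Every target should be exactly forecast_horizon steps past lag_1
gap = supervised_df['target'] - supervised_df['lag_1']
print(gap.unique())
```

With 10 values, a window of 3, and a horizon of 2, the frame keeps 10 - 3 - 2 + 1 = 6 rows, and every gap equals 2.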

Choosing the window size and forecast horizon has a direct impact on your dataset and the forecasting problem. The window size determines how much past information is used as features for each prediction. A larger window size means more historical context, but also increases the dimensionality of your feature space, which can lead to more complex models and potential overfitting if your dataset is small.
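The trade-off is easy to see by counting rows and columns. This sketch assumes a hypothetical 100-step random walk and a horizon of 1; as the window grows, the number of features rises while the number of usable samples falls.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
y = pd.Series(np.cumsum(np.random.randn(100)))  # hypothetical 100-step random walk
forecast_horizon = 1

shapes = {}
for window_size in (3, 12, 48):
    # One column per lag, built the same way as the shift-based example
    frame = pd.DataFrame({f'lag_{k}': y.shift(k) for k in range(1, window_size + 1)})
    frame['target'] = y.shift(-(forecast_horizon - 1))
    frame = frame.dropna()
    n_rows, n_cols = frame.shape
    shapes[window_size] = (n_rows, n_cols - 1)  # (samples, features)
    print(f"window_size={window_size:>2}: {n_rows} samples, {n_cols - 1} features")
```

At a window of 48, there are barely more samples than features, which is the kind of setting where overfitting becomes a real risk.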

The forecast horizon controls how far into the future your model predicts. With a horizon of 1 you predict the next time step; with a horizon of 5 you predict five steps ahead. Increasing the forecast horizon usually makes the prediction task harder because, for most series, the dependence between past and future values weakens as the gap widens. Window size and forecast horizon together also determine how many samples you can generate (each extra lag or horizon step removes one usable row from the series) and how relevant your features are to the target.
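One way to see the weakening relationship is to measure the correlation between a value and the value $h$ steps later. This sketch assumes a simulated stationary AR(1) process with a hypothetical coefficient of 0.8, for which the theoretical correlation at lag $h$ is $0.8^h$; other series, such as a random walk, can decay much more slowly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
phi, n = 0.8, 5000  # hypothetical AR(1) coefficient and series length
noise = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + noise[t]  # AR(1): y_t = phi * y_{t-1} + e_t
series = pd.Series(y)

window_size = 3
corrs = {}
for h in (1, 2, 5, 10):
    # Correlation between the last feature and a target h steps later
    corrs[h] = series.autocorr(lag=h)
    n_samples = n - window_size - h + 1
    print(f"h={h:>2}: corr={corrs[h]:.2f} (theory {phi ** h:.2f}), samples={n_samples}")
```

The correlation falls from about 0.8 at $h = 1$ to roughly 0.1 at $h = 10$, while the sample count barely changes, so the difficulty comes from the weaker signal rather than from less data.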

1. What does the window size parameter control in time series windowing?

2. What is the effect of increasing the forecast horizon when constructing targets?


Section 1. Chapter 2
