Learn Windowing and Target Construction | Foundations of ML-Based Time Series Forecasting
Machine Learning for Time Series Forecasting

Windowing and Target Construction

When you want to apply supervised machine learning to time series forecasting, you need to convert the original sequence into a structure that ML algorithms can understand. This is where windowing comes in. Windowing transforms a univariate time series into a supervised learning dataset by creating lagged features (past values) and targets (future values you want to predict). Each row in the resulting dataset represents a snapshot of the past, with the corresponding value to be predicted as the target.

Suppose you have a time series $[y_1, y_2, y_3, \ldots, y_n]$. To build a supervised dataset, you choose a window size (the number of past observations to use as features) and a forecast horizon (how far ahead you want to predict). For each time step, you collect the most recent window_size values as features and the value forecast_horizon steps after the last of them as the target. This process slides along the series one step at a time, generating many overlapping windows and their targets.
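The sliding procedure can also be written compactly. The notation below is an assumption introduced for illustration and does not come from the course material: $w$ is the window size, $h$ the forecast horizon, $s$ the index of the most recent observation in a window, and $N$ the number of samples the series yields.

```latex
% Assumed notation: w = window size, h = forecast horizon,
% s = index of the most recent observation inside a window.
\[
  \mathbf{x}_s = (y_{s-w+1}, \ldots, y_s), \qquad
  \text{target}_s = y_{s+h}, \qquad
  s = w, \ldots, n - h
\]
% Number of (window, target) pairs that fit inside the series
\[
  N = n - w - h + 1
\]
% Worked example: n = 6, w = 3, h = 1 gives N = 3 samples
\[
  (y_1, y_2, y_3) \mapsto y_4, \quad
  (y_2, y_3, y_4) \mapsto y_5, \quad
  (y_3, y_4, y_5) \mapsto y_6
\]
```

Under this convention, a series of 20 values with $w = 3$ and $h = 1$ yields $20 - 3 - 1 + 1 = 17$ samples.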

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Create a synthetic time series: a 20-step random walk
np.random.seed(42)
data = np.cumsum(np.random.randn(20))

# Convert to DataFrame
df = pd.DataFrame({'y': data})

# Define window size (number of lags) and forecast horizon
window_size = 3
forecast_horizon = 1

# Create lagged features: in row t, lag_k holds y[t - k]
for lag in range(1, window_size + 1):
    df[f'lag_{lag}'] = df['y'].shift(lag)

# Create the target: the value forecast_horizon steps after the most recent
# lag (lag_1 = y[t - 1]), which is y[t + forecast_horizon - 1]
df['target'] = df['y'].shift(-(forecast_horizon - 1))

# Keep only lags and target, then drop rows with NaN values (due to shifting).
# 'y' is dropped because for forecast_horizon = 1 it equals the target.
supervised_df = df.drop(columns='y').dropna().reset_index(drop=True)

# Plot the lagged features and target for the first few samples
lag_cols = [f'lag_{j}' for j in range(window_size, 0, -1)]  # oldest lag first
x = list(range(-window_size, 0))  # x-coords for lags (-3, -2, -1)
target_x = forecast_horizon - 1   # x-coord for the target

plt.figure(figsize=(10, 6))
for i in range(3):  # Show 3 samples
    lags = supervised_df.loc[i, lag_cols].to_numpy()
    target = supervised_df.loc[i, 'target']

    # Lag values for this sample
    plt.plot(x, lags, marker='o', label=f'Sample {i + 1} lags')
    # Target for this sample (labelled once so the legend has a single entry)
    plt.scatter([target_x], [target], color='red', marker='x',
                label='Target' if i == 0 else None)
    # Dashed line connecting the most recent lag to the target
    plt.plot([-1, target_x], [lags[-1], target],
             linestyle='dashed', color='gray', alpha=0.5)

plt.xlabel('Time Offset (relative to prediction)')
plt.ylabel('Value')
plt.title('Windowing: Lagged Features and Target for First 3 Samples')
plt.xticks(list(range(-window_size, forecast_horizon)))
plt.legend()
plt.grid(True)
plt.show()
```
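If you need the same construction as plain arrays, for example to pass to a scikit-learn style estimator, the windowing step can be wrapped in a small helper. The sketch below is one possible approach, not the course's own implementation: the function name `make_supervised` and its parameters are hypothetical, and it assumes the convention that each window ends at the most recent observation and the target lies `forecast_horizon` steps after it. It relies on NumPy's `sliding_window_view`, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_supervised(series, window_size, forecast_horizon=1):
    """Turn a 1-D series into (X, y) arrays of windows and targets.

    Hypothetical helper: row i of X holds window_size consecutive values
    (oldest first), and y[i] is the value forecast_horizon steps after
    the last value in that window.
    """
    values = np.asarray(series)
    n_samples = len(values) - window_size - forecast_horizon + 1
    if window_size < 1 or forecast_horizon < 1 or n_samples < 1:
        raise ValueError("series is too short for this window and horizon")

    # All contiguous windows of length window_size; keep only those
    # that still have a target available inside the series.
    windows = sliding_window_view(values, window_size)
    X = windows[:n_samples].copy()
    y = values[window_size + forecast_horizon - 1:].copy()
    return X, y


# Toy series 1..6 with a window of 3 and a one-step horizon
X, y = make_supervised(np.arange(1, 7), window_size=3, forecast_horizon=1)
print(X)
print(y)
```

Each row of `X` overlaps the previous one by `window_size - 1` values, which is the sliding behaviour described earlier.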

Choosing the window size and forecast horizon has a direct impact on your dataset and the forecasting problem. The window size determines how much past information is used as features for each prediction. A larger window size means more historical context, but also increases the dimensionality of your feature space, which can lead to more complex models and potential overfitting if your dataset is small.
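To make the trade-off concrete, the sketch below builds lag matrices of different widths for a 20-point series and reports their shapes. The helper name `lag_matrix` and the chosen window sizes are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def lag_matrix(y: pd.Series, window_size: int) -> pd.DataFrame:
    """Hypothetical helper: one column per lag, incomplete rows dropped."""
    lags = {f"lag_{k}": y.shift(k) for k in range(1, window_size + 1)}
    return pd.DataFrame(lags).dropna()


# A 20-point series; the values themselves do not matter for the shapes
y = pd.Series(np.arange(20, dtype=float))

shapes = {w: lag_matrix(y, w).shape for w in (3, 6, 12)}
for w, (n_samples, n_features) in shapes.items():
    print(f"window_size={w:2d}: {n_samples} samples x {n_features} features")
```

With only 20 observations, a window of 12 leaves fewer samples than features, which is the kind of regime where overfitting becomes a real risk.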

The forecast horizon controls how far into the future you want your model to predict. If you set the horizon to 1, you predict the next time step; if you set it to 5, you predict five steps ahead. Increasing the forecast horizon often makes the prediction task harder because the relationship between past and future values tends to weaken as the gap widens. Together, the window size and forecast horizon determine both the number of samples you can generate and how relevant your features are to the target.
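One way to see the weakening relationship is to measure how strongly a series correlates with itself at increasing gaps. The sketch below is an illustration under stated assumptions rather than a general result: it simulates an AR(1) process with coefficient 0.8 (a choice made here, not taken from the lesson), and the helper name `horizon_correlation` is hypothetical. Other kinds of series, such as strongly seasonal ones, can behave differently.

```python
import numpy as np

# Assumption: a stationary AR(1) process, y[t] = phi * y[t-1] + noise
rng = np.random.default_rng(0)
phi = 0.8
n = 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()


def horizon_correlation(series: np.ndarray, horizon: int) -> float:
    """Correlation between each value and the value `horizon` steps later."""
    return float(np.corrcoef(series[:-horizon], series[horizon:])[0, 1])


corrs = {h: horizon_correlation(y, h) for h in (1, 5, 10)}
for h, c in corrs.items():
    print(f"horizon={h:2d}: correlation with most recent value = {c:.2f}")
```

For this kind of process the correlation shrinks as the horizon grows, so the most recent observation carries less information about a distant target than about the next step.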

1. What does the window size parameter control in time series windowing?

2. What is the effect of increasing the forecast horizon when constructing targets?

Section 1. Chapter 2
