Learn Feature Extraction for Time Series | Foundations of ML-Based Time Series Forecasting
Machine Learning for Time Series Forecasting

Feature Extraction for Time Series

Understanding how to transform raw time series data into meaningful features is essential for building effective machine learning models. Three of the most important types of features you can create for time series forecasting are lagged values, rolling statistics (such as rolling means and standard deviations), and calendar features like day of the week or month. Each of these feature types helps your model capture different patterns and dependencies in the data.

import pandas as pd
import numpy as np

# Create a synthetic daily time series
np.random.seed(42)
dates = pd.date_range("2024-01-01", periods=10, freq="D")
values = np.random.randint(10, 100, size=10)
df = pd.DataFrame({"date": dates, "value": values})
df.set_index("date", inplace=True)

# Add a lagged feature (previous day's value)
df["lag_1"] = df["value"].shift(1)

# Add a rolling mean (window of 3 days, excluding current day)
df["rolling_mean_3"] = df["value"].shift(1).rolling(window=3).mean()

# Add a calendar feature: day of week (Monday=0, Sunday=6)
df["day_of_week"] = df.index.dayofweek

print(df)
Lagged values
  • Use lagged features when your target variable depends on its own recent history;
  • Lagged values are essential for capturing autocorrelation and short-term dependencies;
  • Pitfall: shifting in the wrong direction (e.g., a negative shift, which pulls future values onto the current row) causes data leakage and unrealistically accurate forecasts during validation.
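
As a minimal sketch of the points above, the snippet below builds two lag features and, for contrast, one leaky "lead" column on a made-up five-day series. The series values and the column names `lag_1`, `lag_2`, and `lead_1_leaky` are assumptions chosen for illustration. Dropping the rows that lack a full lag history is only one possible way to handle the resulting missing values.

```python
import pandas as pd

# Hypothetical toy series; the values are made up for illustration.
s = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="value",
)

features = pd.DataFrame({"value": s})

# Safe: a positive shift moves past values forward onto the current row.
for lag in (1, 2):
    features[f"lag_{lag}"] = s.shift(lag)

# Leaky: a negative shift pulls tomorrow's value onto today's row.
features["lead_1_leaky"] = s.shift(-1)

# Rows without a full lag history contain NaN; here they are simply dropped.
train = features.drop(columns="lead_1_leaky").dropna()

print(features)
print(train)
```

The printed table should make the leak visible: each row of `lead_1_leaky` holds the value the model would be asked to predict one day later.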
Rolling means and standard deviations
  • Use rolling statistics to smooth out short-term fluctuations and reveal local trends or volatility;
  • Rolling features are useful when recent averages or variability influence the target;
  • Pitfall: including the current or future value in the rolling window can leak information from the target period.
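
The rolling-window pitfall can be illustrated with a short sketch. It assumes a made-up six-day series and compares a window that ends at the current row with one that is shifted by a day first. The column names are hypothetical, and the window length of 3 simply mirrors the earlier example.

```python
import pandas as pd

# Hypothetical toy series; the values are made up for illustration.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Leaky if `s` is also the target: the window ends at the current row.
leaky_mean = s.rolling(window=3).mean()

# Safe: shift first, so each window covers only the three previous days.
past = s.shift(1)
safe_mean = past.rolling(window=3).mean()
safe_std = past.rolling(window=3).std()  # sample standard deviation (ddof=1)

out = pd.DataFrame(
    {
        "value": s,
        "leaky_mean_3": leaky_mean,
        "safe_mean_3": safe_mean,
        "safe_std_3": safe_std,
    }
)
print(out)
```

Shifting before rolling costs one extra row of missing values at the start, but each feature then depends only on data available before the day being predicted.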
Calendar features
  • Use calendar features to capture recurring patterns tied to time, such as weekly seasonality or holidays;
  • Day of week or month features help the model recognize cycles related to the calendar;
  • Pitfall: for highly irregular time series or non-calendar-based data, these features may add noise instead of value.
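
A few calendar features can be derived directly from a `DatetimeIndex`, as in the sketch below. The weekend flag and the `holidays` list are assumptions added for illustration. In practice, holiday dates would come from a calendar that fits your region and data.

```python
import pandas as pd

# One week of daily timestamps; 2024-01-01 is a Monday.
idx = pd.date_range("2024-01-01", periods=7, freq="D")
cal = pd.DataFrame(index=idx)

cal["day_of_week"] = idx.dayofweek  # Monday=0, Sunday=6
cal["month"] = idx.month            # January=1, December=12
cal["is_weekend"] = (idx.dayofweek >= 5).astype(int)

# Hypothetical holiday list, used only for illustration.
holidays = pd.to_datetime(["2024-01-01"])
cal["is_holiday"] = idx.isin(holidays).astype(int)

print(cal)
```

Unlike lags and rolling statistics, these columns are known in advance for any future date, so they can be computed for the forecast horizon without risking leakage.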

1. Which feature type helps capture seasonality in daily sales data?

2. What is a potential risk when using rolling statistics as features?
