Learn What Makes Time Series Forecasting Unique | Foundations of ML-Based Time Series Forecasting


Time series forecasting stands apart from standard regression and classification tasks in machine learning because of its structure and goals. In typical supervised learning, you are given a dataset of independent samples, each with features and a corresponding label. The order of the samples does not matter, and shuffling the dataset is common practice, for example before splitting it into training and test sets or between training epochs.

However, in time series forecasting, the data is inherently ordered in time. Observations are not independent: each one is usually correlated with previous observations, a property known as autocorrelation. Your goal is to predict future values based on past data, which makes the temporal order essential. The target variable is often a future value of the same series rather than a separate label.
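To see how "the target is a future value of the same series" works in practice, here is a minimal sketch that turns one series into feature/target pairs with pandas `shift`. The series values and the choice of two lag columns (`lag_1`, `lag_2`) are hypothetical, picked only for illustration.

```python
import pandas as pd

# Small illustrative series (hypothetical values, not from the lesson)
values = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 13.5])

# Build a supervised table: past values become features,
# and the value of the same series at time t becomes the target.
frame = pd.DataFrame({
    "lag_2": values.shift(2),  # value two steps back
    "lag_1": values.shift(1),  # value one step back
    "target": values,          # value to predict at time t
})

# The first rows lack a complete history (NaN lags), so drop them
frame = frame.dropna().reset_index(drop=True)
print(frame)
```

Each row now pairs a short window of the past with the next value, and the rows must keep their time order: shuffling them would mix windows from different periods.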

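This temporal dependency means that the standard approach of randomly splitting or shuffling the data into training and test sets can break the very patterns you want your model to learn. A random split can also put later observations in the training set and earlier ones in the test set, so the model is evaluated on the past using knowledge of the future. Understanding these differences is crucial for building effective machine learning models for forecasting.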
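To make the effect of a random split concrete, here is a minimal sketch assuming 100 evenly spaced time steps and an 80/20 split (both arbitrary choices for illustration). It compares a shuffled split with a chronological one by counting test points that occur before at least one training point, that is, cases where the model would be trained on data from after the period it is tested on.

```python
import numpy as np

n_points = 100
time_index = np.arange(n_points)

# Random split (standard ML practice): shuffle indices, then take 80% for training
rng = np.random.default_rng(0)
shuffled = rng.permutation(n_points)
random_train, random_test = shuffled[:80], shuffled[80:]

# Chronological split: the first 80% of time steps train, the last 20% test
split_point = int(n_points * 0.8)
chrono_train, chrono_test = time_index[:split_point], time_index[split_point:]

# Count test points that precede at least one training point,
# i.e. the training data contains observations from their "future"
leaked_random = int(np.sum(random_test < random_train.max()))
leaked_chrono = int(np.sum(chrono_test < chrono_train.max()))

print("random split, test points preceding training data:", leaked_random)
print("chronological split, test points preceding training data:", leaked_chrono)
```

The chronological split always reports zero, because every test point comes after all training points; the shuffled split almost always reports most of its test points.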


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate a synthetic time series with autocorrelation:
# an AR(1) process where each value is 0.8 times the previous one plus Gaussian noise
np.random.seed(42)
n_points = 100
time = np.arange(n_points)
series = np.zeros(n_points)  # series[0] stays 0 as the starting value
for t in range(1, n_points):
    series[t] = 0.8 * series[t-1] + np.random.normal(scale=0.5)

# Store the values with their time index
df = pd.DataFrame({'time': time, 'value': series})

# Plot in time order; neighbouring points tend to have similar values
plt.figure(figsize=(10, 4))
plt.plot(df['time'], df['value'], marker='o')
plt.title('Synthetic Time Series with Temporal Dependency')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
```

Definition

Autocorrelation measures how current values in a time series relate to past values. In time series data, observations are often not independent—values at one time point can be highly correlated with previous values. This is why shuffling data, which destroys the temporal structure, is problematic for time series forecasting: it removes the very dependencies your model needs to learn.
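As a rough check of this definition, the sketch below estimates lag-k autocorrelation with pandas `Series.autocorr` for an AR(1)-style series like the lesson's example and for a shuffled copy of the same values. It assumes a longer series (500 points) and NumPy's `default_rng` generator, so the values differ from the plotted example; the lags 1, 2 and 5 are an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Recreate an AR(1)-style series like the lesson's example (coefficient 0.8)
rng = np.random.default_rng(42)
n_points = 500
series = np.zeros(n_points)
for t in range(1, n_points):
    series[t] = 0.8 * series[t - 1] + rng.normal(scale=0.5)

ordered = pd.Series(series)
# Shuffling keeps exactly the same values but destroys their time order
shuffled = pd.Series(rng.permutation(series))

# Lag-k autocorrelation: Pearson correlation between the series
# and a copy of itself shifted by k steps
for lag in (1, 2, 5):
    print(f"lag {lag}: ordered={ordered.autocorr(lag):.2f}, "
          f"shuffled={shuffled.autocorr(lag):.2f}")
```

For the ordered series the lag-1 value should land near 0.8 and decay at larger lags, while the shuffled copy should stay close to zero at every lag: the values are identical, but the dependency a forecasting model would learn from is gone.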

1. Why can't you randomly shuffle time series data when preparing it for machine learning forecasting tasks?

2. Which property distinguishes time series forecasting from standard regression?


Section 1. Chapter 1
