Learn What Is Drift | Understanding Drift
Handling Data Drift in Production

What Is Drift

In machine learning, drift refers to a change in the underlying data or relationships that a model relies on to make predictions. There are three main types of drift you should understand: data drift, feature drift, and concept drift.

Definition

Data drift is a broad term that describes any change in the statistical properties of the input data over time. This might mean the overall distribution of the dataset has shifted, which can affect model performance even if the relationships between features and targets remain the same.

Definition

Feature drift is a more specific case where the distribution of one or more individual features changes. For example, the average age of customers in your dataset might increase over time, or the range of values for a sensor reading might shift.
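
One way to put a number on the customer-age example is the Population Stability Index (PSI), which compares how a feature's values fall into bins in a reference sample versus a current sample. The sketch below is illustrative only: the function name `population_stability_index`, the bin count, and the synthetic age data are assumptions, not part of the lesson.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two 1-D samples, using quantile bins taken from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges at the reference quantiles (n_bins - 1 edges)
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Each value gets a bin index in 0..n_bins-1
    ref_idx = np.searchsorted(edges, reference, side="right")
    cur_idx = np.searchsorted(edges, current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n_bins) / reference.size
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / current.size
    # Avoid log(0) and division by zero for empty bins
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Hypothetical customer ages: the average rises from 35 to 40
rng = np.random.default_rng(0)
age_reference = rng.normal(loc=35, scale=8, size=5000)
age_unchanged = rng.normal(loc=35, scale=8, size=5000)
age_shifted = rng.normal(loc=40, scale=8, size=5000)

psi_unchanged = population_stability_index(age_reference, age_unchanged)
psi_shifted = population_stability_index(age_reference, age_shifted)
print(f"PSI, same distribution: {psi_unchanged:.4f}")
print(f"PSI, shifted mean:      {psi_shifted:.4f}")
```

A PSI near zero means the two samples fill the bins in almost the same proportions. A commonly quoted rule of thumb treats values above roughly 0.1 as worth investigating, but that cutoff is a convention rather than a guarantee.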

Definition

Concept drift occurs when the relationship between input features and the target variable changes. This means that even if the input data appears similar, the way it maps to the output has changed. For instance, if a model predicts whether an email is spam, but spammers start using new tactics, the features that once indicated spam may no longer be reliable.
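
The spam example can be simulated with synthetic data. In the sketch below, a single made-up feature (a "suspicious-word score" between 0 and 1) predicts spam in the first period and stops being informative in the second, while the feature's own distribution stays the same. All names and numbers are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_emails(n, feature_is_informative):
    """Draw a hypothetical score per email and a spam label that may depend on it."""
    score = rng.uniform(0.0, 1.0, size=n)
    if feature_is_informative:
        # Higher score means higher spam probability
        p_spam = 1.0 / (1.0 + np.exp(-6.0 * (score - 0.5)))
    else:
        # Spammers adapted: the score no longer says anything about spam
        p_spam = np.full(n, 0.5)
    is_spam = (rng.uniform(size=n) < p_spam).astype(int)
    return score, is_spam

def fit_cutoff(score, is_spam):
    """Pick the score cutoff with the best training accuracy (a toy model)."""
    candidates = np.linspace(0.0, 1.0, 101)
    accuracies = [np.mean((score > c).astype(int) == is_spam) for c in candidates]
    return float(candidates[int(np.argmax(accuracies))])

def accuracy(cutoff, score, is_spam):
    return float(np.mean((score > cutoff).astype(int) == is_spam))

train_score, train_spam = sample_emails(5000, feature_is_informative=True)
old_score, old_spam = sample_emails(5000, feature_is_informative=True)
new_score, new_spam = sample_emails(5000, feature_is_informative=False)

cutoff = fit_cutoff(train_score, train_spam)
acc_before = accuracy(cutoff, old_score, old_spam)
acc_after = accuracy(cutoff, new_score, new_spam)

print(f"Mean score before / after: {old_score.mean():.3f} / {new_score.mean():.3f}")
print(f"Accuracy before / after:   {acc_before:.3f} / {acc_after:.3f}")
```

The mean score barely moves between periods, so a check on the inputs alone would not flag anything, yet accuracy falls toward chance. This is why concept drift usually has to be detected through labeled outcomes or performance metrics rather than input distributions.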

Understanding the differences between these types of drift is crucial for maintaining reliable machine learning pipelines. If you do not monitor for drift, your models can become less accurate, leading to poor decisions and outcomes.

Note

Common causes of drift include:

  • Temporal changes: data naturally evolves over time;
  • Sampling bias: data collection methods or sources change, introducing new patterns;
  • Behavioral shifts: users, customers, or systems change their behavior, leading to new data trends.
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic feature data for two time periods
np.random.seed(42)
feature_period1 = np.random.normal(loc=50, scale=5, size=1000)
feature_period2 = np.random.normal(loc=55, scale=7, size=1000)

plt.figure(figsize=(8, 5))
plt.hist(feature_period1, bins=30, alpha=0.6, label="Period 1", color="blue", density=True)
plt.hist(feature_period2, bins=30, alpha=0.6, label="Period 2", color="orange", density=True)
plt.title("Feature Distribution Over Time")
plt.xlabel("Feature Value")
plt.ylabel("Density")
plt.legend()
plt.show()

You can often spot feature drift by plotting a feature's distribution for two time periods on the same axes, as the code above does. A noticeable change in the shape, center, or spread of the distribution suggests drift. In this synthetic example, Period 2 is drawn with a higher mean (55 versus 50) and a larger standard deviation (7 versus 5), so its histogram sits to the right of Period 1 and is wider. Shifts like these can degrade your model's predictions and may call for retraining or adjustment.
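
A visual check can be backed by a number. The sketch below regenerates the same two synthetic periods and computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distribution functions, using NumPy only. The helper name `ks_statistic` and the extra no-drift comparison sample are assumptions added for illustration.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two 1-D samples."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    pooled = np.concatenate([a, b])
    # Fraction of each sample that is <= every pooled value
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Same synthetic periods as in the histogram example
np.random.seed(42)
feature_period1 = np.random.normal(loc=50, scale=5, size=1000)
feature_period2 = np.random.normal(loc=55, scale=7, size=1000)
# Extra sample with Period 1 parameters, as a no-drift baseline
feature_baseline = np.random.normal(loc=50, scale=5, size=1000)

ks_drift = ks_statistic(feature_period1, feature_period2)
ks_same = ks_statistic(feature_period1, feature_baseline)

print(f"Mean: {feature_period1.mean():.2f} -> {feature_period2.mean():.2f}")
print(f"Std:  {feature_period1.std():.2f} -> {feature_period2.std():.2f}")
print(f"KS statistic, Period 1 vs Period 2: {ks_drift:.3f}")
print(f"KS statistic, Period 1 vs baseline: {ks_same:.3f}")
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap). If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.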


Which scenario best describes concept drift?



Section 1. Chapter 1
