Covariate Shift: When Inputs Change | Types of Distribution Shift and Their Impact

Covariate shift occurs when the distribution of the input variables, often denoted $X$, changes between the training and test (or deployment) environments, while the conditional relationship between inputs and outputs, $P(Y|X)$, remains the same. The true mapping from inputs to outputs that the model was trained to approximate is therefore still valid, but the inputs the model encounters may differ from those seen during training, including inputs from regions that were rarely or never observed.
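One way to write this definition in symbols is shown below. The subscripts $\text{train}$ and $\text{test}$ are notation assumed here for the two environments; they do not appear in the original text.

$$
P_{\text{train}}(X, Y) = P_{\text{train}}(X)\,P(Y|X), \qquad
P_{\text{test}}(X, Y) = P_{\text{test}}(X)\,P(Y|X), \qquad
P_{\text{train}}(X) \neq P_{\text{test}}(X)
$$

The joint distributions differ only through the input marginal; the factor $P(Y|X)$ is shared by both environments.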

To make this concrete, imagine you are building a spam classifier using emails collected from a particular company over the past year. Suppose the company merges with another, and now the incoming emails reflect different writing styles, topics, or sender domains. The underlying definition of "spam" versus "not spam" might not change, but the input features — such as word frequencies or sender addresses — do. Here, the input distribution has shifted, but the way inputs map to outputs (spam or not spam) is unchanged.

Note

Key characteristics of covariate shift:

The distribution of input variables ($P(X)$) changes between training and evaluation environments;
The conditional distribution of outputs given inputs ($P(Y|X)$) remains unchanged;
The model may encounter new or differently distributed inputs, but the ground-truth labeling rule is stable.
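The characteristics above can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration: the one-dimensional input, the Gaussian input distributions, and the hypothetical `label_rule` function that stands in for a fixed $P(Y|X)$.

```python
import numpy as np

def label_rule(x):
    """Hypothetical ground-truth rule standing in for P(Y|X).

    It depends only on x, so it is the same in every environment.
    """
    return (np.sin(x) > 0).astype(int)

def sample_inputs(rng, mean, n=10_000):
    """Draw inputs from a Gaussian P(X); only the mean differs between environments."""
    return rng.normal(loc=mean, scale=1.0, size=n)

rng = np.random.default_rng(0)

# Training environment: inputs centered at 0.
x_train = sample_inputs(rng, mean=0.0)
# Deployment environment: inputs centered at 2, so P(X) has shifted.
x_deploy = sample_inputs(rng, mean=2.0)

# Labels come from the same rule in both environments, so P(Y|X) is unchanged.
y_train = label_rule(x_train)
y_deploy = label_rule(x_deploy)

print("mean of training inputs:  ", x_train.mean())
print("mean of deployment inputs:", x_deploy.mean())
```

The two input samples have clearly different means, while any given input value receives the same label no matter which environment it came from.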

Standard offline evaluation usually assumes that the test data is drawn from the same input distribution as the training data. Under covariate shift, this assumption is violated. Returning to the spam classifier example, if you evaluate the model on a random sample of emails from the original company, you may find that it performs well. Once deployed in the merged company, however, it faces a new mix of emails and may see patterns it was not exposed to during training. Its real-world performance may then be substantially worse than the offline estimate, which overstates the model's effectiveness under the new conditions.
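The gap between offline and deployed performance can be reproduced in a toy setting. The sketch below is not a spam classifier: it assumes a one-dimensional input, a hypothetical labeling rule `true_label` that is identical in both environments, and a deliberately simple threshold model (`fit_threshold`, also hypothetical) that fits the training region well but extrapolates poorly.

```python
import numpy as np

def true_label(x):
    """Hypothetical ground-truth rule, identical everywhere: positive only inside (0, 3)."""
    return ((x > 0) & (x < 3)).astype(int)

def fit_threshold(x, y):
    """Fit the one-parameter model 'predict 1 when x > t' by maximizing training accuracy."""
    candidates = np.linspace(x.min(), x.max(), 201)
    accuracies = [np.mean((x > t).astype(int) == y) for t in candidates]
    return candidates[int(np.argmax(accuracies))]

def accuracy(t, x, y):
    """Fraction of points on which the threshold model agrees with the true labels."""
    return float(np.mean((x > t).astype(int) == y))

rng = np.random.default_rng(42)

# Training and offline test data share the same input distribution.
x_train = rng.normal(0.0, 1.0, 5000)
x_offline = rng.normal(0.0, 1.0, 5000)
# Deployment inputs are shifted; the labeling rule is not.
x_deploy = rng.normal(3.0, 1.0, 5000)

t = fit_threshold(x_train, true_label(x_train))
offline_acc = accuracy(t, x_offline, true_label(x_offline))
deploy_acc = accuracy(t, x_deploy, true_label(x_deploy))

print(f"learned threshold:   {t:.2f}")
print(f"offline accuracy:    {offline_acc:.3f}")
print(f"deployment accuracy: {deploy_acc:.3f}")
```

Under these assumptions the offline accuracy should come out close to 1, because inputs above 3 are almost absent from the training distribution, while roughly half of the deployment inputs fall above 3 and are misclassified. The labeling rule never changed; only the inputs did.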

Section 2. Chapter 1
