The IID Assumption in Model Evaluation

Evaluation Under Distribution Shift

The IID assumption — that data points are independent and identically distributed — is a cornerstone of classical model evaluation in machine learning. When you train and evaluate a model, you typically expect that both your training and test datasets are drawn from the same underlying distribution, and that each data point does not influence the others. This assumption underpins the validity of standard evaluation metrics like accuracy, precision, recall, and mean squared error. If the IID assumption holds, these metrics provide a reliable estimate of how your model will perform on new, unseen data, because the test data is truly representative of the environment in which the model will operate.

Note
Definition

Data are considered IID if each sample in the dataset is generated independently of the others (no sample depends on another), and all samples come from the same probability distribution.

When the IID assumption is satisfied, you can use random splits (like train-test or cross-validation) to estimate model performance. This makes offline evaluation (without deploying the model) both feasible and trustworthy.
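The logic above can be sketched in a few lines. This is a minimal illustration with synthetic data, not a real pipeline: the one-dimensional feature, the noise level, and the sign-threshold "model" are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic IID data: 1,000 samples drawn independently from one fixed
# distribution. The label is the sign of the feature plus some noise.
X = rng.normal(size=1000)
y = (X + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# A random shuffle-and-split is valid precisely because the data are IID:
# any random subset is representative of the same distribution.
idx = rng.permutation(1000)
train_idx, test_idx = idx[:800], idx[800:]

# Toy "model": predict 1 whenever the feature is positive.
train_acc = ((X[train_idx] > 0).astype(int) == y[train_idx]).mean()
test_acc = ((X[test_idx] > 0).astype(int) == y[test_idx]).mean()

# Because both splits come from the same distribution, the two accuracy
# estimates agree up to sampling noise.
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

If the test set instead came from a different distribution than the training set, the two numbers could diverge arbitrarily, which is exactly why the IID assumption matters for offline evaluation.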

Consider a scenario where you are building a spam email classifier. You collect a large dataset of emails over a single month, randomly shuffle them, and split them into training and test sets. Since the emails are sampled randomly from the same time period and there is no reason to believe that one email's content affects another's, the IID assumption is likely satisfied. In this case, your offline evaluation — using metrics such as accuracy or F1 score on the test set — will reliably reflect how well your model will perform on future emails from the same source and time frame. This is because the statistical properties of the data remain consistent between training and testing, and each example is independent of the others.
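A rough sketch of this scenario is below. Note the simplifications: the single "spammy word count" feature, the ~30% spam rate, and the count threshold of 3 are hypothetical stand-ins for a real text-classification pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for one month of emails: one "spammy word count"
# feature per email; a real classifier would use richer text features.
n = 2000
is_spam = rng.random(n) < 0.3                 # ~30% spam, an assumption
counts = np.where(is_spam,
                  rng.poisson(5.0, size=n),   # spam: more spammy words
                  rng.poisson(1.0, size=n))   # ham: fewer

# Shuffle and split, as in the scenario above. Valid because each email
# is drawn independently from the same month-long distribution.
idx = rng.permutation(n)
test = idx[1600:]

# Toy "model": flag as spam when the count reaches 3 (illustrative threshold).
pred = counts[test] >= 3
true = is_spam[test]

accuracy = (pred == true).mean()
tp = (pred & true).sum()
precision = tp / pred.sum()
recall = tp / true.sum()
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy: {accuracy:.3f}, F1: {f1:.3f}")
```

Because the held-out emails come from the same distribution as the training emails, these test-set metrics are trustworthy estimates of performance on future emails from that same source and time frame.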


Section 1. Chapter 1

