VC Dimension: Definition and Examples
The VC dimension (Vapnik–Chervonenkis dimension) is a fundamental concept in statistical learning theory that quantifies the capacity of a hypothesis class. Specifically, the VC dimension is the largest number of points that can be shattered by the class, where shattering a set of n points means that for every one of the 2^n possible labelings, some hypothesis in the class realizes exactly that labeling. In other words, the VC dimension captures the expressive power of a set of functions by describing how rich a set of patterns it can realize.
Understanding the VC dimension helps you reason about the tradeoff between model complexity and generalization. A higher VC dimension means a hypothesis class can fit more complex patterns, but it can also increase the risk of overfitting. Conversely, a low VC dimension may limit the model’s ability to fit data but can improve generalization.
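To make shattering concrete, here is a minimal sketch in Python (shatters is a hypothetical helper, not a library function) that brute-forces every labeling of a finite point set against a finite pool of candidate hypotheses. For the one-dimensional classes below, a finite pool is enough, because only where a hypothesis cuts between the sample points matters.

```python
from itertools import product

def shatters(points, hypotheses):
    """Return True if every labeling of `points` is realized by some
    hypothesis. Each hypothesis is a function point -> bool."""
    for labeling in product([False, True], repeat=len(points)):
        if not any(all(h(x) == want for x, want in zip(points, labeling))
                   for h in hypotheses):
            return False  # this labeling is realized by no hypothesis
    return True  # all 2^n labelings are realized
```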
Suppose your hypothesis class consists of all threshold functions on the real line, in either orientation: each function fixes a threshold and classifies a real number as positive if it lies on a chosen side of the threshold (above or below) and negative otherwise.
- Consider two points, x1 and x2;
- No matter how you label these two points, a threshold with the appropriate orientation realizes that labeling;
- For three points, there exists at least one labeling (the middle point labeled differently from the other two) that cannot be realized by any threshold.
Thus, the VC dimension of threshold functions on the real line is 2.
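This argument can be checked mechanically, reusing the shatters helper above. The pool below enumerates one threshold per gap between the sample points, in both orientations; threshold_pool is an illustrative name, and this is a sketch under the assumption that such a finite pool suffices on a fixed sample (it does, since only the gaps matter).

```python
def threshold_pool(points):
    """Thresholds at one cut per gap (plus cuts outside the sample),
    in both orientations: positive above, or positive below."""
    xs = sorted(points)
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return [lambda x, t=t, s=s: (x > t) if s == 1 else (x < t)
            for t in cuts for s in (1, -1)]

print(shatters([0.0, 1.0], threshold_pool([0.0, 1.0])))            # True
print(shatters([0.0, 1.0, 2.0], threshold_pool([0.0, 1.0, 2.0])))  # False: (+,-,+) fails
```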
Now consider the class of sets defined by intervals on the real line: each hypothesis is an interval, and a point is classified as positive if it lies within the interval.
- For any two points, all four labelings can be realized by an interval;
- With three points, the alternating labeling (positive, negative, positive) cannot be realized, because any interval containing the two outer points must also contain the middle one.
Therefore, the VC dimension of intervals on the real line is 2.
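The same brute-force check works for intervals, again reusing shatters; interval_pool is an illustrative helper that enumerates intervals whose endpoints sit at the gaps between the sample points.

```python
def interval_pool(points):
    """Intervals [a, b] with endpoints at one cut per gap (plus cuts
    outside the sample); enough to realize any labeling an interval can."""
    xs = sorted(points)
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return [lambda x, a=a, b=b: a <= x <= b
            for i, a in enumerate(cuts) for b in cuts[i:]]

print(shatters([0.0, 1.0], interval_pool([0.0, 1.0])))             # True
print(shatters([0.0, 1.0, 2.0], interval_pool([0.0, 1.0, 2.0])))   # False: (+,-,+) fails
```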
Consider the class of linear classifiers in the plane: each hypothesis is a line together with a choice of positive side (a halfplane).
- For three non-collinear points, any labeling can be realized by a line;
- For any four points, at least one labeling cannot be realized by a single line: if the points form a convex quadrilateral, the labeling in which diagonally opposite points share a label (the XOR pattern) is not linearly separable; if one point lies inside the triangle formed by the other three, label the inner point differently from the rest.
Thus, the VC dimension of linear classifiers in two dimensions is 3.
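One way to verify this computationally: a labeling y_i ∈ {+1, -1} is linearly separable exactly when some (w, b) satisfies y_i(w·x_i + b) ≥ 1 for all points, which is a linear feasibility problem. The sketch below solves it with scipy.optimize.linprog; linearly_separable and shattered_by_lines are illustrative helpers, not standard APIs.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def linearly_separable(X, labeling):
    """Feasibility LP: does some (w1, w2, b) satisfy y * (w.x + b) >= 1
    for every labeled point? Exact for small inputs."""
    y = np.array([1.0 if want else -1.0 for want in labeling])
    A = np.hstack([X, np.ones((len(X), 1))])          # rows: (x1, x2, 1)
    res = linprog(c=[0, 0, 0],                        # pure feasibility, no objective
                  A_ub=-(y[:, None] * A), b_ub=-np.ones(len(X)),
                  bounds=[(None, None)] * 3, method="highs")
    return res.status == 0  # status 0 means a separating (w, b) was found

def shattered_by_lines(X):
    return all(linearly_separable(X, labeling)
               for labeling in product([False, True], repeat=len(X)))

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])               # non-collinear
quad = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # convex quadrilateral
print(shattered_by_lines(tri))   # True
print(shattered_by_lines(quad))  # False: the XOR labeling fails
```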