Overfitting: A Theoretical Perspective
Overfitting is a central concern in statistical learning theory, arising when a model fits the training data too closely and fails to generalize to unseen data. To formalize this, consider the concepts of empirical risk and true risk. Empirical risk is the average loss a model achieves on the training set, while true risk is the expected loss over the entire data-generating distribution. Overfitting occurs when a model achieves a low empirical risk but has a high true risk, meaning it performs well on the training data but poorly on new, unseen data. This gap between empirical and true risk is the essence of generalization error.
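The contrast between the two risks can be made concrete numerically. Below is a minimal sketch, assuming a simple one-dimensional regression task with squared loss; the data-generating function, noise level, sample sizes, and polynomial degree are all illustrative choices, and the true risk is approximated by Monte Carlo on a large fresh sample rather than computed exactly.

import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n points from an assumed data-generating distribution y = sin(x) + noise."""
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)
    return x, y

# Small training set, flexible polynomial model (illustrative settings).
x_train, y_train = sample(20)
coeffs = np.polyfit(x_train, y_train, deg=15)

def risk(x, y):
    """Average squared loss of the fitted polynomial on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

empirical_risk = risk(x_train, y_train)      # loss on the training set
x_test, y_test = sample(100_000)             # large fresh sample from the same distribution
true_risk_estimate = risk(x_test, y_test)    # Monte Carlo estimate of the true risk

print(f"empirical risk ~ {empirical_risk:.4f}")
print(f"true risk      ~ {true_risk_estimate:.4f}")  # typically much larger here

The gap between the two printed numbers is exactly the generalization error discussed above: a low empirical risk alongside a much higher estimated true risk is the numerical signature of overfitting.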
Generalization bounds, as discussed earlier, provide theoretical guarantees that relate the empirical risk to the true risk, often involving the capacity of the hypothesis class (such as its VC dimension) and the amount of available data. When a model has high capacity relative to the size of the training set, the generalization bound becomes loose, and the difference between empirical and true risk can be large. This is the theoretical underpinning of overfitting: the model has enough flexibility to fit noise or idiosyncrasies in the training data that do not represent the underlying distribution.
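To see how such a bound loosens, one can evaluate its capacity term numerically. The sketch below uses one commonly quoted form of the VC bound, in which, with probability at least 1 - delta, the true risk is at most the empirical risk plus sqrt((d*(ln(2n/d) + 1) + ln(4/delta)) / n) for a loss bounded in [0, 1]; treat the exact constants as illustrative rather than as the sharpest known bound.

import math

def vc_complexity_term(n, d, delta=0.05):
    """Capacity/confidence term added to the empirical risk in the bound above."""
    return math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)

n = 1_000  # fixed training-set size
for d in (5, 50, 500):  # increasing VC dimension of the hypothesis class
    print(f"n={n}, VC dim={d}: gap term ~ {vc_complexity_term(n, d):.3f}")

# As d grows toward n, the gap term approaches or exceeds 1, so for a loss
# bounded in [0, 1] the guarantee becomes vacuous: exactly the regime in
# which overfitting is no longer ruled out by the theory.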
When you use a complex hypothesis class (such as deep neural networks or high-degree polynomials) on a limited training set, the model can fit the data points exactly, including noise, leading to overfitting.
If you start with a simple model and gradually increase its capacity without increasing the dataset size, the empirical risk may decrease, but the generalization gap (difference between true and empirical risk) tends to grow, resulting in overfitting.
Even with a moderately complex model, if the training data is too limited, the model may capture random fluctuations instead of the true underlying pattern, again causing overfitting.
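A minimal sketch of the capacity sweep described in these scenarios is given below, with polynomial degree standing in for model capacity and the training-set size held fixed; the degrees, sample sizes, and noise level are illustrative choices, and the test set serves as a proxy for the true risk.

import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n points from an assumed distribution y = sin(x) + noise."""
    x = rng.uniform(-3, 3, n)
    return x, np.sin(x) + rng.normal(scale=0.3, size=n)

x_train, y_train = sample(25)        # small, fixed training set
x_test, y_test = sample(10_000)      # large held-out sample approximating the true risk

for degree in (1, 3, 7, 15):         # increasing model capacity
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, "
          f"test MSE {test_mse:.4f}, gap {test_mse - train_mse:.4f}")

# Typical behaviour: the training error keeps shrinking as the degree grows,
# while the test error, and hence the generalization gap, eventually increases,
# which is the overfitting pattern described above.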