Risk Minimization: Expected vs Empirical Risk

To understand how models learn, it is essential to distinguish between expected risk and empirical risk. These two ideas form the backbone of statistical learning theory and explain why models sometimes generalize well — and sometimes fail dramatically.

Expected Risk: The Ideal but Unreachable Goal

In an ideal world, you would evaluate your model over the entire true data distribution. This leads to the expected risk:

R(f) = \mathbb{E}_{(x,y) \sim P}[L(y, f(x))]

Here:

  • P is the true (and usually unknown) distribution of data;
  • f is the model;
  • L is the loss function.

Expected risk answers the question:

"How well would my model perform on all possible data it could ever encounter?"

Unfortunately, you never get to see the entire distribution. If only real life were that generous.

Empirical Risk: What You Can Actually Compute

Since you only have a finite dataset, you approximate the expected risk with the empirical risk:

\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i))

Empirical risk simply averages the loss over the training samples. It's the practical version of the theoretical ideal.

Note

Expected risk is like measuring your driving skills on all roads in the world. Empirical risk is like testing yourself on the few routes you drive every day. Mastering just those doesn't guarantee you're ready for a mountain pass in Nepal.

import numpy as np

# True synthetic distribution
rng = np.random.default_rng(42)
P_x = rng.normal(loc=0, scale=1, size=100000)  # large sample ≈ true distribution
true_y = 3 * P_x + 1 + rng.normal(0, 0.5, size=100000)

# Small training sample
idx = rng.choice(len(P_x), size=30, replace=False)
train_x = P_x[idx]
train_y = true_y[idx]

# Example model: f(x) = 2.8x + 0.8 (slightly off)
pred_true = 2.8 * P_x + 0.8
pred_train = 2.8 * train_x + 0.8

expected_risk = np.mean((true_y - pred_true)**2)
empirical_risk = np.mean((train_y - pred_train)**2)

print("Approx Expected Risk:", expected_risk)
print("Empirical Risk:", empirical_risk)

What this demonstrates

  • Empirical risk, computed on just 30 points, is only a noisy estimate of the true expected risk; the sketch after this list shows how the estimate stabilizes as the sample grows.
  • A model fit to the training points matches them better than the overall distribution, so its empirical risk tends to understate the expected risk.
  • This gap between empirical and expected risk is one of the root causes of overfitting.
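The reason empirical risk is a usable stand-in at all is the law of large numbers: as the sample grows, the average loss converges to the expectation. For the fixed model f(x) = 2.8x + 0.8 above, the expected risk can even be worked out analytically: the residual is 0.2x + 0.2 + ε with x ~ N(0, 1) and ε ~ N(0, 0.5²), so R(f) = 0.2² + 0.2² + 0.5² = 0.33. Below is a minimal sketch of this convergence; the distribution, the model, and the sample sizes are the same illustrative choices as in the demo, not anything canonical.

import numpy as np

rng = np.random.default_rng(0)

def empirical_risk(n):
    # Draw n fresh samples from the same synthetic distribution as above
    x = rng.normal(0, 1, size=n)
    y = 3 * x + 1 + rng.normal(0, 0.5, size=n)
    pred = 2.8 * x + 0.8  # the same fixed model as in the demo
    return np.mean((y - pred) ** 2)

# Empirical risk fluctuates for small n and settles near the
# expected risk (analytically 0.33 for this setup) as n grows
for n in [10, 100, 1000, 100000]:
    print(f"n = {n:>6}: empirical risk = {empirical_risk(n):.4f}")

With a fixed, untrained model the empirical risk merely fluctuates around the expected risk; the systematic underestimate appears once the model is chosen to fit the sample, as discussed next.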

Minimizing empirical risk is central to training machine learning models. The hope is that lowering the loss on the training data will also reduce the expected risk on unseen data. However, relying only on empirical risk can cause overfitting: the model may learn patterns that are specific to the training set, including noise. In such cases, it performs well on the training data but poorly on new examples. This creates the key challenge in machine learning — achieving a balance between fitting the data and ensuring good generalization.
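To make this concrete, here is a minimal sketch of empirical risk minimization on the same synthetic setup: a line is fit by least squares (np.polyfit) on a 30-point training sample, so its training loss is the minimized empirical risk, while a large fresh sample stands in for the expected risk. This is an illustrative sketch mirroring the earlier demo, not a canonical recipe.

import numpy as np

rng = np.random.default_rng(42)

# Small training set drawn from the synthetic distribution
train_x = rng.normal(0, 1, size=30)
train_y = 3 * train_x + 1 + rng.normal(0, 0.5, size=30)

# Empirical risk minimization: least squares picks the line that
# minimizes the average squared loss on the training sample
w, b = np.polyfit(train_x, train_y, deg=1)

# A large fresh sample approximates the expected risk
test_x = rng.normal(0, 1, size=100000)
test_y = 3 * test_x + 1 + rng.normal(0, 0.5, size=100000)

train_risk = np.mean((train_y - (w * train_x + b)) ** 2)
test_risk = np.mean((test_y - (w * test_x + b)) ** 2)

print(f"Fitted model: f(x) = {w:.2f}x + {b:.2f}")
print("Empirical risk (train):", train_risk)
print("Approx expected risk (fresh sample):", test_risk)

Because the fitted line is chosen to make the training loss as small as possible, the training risk typically comes out below the fresh-sample risk. With a simple, well-specified linear model the gap is small; it grows as the model becomes more flexible relative to the amount of training data.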
