The Relationship Between Likelihood and Loss Functions

Understanding how likelihood relates to loss functions is a central concept in machine learning. When you train a model, you want to find parameters that make the observed data as probable as possible under that model; this is called maximizing the likelihood. However, most optimization algorithms are designed to minimize a function, not maximize one. To bridge this gap, you minimize the negative log-likelihood instead, and that negative log-likelihood becomes your loss function. Minimizing the loss is therefore mathematically equivalent to maximizing the likelihood, and the specific form of the loss depends on the probability distribution you assume for your data.
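In symbols, the relationship can be sketched as follows (assuming n independent observations y_1, ..., y_n and model parameters \theta; these symbols are illustrative rather than tied to any particular model):

\mathcal{L}(\theta) = \prod_{i=1}^{n} p(y_i \mid \theta), \qquad \mathrm{NLL}(\theta) = -\log \mathcal{L}(\theta) = -\sum_{i=1}^{n} \log p(y_i \mid \theta)

\hat{\theta} = \arg\max_{\theta} \mathcal{L}(\theta) = \arg\max_{\theta} \log \mathcal{L}(\theta) = \arg\min_{\theta} \mathrm{NLL}(\theta)

Because the logarithm is monotonically increasing, maximizing the likelihood and maximizing the log-likelihood select the same parameters, and negating the objective turns that maximization into a minimization.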

import numpy as np

# Gaussian (Normal) negative log-likelihood for regression
def gaussian_nll(y_true, y_pred, sigma):
    n = len(y_true)
    # Negative log-likelihood for Gaussian (ignoring constant terms)
    nll = 0.5 * np.sum(((y_true - y_pred) / sigma) ** 2) + n * np.log(sigma)
    return nll

# Bernoulli negative log-likelihood (cross-entropy) for binary classification
def bernoulli_nll(y_true, y_pred_prob):
    # Clip probabilities to avoid log(0)
    eps = 1e-15
    y_pred_prob = np.clip(y_pred_prob, eps, 1 - eps)
    nll = -np.sum(y_true * np.log(y_pred_prob) + (1 - y_true) * np.log(1 - y_pred_prob))
    return nll

# Example data
y_true_reg = np.array([2.0, 3.0, 4.5])
y_pred_reg = np.array([2.2, 2.8, 4.4])
sigma = 1.0

y_true_clf = np.array([1, 0, 1, 1])
y_pred_prob_clf = np.array([0.9, 0.2, 0.8, 0.7])

print("Gaussian negative log-likelihood:", gaussian_nll(y_true_reg, y_pred_reg, sigma))
print("Bernoulli negative log-likelihood:", bernoulli_nll(y_true_clf, y_pred_prob_clf))
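Running this example, the Gaussian term works out to 0.5 * (0.04 + 0.04 + 0.01) = 0.045 (the n * log(sigma) term vanishes because sigma = 1), and the Bernoulli term is roughly 0.91. The np.clip step keeps predicted probabilities away from exactly 0 or 1, where the logarithm would blow up.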

This principle, that minimizing loss is the same as maximizing likelihood, shapes the loss functions used throughout machine learning. In regression tasks, you typically assume the targets are Gaussian-distributed around the model's predictions, which leads to the mean squared error loss. In classification, assuming a Bernoulli distribution (binary) or a multinomial distribution (multi-class) leads to cross-entropy loss. These losses are not arbitrary; they emerge directly from the statistical properties of the assumed probability distributions. By grounding your loss functions in likelihood, you ensure your models are statistically consistent with the assumed data-generating process, which matters for both interpretability and performance.
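To make the Gaussian-to-MSE connection concrete, here is a minimal sketch (the data and variable names are illustrative, not taken from the lesson's code): with sigma fixed at 1 and constant terms dropped, the Gaussian negative log-likelihood is half the sum of squared errors, so it differs from the mean squared error only by a constant factor and is minimized by the same predictions.

import numpy as np

# Illustrative regression data (assumed for this sketch)
y_true = np.array([2.0, 3.0, 4.5])
y_pred = np.array([2.2, 2.8, 4.4])
n = len(y_true)

# Gaussian NLL with sigma = 1, ignoring constant terms:
# half the sum of squared errors
nll = 0.5 * np.sum((y_true - y_pred) ** 2)

# Mean squared error
mse = np.mean((y_true - y_pred) ** 2)

# The two agree up to the constant factor n / 2, so the same
# predictions minimize both.
print(nll, 0.5 * n * mse)  # both are approximately 0.045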


Which statements about the relationship between likelihood and loss functions are correct?
