Gaussian Likelihood Functions and Log-Likelihood in Regression
The Gaussian (or normal) distribution is central to many machine learning models, especially in regression. There, you typically assume that each observed data point is generated by taking the true value of some underlying function (such as a linear model) and corrupting it with Gaussian noise. This assumption leads to the Gaussian likelihood, which gives the probability of observing your data given the model's predictions and a noise parameter (the noise variance, or equivalently its standard deviation).
Mathematically, for each data point $y_i$ with prediction $\mu_i$ and noise standard deviation $\sigma$, the likelihood is:

$$P(y_i \mid \mu_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right)$$

For a dataset, the likelihood is the product of these probabilities across all points. In practice, it's easier to work with the log-likelihood, which turns the product into a sum and simplifies optimization. The log-likelihood for a set of data points is:

$$\log L = -\frac{N}{2}\log(2\pi) - N\log(\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)^2$$

where $N$ is the number of data points.
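For the simple case where every observation shares a single mean (so $\mu_i = \mu$ for all $i$), setting the derivatives of $\log L$ with respect to $\mu$ and $\sigma$ to zero gives the familiar closed-form maximum-likelihood estimates:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{\mu})^2$$

The code below evaluates the log-likelihood of synthetic data on a grid of $\mu$ and $\sigma$ values and plots it; the curves should peak near these estimates.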
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(0)
N = 50
true_mu = 2.0
true_sigma = 1.0
y = np.random.normal(loc=true_mu, scale=true_sigma, size=N)

# Define range of mu and sigma to evaluate
mu_values = np.linspace(0, 4, 100)
sigma_values = np.linspace(0.5, 2.0, 100)

# Compute log-likelihood for different mu (fix sigma)
log_likelihood_mu = [
    -N/2 * np.log(2 * np.pi) - N * np.log(true_sigma)
    - np.sum((y - mu)**2) / (2 * true_sigma**2)
    for mu in mu_values
]

# Compute log-likelihood for different sigma (fix mu)
log_likelihood_sigma = [
    -N/2 * np.log(2 * np.pi) - N * np.log(sigma)
    - np.sum((y - true_mu)**2) / (2 * sigma**2)
    for sigma in sigma_values
]

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(mu_values, log_likelihood_mu)
plt.xlabel("mu")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs mu (sigma fixed)")

plt.subplot(1, 2, 2)
plt.plot(sigma_values, log_likelihood_sigma)
plt.xlabel("sigma")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs sigma (mu fixed)")

plt.tight_layout()
plt.show()
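As an optional cross-check, here is a minimal sketch (assuming SciPy is available; the data is regenerated with the same seed as above) that reproduces the grid evaluation with scipy.stats.norm.logpdf and compares the location of the grid maximum with the sample mean:

import numpy as np
from scipy.stats import norm

# Regenerate the same synthetic data as in the snippet above
np.random.seed(0)
N = 50
y = np.random.normal(loc=2.0, scale=1.0, size=N)

mu_values = np.linspace(0, 4, 100)

# Sum of per-point Gaussian log-densities, evaluated with SciPy (sigma fixed at 1.0)
log_likelihood_scipy = [norm.logpdf(y, loc=mu, scale=1.0).sum() for mu in mu_values]

# The grid maximum should land near the sample mean, the closed-form MLE for mu
best_mu = mu_values[np.argmax(log_likelihood_scipy)]
print(f"Grid argmax: {best_mu:.3f}, sample mean: {y.mean():.3f}")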
The connection between the Gaussian log-likelihood and regression loss functions is fundamental in machine learning. When you maximize the log-likelihood with respect to the model parameters (such as the weights in linear regression), you are effectively minimizing the sum of squared residualsβthe same objective as minimizing mean squared error (MSE). This is because the negative log-likelihood (up to a constant and scaling factor) is equivalent to the MSE loss function. Therefore, fitting a regression model by least squares is mathematically the same as maximizing the likelihood under a Gaussian noise assumption. This insight explains why the Gaussian distribution is not only a modeling assumption, but also underpins the optimization criteria used in regression algorithms.
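To make this equivalence concrete, here is a minimal sketch (assuming NumPy and SciPy, with a made-up one-feature dataset): fitting a line by ordinary least squares and by numerically minimizing the Gaussian negative log-likelihood with $\sigma$ held fixed should recover essentially the same weights, since the $\sigma$-dependent terms are constants with respect to the weights.

import numpy as np
from scipy.optimize import minimize

# Made-up one-feature regression problem: y = 1.5*x - 0.7 + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 1.5 * x - 0.7 + rng.normal(scale=0.8, size=100)

X = np.column_stack([x, np.ones_like(x)])  # design matrix: slope and intercept columns

# Ordinary least squares: minimizes the sum of squared residuals directly
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian negative log-likelihood with sigma held fixed;
# only the squared-residual term depends on the weights
def nll(w, sigma=1.0):
    resid = y - X @ w
    N = len(y)
    return N / 2 * np.log(2 * np.pi) + N * np.log(sigma) + np.sum(resid**2) / (2 * sigma**2)

w_mle = minimize(nll, x0=np.zeros(2)).x

print("Least squares:", w_ls)
print("Gaussian MLE: ", w_mle)  # agrees with the least-squares fit up to numerical tolerance

Changing the fixed value of $\sigma$ rescales the objective but does not move its minimizer, which is exactly why least squares and Gaussian maximum likelihood yield the same weights.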