Gaussian Likelihood Functions and Log-Likelihood in Regression

The Gaussian (or normal) distribution is central to many machine learning models, especially in regression. In regression, you often assume that the observed data points are generated from a process where the true value is given by a function (such as a linear model), but each observation is corrupted by Gaussian noise. This leads to the Gaussian likelihood, which describes the probability of observing your data given the model's predictions and a noise parameter (the variance, or equivalently the standard deviation $\sigma$).

Mathematically, for each data point $y_i$ with prediction $\mu_i$ and noise standard deviation $\sigma$, the likelihood is:

$$P(y_i \mid \mu_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right)$$

For a dataset, the likelihood is the product of these probabilities across all points. In practice, it's easier to work with the log-likelihood, which turns the product into a sum and simplifies optimization. The log-likelihood for a set of data points is:

$$\log L = -\frac{N}{2}\log(2\pi) - N\log(\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)^2$$

where $N$ is the number of data points.
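
As a quick sanity check, the closed-form expression above can be compared against a sum of per-point log-densities. This is a minimal sketch on a small synthetic dataset (the data and parameter values are assumed here purely for illustration, and scipy is assumed to be available):

import numpy as np
from scipy.stats import norm

# Small synthetic dataset (assumed here just for illustration)
rng = np.random.default_rng(42)
mu, sigma = 2.0, 1.0
y = rng.normal(loc=mu, scale=sigma, size=10)
N = len(y)

# Log-likelihood from the closed-form expression above
log_L = -N/2 * np.log(2 * np.pi) - N * np.log(sigma) - np.sum((y - mu)**2) / (2 * sigma**2)

# Same quantity computed as a sum of per-point Gaussian log-densities
log_L_scipy = np.sum(norm.logpdf(y, loc=mu, scale=sigma))

print(log_L, log_L_scipy)  # the two values agree up to floating-point error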

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(0)
N = 50
true_mu = 2.0
true_sigma = 1.0
y = np.random.normal(loc=true_mu, scale=true_sigma, size=N)

# Define range of mu and sigma to evaluate
mu_values = np.linspace(0, 4, 100)
sigma_values = np.linspace(0.5, 2.0, 100)

# Compute log-likelihood for different mu (fix sigma)
log_likelihood_mu = [
    -N/2 * np.log(2 * np.pi)
    - N * np.log(true_sigma)
    - np.sum((y - mu)**2) / (2 * true_sigma**2)
    for mu in mu_values
]

# Compute log-likelihood for different sigma (fix mu)
log_likelihood_sigma = [
    -N/2 * np.log(2 * np.pi)
    - N * np.log(sigma)
    - np.sum((y - true_mu)**2) / (2 * sigma**2)
    for sigma in sigma_values
]

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(mu_values, log_likelihood_mu)
plt.xlabel("mu")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs mu (sigma fixed)")

plt.subplot(1, 2, 2)
plt.plot(sigma_values, log_likelihood_sigma)
plt.xlabel("sigma")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs sigma (mu fixed)")

plt.tight_layout()
plt.show()
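
The two curves produced by this script peak exactly where maximum-likelihood theory predicts: the log-likelihood over mu is maximized at the sample mean, and the log-likelihood over sigma (with mu fixed at true_mu) is maximized at the root-mean-square deviation from true_mu. A minimal check, written as lines appended to the end of the script above (it reuses y, mu_values, sigma_values, and the two log-likelihood lists):

# Closed-form maximum-likelihood estimates for comparison
mu_mle = np.mean(y)                             # maximizes the log-likelihood over mu
sigma_mle = np.sqrt(np.mean((y - true_mu)**2))  # maximizes the log-likelihood over sigma (mu fixed)

# Grid-based maximizers from the curves plotted above (match up to grid resolution)
print(mu_mle, mu_values[np.argmax(log_likelihood_mu)])
print(sigma_mle, sigma_values[np.argmax(log_likelihood_sigma)])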

The connection between the Gaussian log-likelihood and regression loss functions is fundamental in machine learning. When you maximize the log-likelihood with respect to the model parameters (such as the weights in linear regression), you are effectively minimizing the sum of squared residuals—the same objective as minimizing mean squared error (MSE). This is because the negative log-likelihood (up to a constant and scaling factor) is equivalent to the MSE loss function. Therefore, fitting a regression model by least squares is mathematically the same as maximizing the likelihood under a Gaussian noise assumption. This insight explains why the Gaussian distribution is not only a modeling assumption, but also underpins the optimization criteria used in regression algorithms.
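
A minimal sketch of this equivalence, using a synthetic one-parameter linear model chosen purely for illustration: scanning candidate slopes shows that the slope minimizing MSE is exactly the slope maximizing the Gaussian log-likelihood.

import numpy as np

# Synthetic linear data with Gaussian noise (assumed setup for illustration)
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = 3.0 * x + rng.normal(scale=0.5, size=x.size)

sigma = 0.5                      # assume the noise level is known
w_grid = np.linspace(0, 6, 601)  # candidate slopes for a model y = w * x

# Mean squared error and Gaussian log-likelihood for each candidate slope
mse = np.array([np.mean((y - w * x)**2) for w in w_grid])
log_lik = np.array([
    -len(y)/2 * np.log(2 * np.pi * sigma**2)
    - np.sum((y - w * x)**2) / (2 * sigma**2)
    for w in w_grid
])

# The slope that minimizes MSE is the same one that maximizes the log-likelihood
print(w_grid[np.argmin(mse)], w_grid[np.argmax(log_lik)])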


Which statement best describes the Gaussian likelihood in regression?

