Gaussian Likelihood Functions and Log-Likelihood in Regression
The Gaussian (or normal) distribution is central to many machine learning models, especially in regression. There, you typically assume that each observed data point is generated by taking the true value of some underlying function (such as a linear model) and corrupting it with Gaussian noise. This assumption leads to the Gaussian likelihood, which gives the probability of observing your data given the model's predictions and a noise parameter (the noise variance, or equivalently its standard deviation).
Mathematically, for each data point $y_i$ with prediction $\mu_i$ and noise standard deviation $\sigma$, the likelihood is:

$$P(y_i \mid \mu_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - \mu_i)^2}{2\sigma^2}\right)$$

For a dataset, the likelihood is the product of these probabilities across all points. In practice, it's easier to work with the log-likelihood, which turns the product into a sum and simplifies optimization. The log-likelihood for a set of data points is:

$$\log L = -\frac{N}{2}\log(2\pi) - N\log(\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)^2$$

where $N$ is the number of data points.
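For the simple case where every observation shares a single mean (so $\mu_i = \mu$ for all $i$), setting the derivatives of $\log L$ with respect to $\mu$ and $\sigma$ to zero gives the familiar closed-form maximum-likelihood estimates:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{\mu})^2$$

The code below evaluates the log-likelihood of synthetic data on a grid of $\mu$ and $\sigma$ values and plots it; the curves should peak near these estimates.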
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(0)
N = 50
true_mu = 2.0
true_sigma = 1.0
y = np.random.normal(loc=true_mu, scale=true_sigma, size=N)

# Define range of mu and sigma to evaluate
mu_values = np.linspace(0, 4, 100)
sigma_values = np.linspace(0.5, 2.0, 100)

# Compute log-likelihood for different mu (fix sigma)
log_likelihood_mu = [
    -N/2 * np.log(2 * np.pi) - N * np.log(true_sigma)
    - np.sum((y - mu)**2) / (2 * true_sigma**2)
    for mu in mu_values
]

# Compute log-likelihood for different sigma (fix mu)
log_likelihood_sigma = [
    -N/2 * np.log(2 * np.pi) - N * np.log(sigma)
    - np.sum((y - true_mu)**2) / (2 * sigma**2)
    for sigma in sigma_values
]

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(mu_values, log_likelihood_mu)
plt.xlabel("mu")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs mu (sigma fixed)")

plt.subplot(1, 2, 2)
plt.plot(sigma_values, log_likelihood_sigma)
plt.xlabel("sigma")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs sigma (mu fixed)")

plt.tight_layout()
plt.show()
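As an optional cross-check, here is a minimal sketch (assuming SciPy is available; the data is regenerated with the same seed as above) that reproduces the grid evaluation with scipy.stats.norm.logpdf and compares the location of the grid maximum with the sample mean:

import numpy as np
from scipy.stats import norm

# Regenerate the same synthetic data as in the snippet above
np.random.seed(0)
N = 50
y = np.random.normal(loc=2.0, scale=1.0, size=N)

mu_values = np.linspace(0, 4, 100)

# Sum of per-point Gaussian log-densities, evaluated with SciPy (sigma fixed at 1.0)
log_likelihood_scipy = [norm.logpdf(y, loc=mu, scale=1.0).sum() for mu in mu_values]

# The grid maximum should land near the sample mean, the closed-form MLE for mu
best_mu = mu_values[np.argmax(log_likelihood_scipy)]
print(f"Grid argmax: {best_mu:.3f}, sample mean: {y.mean():.3f}")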
The connection between the Gaussian log-likelihood and regression loss functions is fundamental in machine learning. When you maximize the log-likelihood with respect to the model parameters (such as the weights in linear regression), you are effectively minimizing the sum of squared residualsβthe same objective as minimizing mean squared error (MSE). This is because the negative log-likelihood (up to a constant and scaling factor) is equivalent to the MSE loss function. Therefore, fitting a regression model by least squares is mathematically the same as maximizing the likelihood under a Gaussian noise assumption. This insight explains why the Gaussian distribution is not only a modeling assumption, but also underpins the optimization criteria used in regression algorithms.
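To make this equivalence concrete, here is a minimal sketch (assuming NumPy and SciPy, with a made-up one-feature dataset): fitting a line by ordinary least squares and by numerically minimizing the Gaussian negative log-likelihood with $\sigma$ held fixed should recover essentially the same weights, since the $\sigma$-dependent terms are constants with respect to the weights.

import numpy as np
from scipy.optimize import minimize

# Made-up one-feature regression problem: y = 1.5*x - 0.7 + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 1.5 * x - 0.7 + rng.normal(scale=0.8, size=100)

X = np.column_stack([x, np.ones_like(x)])  # design matrix: slope and intercept columns

# Ordinary least squares: minimizes the sum of squared residuals directly
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian negative log-likelihood with sigma held fixed;
# only the squared-residual term depends on the weights
def nll(w, sigma=1.0):
    resid = y - X @ w
    N = len(y)
    return N / 2 * np.log(2 * np.pi) + N * np.log(sigma) + np.sum(resid**2) / (2 * sigma**2)

w_mle = minimize(nll, x0=np.zeros(2)).x

print("Least squares:", w_ls)
print("Gaussian MLE: ", w_mle)  # agrees with the least-squares fit up to numerical tolerance

Changing the fixed value of $\sigma$ rescales the objective but does not move its minimizer, which is exactly why least squares and Gaussian maximum likelihood yield the same weights.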