
Bernoulli Likelihood and the Cross-Entropy Loss Function

The Bernoulli distribution models binary outcomes, such as success/failure or yes/no, using a single parameter: the probability of success, often denoted as p. In machine learning, binary classification tasks often assume that each label y (where y is 0 or 1) is drawn from a Bernoulli distribution, with the predicted probability p assigned by the model to the positive class. The likelihood of observing a true label y given a predicted probability p is written as:

L(p; y) = p^y (1 - p)^{1 - y}
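For a quick numerical check, here is a minimal sketch (with a few hand-picked probabilities, not taken from the lesson) showing that the likelihood rewards predictions that agree with the observed label:

def bernoulli_likelihood(p, y):
    """Likelihood of observing label y (0 or 1) under predicted probability p."""
    return p ** y * (1 - p) ** (1 - y)

for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: L(p; y=1) = {bernoulli_likelihood(p, 1):.2f}, "
          f"L(p; y=0) = {bernoulli_likelihood(p, 0):.2f}")
# For example, p = 0.9 yields likelihood 0.90 when y = 1 but only 0.10 when y = 0.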

This likelihood reflects how probable the observed outcome is under the model's prediction. In practice, however, models are trained by maximizing the log-likelihood, which is numerically more convenient and turns products over many observations into sums. For a single observation it becomes:

\log L(p; y) = y \log(p) + (1 - y) \log(1 - p)
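Plugging in each possible label makes the two branches of this expression explicit:

\log L(p; 1) = \log(p) \qquad \log L(p; 0) = \log(1 - p)

so maximizing the log-likelihood pushes the predicted probability toward 1 when the observed label is 1 and toward 0 when it is 0.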

This log-likelihood function is fundamental to binary classifiers such as logistic regression. The negative log-likelihood is widely known as the cross-entropy loss in machine learning literature. For a batch of data, the average negative log-likelihood (cross-entropy loss) is:

\text{Cross-entropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]

This loss function penalizes confident but wrong predictions much more heavily than less confident ones, making it a natural fit for probabilistic binary classification.
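To make the batch formula concrete, here is a small sketch with made-up labels and predictions; note how the single confidently wrong prediction dominates the loss:

import numpy as np

# Made-up example batch: true labels and the model's predicted P(y = 1)
y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.2, 0.6, 0.99])  # the last prediction is confidently wrong

# Per-example negative log-likelihoods
per_example = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(per_example)  # the confident mistake contributes -log(0.01) ≈ 4.61

# Average cross-entropy over the batch, matching the formula above
# (sklearn.metrics.log_loss(y_true, p_pred) should report the same value)
print(per_example.mean())

The plot below visualizes the per-observation log-likelihood and cross-entropy across the full range of predicted probabilities for each label.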

import numpy as np
import matplotlib.pyplot as plt

# Range of predicted probabilities (exclude 0 and 1 so the logarithms stay finite)
p = np.linspace(0.001, 0.999, 200)

# True label y = 1: log-likelihood is log(p), cross-entropy is -log(p)
log_likelihood_y1 = np.log(p)
cross_entropy_y1 = -np.log(p)

# True label y = 0: log-likelihood is log(1 - p), cross-entropy is -log(1 - p)
log_likelihood_y0 = np.log(1 - p)
cross_entropy_y0 = -np.log(1 - p)

plt.figure(figsize=(10, 6))
# The dashed negated cross-entropy curves coincide with the solid log-likelihood
# curves, since the cross-entropy is exactly the negative log-likelihood.
plt.plot(p, log_likelihood_y1, label='Log-Likelihood (y=1)', color='blue')
plt.plot(p, -cross_entropy_y1, '--', label='-Cross-Entropy (y=1)', color='blue', alpha=0.5)
plt.plot(p, log_likelihood_y0, label='Log-Likelihood (y=0)', color='red')
plt.plot(p, -cross_entropy_y0, '--', label='-Cross-Entropy (y=0)', color='red', alpha=0.5)
plt.xlabel('Predicted Probability $p$')
plt.ylabel('Value')
plt.title('Bernoulli Log-Likelihood and Cross-Entropy Loss')
plt.legend()
plt.grid(True)
plt.show()

The plot above illustrates how the log-likelihood and cross-entropy loss behave as the predicted probability p varies, for both possible true labels. Notice that the log-likelihood reaches its maximum when the predicted probability matches the true label (either 0 or 1), and drops off rapidly as the prediction becomes less accurate. The cross-entropy loss, being the negative log-likelihood, is minimized when predictions are accurate and grows quickly for wrong, confident predictions. This property makes cross-entropy a natural loss function for Bernoulli models: it directly reflects the probability assigned to the true outcome and strongly discourages overconfident errors. As a result, optimizing cross-entropy encourages models to produce well-calibrated probabilities, which is essential for robust binary classification.
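As a final illustration of how this loss is used in practice, below is a minimal logistic regression fit by gradient descent on the cross-entropy loss. The synthetic one-feature dataset, learning rate, and iteration count are arbitrary choices for this sketch, not part of the lesson:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic one-feature dataset (assumed toy setup):
# the positive class tends to have larger x values.
n = 200
x = rng.normal(size=n)
y = (x + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression p = sigmoid(w*x + b), trained by gradient descent on cross-entropy
w, b = 0.0, 0.0
learning_rate = 0.1
eps = 1e-12  # keeps log() finite

for step in range(2001):
    p = sigmoid(w * x + b)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # For this model, the gradient of the average cross-entropy is driven by (p - y)
    grad_w = np.mean((p - y) * x)
    grad_b = np.mean(p - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    if step % 500 == 0:
        print(f"step {step:4d}  cross-entropy {loss:.3f}")

print(f"learned parameters: w = {w:.2f}, b = {b:.2f}")

Minimizing the cross-entropy here is exactly maximizing the Bernoulli log-likelihood of the observed labels; libraries such as scikit-learn's LogisticRegression optimize essentially the same objective, typically with added regularization.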
