Bernoulli Likelihood and the Cross-Entropy Loss Function

The Bernoulli distribution models binary outcomes, such as success/failure or yes/no, using a single parameter: the probability of success, often denoted as p. In the context of machine learning, binary classification tasks often assume that each label y (where y is 0 or 1) is drawn from a Bernoulli distribution, with a predicted probability p assigned by the model to the positive class. The likelihood of observing a true label y given a predicted probability p is written as:

L(p; y) = p^y (1 - p)^{1 - y}

This likelihood reflects how probable the observed outcome is under the model's prediction. However, when training models, you typically maximize the log-likelihood, which for a single observation becomes:

\log L(p; y) = y \log(p) + (1 - y) \log(1 - p)
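
To make these formulas concrete, here is a minimal sketch that evaluates both the likelihood and the log-likelihood for a single observation (the helper names bernoulli_likelihood and bernoulli_log_likelihood are our own, not part of the lesson):

import numpy as np

def bernoulli_likelihood(p, y):
    # L(p; y) = p^y * (1 - p)^(1 - y)
    return p**y * (1 - p)**(1 - y)

def bernoulli_log_likelihood(p, y):
    # log L(p; y) = y*log(p) + (1 - y)*log(1 - p)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

# Confident prediction p = 0.9 when the true label is 1
print(bernoulli_likelihood(0.9, 1))      # 0.9
print(bernoulli_log_likelihood(0.9, 1))  # log(0.9) ≈ -0.105

# The same prediction when the true label is 0
print(bernoulli_likelihood(0.9, 0))      # ≈ 0.1
print(bernoulli_log_likelihood(0.9, 0))  # log(0.1) ≈ -2.303

Note how the likelihood of the observed label drops from 0.9 to 0.1 when the prediction disagrees with the label, and the log-likelihood correspondingly falls from about -0.11 to about -2.30.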

This log-likelihood function is fundamental to binary classifiers such as logistic regression. The negative log-likelihood is widely known as the cross-entropy loss in machine learning literature. For a batch of data, the average negative log-likelihood (cross-entropy loss) is:

\text{Cross-entropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]
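
Below is a short sketch of this batch formula in NumPy. The function name binary_cross_entropy and the small clipping constant eps are our own additions; clipping simply keeps the logarithms finite when a prediction is exactly 0 or 1:

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Keep predictions strictly inside (0, 1) so log() stays finite
    p = np.clip(p_pred, eps, 1 - eps)
    # Average negative log-likelihood over the batch
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.1])
print(binary_cross_entropy(y_true, p_pred))  # ≈ 0.26

If scikit-learn is available, sklearn.metrics.log_loss(y_true, p_pred) should return essentially the same value, since it computes this same average negative log-likelihood.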

This loss function penalizes confident but wrong predictions much more heavily than less confident ones, making it a natural fit for probabilistic binary classification.
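
A quick numeric check makes this asymmetry visible. With a true label of 0, the per-example loss is -log(1 - p), so a mildly wrong prediction costs little while a confidently wrong one is penalized heavily (the probabilities below are arbitrary illustrative values):

import numpy as np

# Per-example loss when the true label is y = 0: -log(1 - p)
for p in [0.6, 0.9, 0.99]:
    print(f"p = {p:.2f} -> loss = {-np.log(1 - p):.2f}")
# p = 0.60 -> loss = 0.92
# p = 0.90 -> loss = 2.30
# p = 0.99 -> loss = 4.61

The code below extends this point across the full range of p for both possible labels.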

import numpy as np
import matplotlib.pyplot as plt

# Range of predicted probabilities
p = np.linspace(0.001, 0.999, 200)

# True label y = 1
log_likelihood_y1 = np.log(p)
cross_entropy_y1 = -np.log(p)

# True label y = 0
log_likelihood_y0 = np.log(1 - p)
cross_entropy_y0 = -np.log(1 - p)

plt.figure(figsize=(10, 6))
plt.plot(p, log_likelihood_y1, label='Log-Likelihood (y=1)', color='blue')
plt.plot(p, -cross_entropy_y1, '--', label='-Cross-Entropy (y=1)', color='blue', alpha=0.5)
plt.plot(p, log_likelihood_y0, label='Log-Likelihood (y=0)', color='red')
plt.plot(p, -cross_entropy_y0, '--', label='-Cross-Entropy (y=0)', color='red', alpha=0.5)
plt.xlabel('Predicted Probability $p$')
plt.ylabel('Value')
plt.title('Bernoulli Log-Likelihood and Cross-Entropy Loss')
plt.legend()
plt.grid(True)
plt.show()

The plot above illustrates how the log-likelihood and cross-entropy loss behave as the predicted probability p varies, for both possible true labels. Notice that the log-likelihood reaches its maximum when the predicted probability matches the true label (either 0 or 1), and drops off rapidly as the prediction becomes less accurate. The cross-entropy loss, being the negative log-likelihood, is minimized when predictions are accurate and grows quickly for wrong, confident predictions. This property makes cross-entropy a natural loss function for Bernoulli models: it directly reflects the probability assigned to the true outcome and strongly discourages overconfident errors. As a result, optimizing cross-entropy encourages models to produce well-calibrated probabilities, which is essential for robust binary classification.
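
As a small illustration of the calibration point, the sketch below (our own example, with a simulated positive rate of 0.3 and a grid of constant predictions) shows that, among all constant predictions, the average cross-entropy is lowest when p matches the empirical frequency of the positive class:

import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=10_000)  # simulated labels with a 30% positive rate

# Average cross-entropy for a grid of constant predictions
candidates = np.linspace(0.01, 0.99, 99)
losses = [-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)) for p in candidates]

best_p = candidates[np.argmin(losses)]
print(f"Empirical positive rate: {y.mean():.3f}")
print(f"Constant prediction with the lowest cross-entropy: {best_p:.2f}")
# Both printed values should be close to 0.3

In other words, the loss is lowest when predicted probabilities line up with observed frequencies, which is exactly the calibration behavior described above.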

