Bernoulli Likelihood and the Cross-Entropy Loss Function
The Bernoulli distribution models binary outcomes, such as success/failure or yes/no, using a single parameter: the probability of success, often denoted as p. In the context of machine learning, binary classification tasks often assume that each label y (where y is 0 or 1) is drawn from a Bernoulli distribution, with a predicted probability p assigned by a model for the positive class. The likelihood of observing a true label y given a predicted probability p is written as:
L(p; y) = p^{y} (1 - p)^{1 - y}

This likelihood reflects how probable the observed outcome is under the model's prediction. However, when training models, you typically maximize the log-likelihood, which for a single observation becomes:
\log L(p; y) = y \log(p) + (1 - y) \log(1 - p)

This log-likelihood function is fundamental to binary classifiers such as logistic regression. The negative log-likelihood is widely known as the cross-entropy loss in machine learning literature. For a batch of data, the average negative log-likelihood (cross-entropy loss) is:
\text{Cross-entropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]

This loss function penalizes confident but wrong predictions much more heavily than less confident ones, making it a natural fit for probabilistic binary classification.
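As a quick numerical check of the batch formula, the sketch below evaluates the average cross-entropy for a small, made-up batch of labels and predicted probabilities (both arrays are purely illustrative, not the output of any real model):

import numpy as np

# Illustrative batch: true labels and the model's predicted probabilities for class 1
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.90, 0.20, 0.65, 0.40, 0.05])

# In practice, p is usually clipped away from exactly 0 and 1 to avoid log(0)
p = np.clip(p, 1e-12, 1 - 1e-12)

# Per-sample Bernoulli log-likelihood: y*log(p) + (1 - y)*log(1 - p)
log_likelihood = y * np.log(p) + (1 - y) * np.log(1 - p)

# Cross-entropy is the average negative log-likelihood over the batch
cross_entropy = -np.mean(log_likelihood)

print(log_likelihood)   # individual log-likelihood terms
print(cross_entropy)    # average cross-entropy for the batch

The plot code below visualizes these quantities across the full range of predicted probabilities: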
import numpy as np
import matplotlib.pyplot as plt

# Range of predicted probabilities
p = np.linspace(0.001, 0.999, 200)

# True label y = 1
log_likelihood_y1 = np.log(p)
cross_entropy_y1 = -np.log(p)

# True label y = 0
log_likelihood_y0 = np.log(1 - p)
cross_entropy_y0 = -np.log(1 - p)

plt.figure(figsize=(10, 6))
plt.plot(p, log_likelihood_y1, label='Log-Likelihood (y=1)', color='blue')
plt.plot(p, -cross_entropy_y1, '--', label='-Cross-Entropy (y=1)', color='blue', alpha=0.5)
plt.plot(p, log_likelihood_y0, label='Log-Likelihood (y=0)', color='red')
plt.plot(p, -cross_entropy_y0, '--', label='-Cross-Entropy (y=0)', color='red', alpha=0.5)
plt.xlabel('Predicted Probability $p$')
plt.ylabel('Value')
plt.title('Bernoulli Log-Likelihood and Cross-Entropy Loss')
plt.legend()
plt.grid(True)
plt.show()
The plot above illustrates how the log-likelihood and cross-entropy loss behave as the predicted probability p varies, for both possible true labels. Notice that the log-likelihood reaches its maximum when the predicted probability matches the true label (either 0 or 1), and drops off rapidly as the prediction becomes less accurate. The cross-entropy loss, being the negative log-likelihood, is minimized when predictions are accurate and grows quickly for wrong, confident predictions. This property makes cross-entropy a natural loss function for Bernoulli models: it directly reflects the probability assigned to the true outcome and strongly discourages overconfident errors. As a result, optimizing cross-entropy encourages models to produce well-calibrated probabilities, which is essential for robust binary classification.
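To make the penalty asymmetry concrete, the short example below compares the loss for increasingly confident mistakes when the true label is 0; the probability values are arbitrary choices for illustration:

import numpy as np

# True label is 0; compare a cautious mistake with confident ones
y = 0
for p in (0.6, 0.9, 0.99):
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(f"p = {p:.2f} -> cross-entropy = {loss:.2f}")

Here the cautious mistake (p = 0.6) costs about 0.92, while the highly confident one (p = 0.99) costs about 4.61, roughly five times as much, even though both predictions are on the wrong side of 0.5.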