Bernoulli Likelihood and the Cross-Entropy Loss Function
The Bernoulli distribution models binary outcomes, such as success/failure or yes/no, using a single parameter: the probability of success, often denoted as p. In the context of machine learning, binary classification tasks often assume that each label y (where y is 0 or 1) is drawn from a Bernoulli distribution, with a predicted probability p assigned by a model for the positive class. The likelihood of observing a true label y given a predicted probability p is written as:
L(p; y) = p^{y} (1 - p)^{1 - y}

This likelihood reflects how probable the observed outcome is under the model's prediction. However, when training models, you typically maximize the log-likelihood, which for a single observation becomes:
\log L(p; y) = y \log(p) + (1 - y) \log(1 - p)

This log-likelihood function is fundamental to binary classifiers such as logistic regression. The negative log-likelihood is widely known as the cross-entropy loss in machine learning literature. For a batch of data, the average negative log-likelihood (cross-entropy loss) is:
\text{Cross-entropy} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]

This loss function penalizes confident but wrong predictions much more heavily than less confident ones, making it a natural fit for probabilistic binary classification.
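As a quick numerical check of the batch formula, the sketch below evaluates the average cross-entropy for a small, made-up batch of labels and predicted probabilities (both arrays are purely illustrative, not the output of any real model):

import numpy as np

# Illustrative batch: true labels and the model's predicted probabilities for class 1
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.90, 0.20, 0.65, 0.40, 0.05])

# In practice, p is usually clipped away from exactly 0 and 1 to avoid log(0)
p = np.clip(p, 1e-12, 1 - 1e-12)

# Per-sample Bernoulli log-likelihood: y*log(p) + (1 - y)*log(1 - p)
log_likelihood = y * np.log(p) + (1 - y) * np.log(1 - p)

# Cross-entropy is the average negative log-likelihood over the batch
cross_entropy = -np.mean(log_likelihood)

print(log_likelihood)   # individual log-likelihood terms
print(cross_entropy)    # average cross-entropy for the batch

The plot code below visualizes these quantities across the full range of predicted probabilities: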
import numpy as np
import matplotlib.pyplot as plt

# Range of predicted probabilities
p = np.linspace(0.001, 0.999, 200)

# True label y = 1
log_likelihood_y1 = np.log(p)
cross_entropy_y1 = -np.log(p)

# True label y = 0
log_likelihood_y0 = np.log(1 - p)
cross_entropy_y0 = -np.log(1 - p)

plt.figure(figsize=(10, 6))
plt.plot(p, log_likelihood_y1, label='Log-Likelihood (y=1)', color='blue')
plt.plot(p, -cross_entropy_y1, '--', label='-Cross-Entropy (y=1)', color='blue', alpha=0.5)
plt.plot(p, log_likelihood_y0, label='Log-Likelihood (y=0)', color='red')
plt.plot(p, -cross_entropy_y0, '--', label='-Cross-Entropy (y=0)', color='red', alpha=0.5)
plt.xlabel('Predicted Probability $p$')
plt.ylabel('Value')
plt.title('Bernoulli Log-Likelihood and Cross-Entropy Loss')
plt.legend()
plt.grid(True)
plt.show()
The plot above illustrates how the log-likelihood and cross-entropy loss behave as the predicted probability p varies, for both possible true labels. Notice that the log-likelihood reaches its maximum when the predicted probability matches the true label (either 0 or 1), and drops off rapidly as the prediction becomes less accurate. The cross-entropy loss, being the negative log-likelihood, is minimized when predictions are accurate and grows quickly for wrong, confident predictions. This property makes cross-entropy a natural loss function for Bernoulli models: it directly reflects the probability assigned to the true outcome and strongly discourages overconfident errors. As a result, optimizing cross-entropy encourages models to produce well-calibrated probabilities, which is essential for robust binary classification.
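To make the penalty asymmetry concrete, the short example below compares the loss for increasingly confident mistakes when the true label is 0; the probability values are arbitrary choices for illustration:

import numpy as np

# True label is 0; compare a cautious mistake with confident ones
y = 0
for p in (0.6, 0.9, 0.99):
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(f"p = {p:.2f} -> cross-entropy = {loss:.2f}")

Here the cautious mistake (p = 0.6) costs about 0.92, while the highly confident one (p = 0.99) costs about 4.61, roughly five times as much, even though both predictions are on the wrong side of 0.5.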