Log Loss (Binary Cross-Entropy): Probabilistic Foundations
You are about to encounter one of the most fundamental loss functions in binary classification: log loss, also known as binary cross-entropy. Its mathematical definition is as follows:
$$L_{\log}(y, \hat{p}) = -\bigl[\, y \log \hat{p} + (1 - y)\log(1 - \hat{p}) \,\bigr]$$

Here, $y$ is the true label (0 or 1), and $\hat{p}$ is the predicted probability that the label is 1. The log loss penalizes predictions according to how much they diverge from the true label, with a particular emphasis on probabilistic confidence.
```python
import numpy as np
import matplotlib.pyplot as plt

# Log loss for the positive class: L = -log(p) when y = 1
p = np.linspace(0.001, 0.999, 400)
logloss_1 = -np.log(p)

plt.plot(p, logloss_1, label="Log Loss (y=1)")
plt.title("Log Loss as a Function of Predicted Probability (y=1)")
plt.xlabel("Predicted Probability p")
plt.ylabel("Loss")
plt.legend()
plt.show()
```
This shows the classic shape:
- As $\hat{p} \to 1$, the loss approaches 0;
- As $\hat{p} \to 0$, the loss grows toward infinity;
- Overconfident mistakes are punished extremely hard.
Log loss measures the negative log-likelihood of the true label under the predicted probability. This means it evaluates how "surprised" you should be, given your model's predicted probability and the actual outcome.
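As a quick numerical illustration of this "surprise" reading (the value 0.9 below is just an example), compare the loss a single prediction would incur under each possible true label:

```python
import numpy as np

# One prediction, p_hat = 0.9, scored against both possible outcomes
p_hat = 0.9
loss_if_y1 = -np.log(p_hat)      # true label 1: mild surprise (~0.105)
loss_if_y0 = -np.log(1 - p_hat)  # true label 0: large surprise (~2.303)

print(f"loss if y=1: {loss_if_y1:.3f}, loss if y=0: {loss_if_y0:.3f}")
```

The plot below traces the same comparison across the full range of predicted probabilities.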
```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 400)
logloss_y1 = -np.log(p)       # loss when the true label is 1
logloss_y0 = -np.log(1 - p)   # loss when the true label is 0

plt.plot(p, logloss_y1, label="y = 1")
plt.plot(p, logloss_y0, label="y = 0")
plt.title("Binary Cross-Entropy for Both Classes")
plt.xlabel("Predicted Probability p")
plt.ylabel("Loss")
plt.legend()
plt.show()
```
This makes the symmetry clear:
- If the true label is 1, the loss is low when $\hat{p}$ is high;
- If the true label is 0, the loss is low when $\hat{p}$ is low (both cases are combined in the vectorized sketch below).
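Both cases can also be evaluated with a single vectorized expression. The sketch below is a minimal implementation; the helper name `binary_cross_entropy` and the `eps` clipping constant are illustrative choices rather than anything prescribed by the lesson:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Elementwise log loss for labels in {0, 1} and predicted P(y = 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # keep log() finite at p = 0 or 1
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.2, 0.4, 0.95])
print(binary_cross_entropy(y_true, p_pred))  # low, low, moderate, very high
```

Clipping plays the same role as the 0.001–0.999 range used in the plotting code above: it keeps the loss finite when a prediction lands exactly on 0 or 1.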
The probabilistic foundation of log loss is rooted in maximum likelihood estimation. When you predict a probability $\hat{p}$ for the label being 1, the log loss quantifies how well your prediction matches the observed outcome. If your predicted probability aligns perfectly with the true conditional probability of the label given the features, you minimize the expected log loss. This is why log loss naturally arises when fitting probabilistic classifiers: minimizing log loss is equivalent to maximizing the likelihood of the observed data under your model. Confident and correct predictions yield low log loss, while confident but incorrect predictions are heavily penalized. Uncertain predictions (where $\hat{p}$ is near 0.5) result in moderate loss regardless of the true label.
```python
import numpy as np

# True label is 1; compare predictions from confident-correct to confident-wrong
y = 1
preds = np.array([0.95, 0.7, 0.5, 0.2, 0.01])
losses = -(y * np.log(preds) + (1 - y) * np.log(1 - preds))

for p, l in zip(preds, losses):
    print(f"p={p:.2f} → log loss={l:.3f}")
```
The output shows exactly the expected pattern:
- High $\hat{p}$ → small penalty;
- Uncertain ($\hat{p}$ near 0.5) → moderate penalty;
- Confident but wrong → massive penalty.
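To tie this back to the maximum likelihood view, the mean log loss over a dataset equals the negative average log-likelihood of the observed labels, so minimizing one is the same as maximizing the other. The labels and probabilities below are made up purely for illustration:

```python
import numpy as np

# Hypothetical labels and predicted probabilities (illustration only)
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.8, 0.1, 0.6, 0.9, 0.3])

# Likelihood the model assigns to each observed label
likelihoods = np.where(y == 1, p, 1 - p)

mean_log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
neg_avg_log_likelihood = -np.mean(np.log(likelihoods))

print(mean_log_loss, neg_avg_log_likelihood)  # the two numbers match
```

If scikit-learn is available, `sklearn.metrics.log_loss(y, p)` should report the same value.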