
Log Loss (Binary Cross-Entropy): Probabilistic Foundations

You are about to encounter one of the most fundamental loss functions in binary classification: log loss, also known as binary cross-entropy. Its mathematical definition is as follows:

$$L_{\log}(y, \hat{p}) = -\big[\, y \log \hat{p} + (1 - y)\log(1 - \hat{p}) \,\big]$$

Here, $y$ is the true label (0 or 1), and $\hat{p}$ is the predicted probability that the label is 1. Log loss penalizes predictions according to how much they diverge from the true label, with a particular emphasis on probabilistic confidence.
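
As a quick sketch, the formula translates directly into NumPy. The helper name `binary_cross_entropy` and the clipping bound `1e-12` below are illustrative choices to keep `np.log` finite near 0 and 1, not part of the definition:

```python
import numpy as np

def binary_cross_entropy(y, p_hat, eps=1e-12):
    """Log loss for a single example or an array of examples."""
    # Clip predictions away from exactly 0 or 1 so np.log stays finite.
    p_hat = np.clip(p_hat, eps, 1 - eps)
    return -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

print(binary_cross_entropy(1, 0.95))  # confident and correct: ~0.051
print(binary_cross_entropy(1, 0.05))  # confident and wrong:   ~3.0
```

The next snippet plots this loss as a function of the predicted probability for the $y = 1$ case.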

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 400)
logloss_1 = -np.log(p)

plt.plot(p, logloss_1, label="Log Loss (y=1)")
plt.title("Log Loss as a Function of Predicted Probability (y=1)")
plt.xlabel("Predicted Probability p")
plt.ylabel("Loss")
plt.legend()
plt.show()
```

This shows the classic shape:

  • As $\hat{p} \to 1$ → loss → 0;
  • As $\hat{p} \to 0$ → loss → ∞;
  • Overconfident mistakes are punished extremely hard.
Note

Log loss measures the negative log-likelihood of the true label under the predicted probability. This means it evaluates how "surprised" you should be, given your model's predicted probability and the actual outcome.
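
One way to make the "surprise" reading concrete: the loss is simply the negative log of whatever probability the model assigned to the outcome that actually happened. A small sketch with made-up numbers (the helper name `surprise` is hypothetical):

```python
import numpy as np

def surprise(y, p_hat):
    # Probability the model assigned to the outcome that actually occurred.
    p_true_label = p_hat if y == 1 else 1 - p_hat
    return -np.log(p_true_label)

print(surprise(1, 0.9))  # ~0.105: the model expected this outcome, low surprise
print(surprise(0, 0.9))  # ~2.303: the model was confident in the wrong class
```

The following plot shows the loss curves for both classes side by side.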

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 400)
logloss_y1 = -np.log(p)        # loss when the true label is 1
logloss_y0 = -np.log(1 - p)    # loss when the true label is 0

plt.plot(p, logloss_y1, label="y = 1")
plt.plot(p, logloss_y0, label="y = 0")
plt.title("Binary Cross-Entropy for Both Classes")
plt.xlabel("Predicted Probability p")
plt.ylabel("Loss")
plt.legend()
plt.show()
```

This makes the symmetry clear:

  • If the true label is 1, the loss is low when $p$ is high;
  • If the true label is 0, the loss is low when $p$ is low.

The probabilistic foundation of log loss is rooted in maximum likelihood estimation. When you predict a probability $\hat{p}$ for the label being 1, the log loss quantifies how well your prediction matches the observed outcome. If your predicted probability aligns perfectly with the true conditional probability of the label given the features, you minimize the expected log loss. This is why log loss naturally arises when fitting probabilistic classifiers: minimizing log loss is equivalent to maximizing the likelihood of the observed data under your model. Confident and correct predictions yield low log loss, while confident but incorrect predictions are heavily penalized. Uncertain predictions (where $\hat{p}$ is near 0.5) result in moderate loss regardless of the true label.

```python
import numpy as np

y = 1
preds = np.array([0.95, 0.7, 0.5, 0.2, 0.01])
losses = -(y * np.log(preds) + (1 - y) * np.log(1 - preds))

for p, l in zip(preds, losses):
    print(f"p={p:.2f} → log loss={l:.3f}")
```

The output shows exactly this pattern:

  • High $p$ → small penalty;
  • Uncertain ($p$ near 0.5) → moderate penalty;
  • Confidently wrong → massive penalty.
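
Tying this back to the maximum likelihood point above: the mean log loss over a dataset equals the negative mean log-likelihood of the observed labels, so minimizing one maximizes the other. A minimal sketch of that equivalence with made-up labels and predictions, using `sklearn.metrics.log_loss` only as a cross-check:

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up labels and predicted probabilities for illustration.
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.4])

# Per-example binary cross-entropy, then its mean.
per_example = -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
mean_log_loss = per_example.mean()

# Likelihood of the data: product of the probabilities assigned
# to the labels that actually occurred.
likelihood = np.prod(np.where(y == 1, p_hat, 1 - p_hat))

print(mean_log_loss)                 # mean log loss
print(-np.log(likelihood) / len(y))  # identical: negative mean log-likelihood
print(log_loss(y, p_hat))            # sklearn agrees
```

All three printed values agree up to floating-point rounding, which is exactly the equivalence stated above.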

Which statement best describes the probabilistic meaning of log loss in binary classification?

