Loss Functions in Machine Learning

Multi-class Cross-Entropy and the Softmax Connection

The multi-class cross-entropy loss is a fundamental tool for training classifiers when there are more than two possible classes. Its formula is:

$$L_{CE}(y, \hat{p}) = -\sum_{k} y_k \log \hat{p}_k$$

where $y_k$ is the true distribution for class $k$ (typically 1 for the correct class and 0 otherwise), and $\hat{p}_k$ is the predicted probability for class $k$, usually produced by applying the softmax function to the model's raw outputs.
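Because $y$ is typically one-hot, only the true-class term contributes to the sum, so the loss reduces to $-\log$ of the probability assigned to the correct class. For example, with an assumed true class $c$ and $\hat{p}_c = 0.7$:

$$L_{CE}(y, \hat{p}) = -\log \hat{p}_c = -\log 0.7 \approx 0.357$$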

import numpy as np

# Predicted probabilities assigned to the true class in four scenarios
correct_probs = np.array([0.9, 0.6, 0.33, 0.1])

# For a one-hot target, cross-entropy is -log of the true-class probability
loss = -np.log(correct_probs)

for p, l in zip(correct_probs, loss):
    print(f"Predicted probability for true class = {p:.2f} → CE loss = {l:.3f}")

A simple numeric demo showing:

  • High confidence & correct → small loss;
  • Moderate confidence → moderate loss;
  • Confident but wrong ($p$ very small) → huge loss.
Note

Cross-entropy quantifies the difference between true and predicted class distributions. It measures how well the predicted probabilities match the actual class labels, assigning a higher loss when the model is confident but wrong.
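As a minimal sketch of this idea (using an assumed three-class example with the true class at index 0), the full sum over classes gives exactly the same value as taking $-\log$ of the true-class probability:

import numpy as np

# Assumed example: three classes, true class is index 0 (one-hot target)
y = np.array([1.0, 0.0, 0.0])
p_hat = np.array([0.7, 0.2, 0.1])   # predicted probabilities (sum to 1)

# Full cross-entropy sum over all classes
loss_full = -np.sum(y * np.log(p_hat))

# Shortcut: only the true-class term survives for a one-hot target
loss_shortcut = -np.log(p_hat[0])

print(loss_full, loss_shortcut)      # both ≈ 0.357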

The softmax transformation is critical in multi-class classification. It converts a vector of raw output scores (logits) from a model into a probability distribution over classes, ensuring that all predicted probabilities $\hat{p}_k$ are between 0 and 1 and sum to 1. This is defined as:

$$\hat{p}_k = \frac{\exp(z_k)}{\sum_{j} \exp(z_j)}$$

where $z_k$ is the raw score for class $k$. Softmax and cross-entropy are paired because softmax outputs interpretable probabilities, and cross-entropy penalizes the model based on how far these probabilities are from the true class distribution. When the model assigns a high probability to the wrong class, the loss increases sharply, guiding the model to improve its predictions.

import numpy as np

# Raw model outputs (logits) for three classes
logits = np.array([2.0, 1.0, 0.1])

# Softmax: exponentiate, then normalize so the probabilities sum to 1
exp_vals = np.exp(logits)
softmax = exp_vals / np.sum(exp_vals)

print("Logits:", logits)
print("Softmax probabilities:", softmax)

This demo shows how a single large logit can dominate the distribution and how softmax normalizes all the scores into a valid probability distribution.
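To tie the two ideas together, here is a minimal sketch (assuming the same logits as above and that the true class is index 2) that goes from logits through softmax to the cross-entropy loss. Subtracting the maximum logit before exponentiating is a common trick to avoid numerical overflow and does not change the resulting probabilities:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])   # same assumed logits as above
true_class = 2                        # assumed index of the correct class

# Numerically stable softmax: shifting by the max logit leaves the result unchanged
shifted = logits - np.max(logits)
probs = np.exp(shifted) / np.sum(np.exp(shifted))

# Cross-entropy for a one-hot target reduces to -log of the true-class probability
loss = -np.log(probs[true_class])

print("Softmax probabilities:", probs)
print(f"Cross-entropy loss for true class {true_class}: {loss:.3f}")

With these assumed logits the model puts most of its probability mass on class 0, so labeling class 2 as correct yields a loss above 2, illustrating the sharp penalty for confident mistakes.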


Which statement best describes the role of softmax in multi-class classification and the way cross-entropy penalizes predictions?

