Loss Functions in Machine Learning

Multi-class Cross-Entropy and the Softmax Connection

The multi-class cross-entropy loss is a fundamental tool for training classifiers when there are more than two possible classes. Its formula is:

L_{CE}(y, \hat{p}) = -\sum_{k} y_k \log \hat{p}_k

where y_k is the true distribution for class k (typically 1 for the correct class and 0 otherwise), and \hat{p}_k is the predicted probability for class k, usually produced by applying the softmax function to the model's raw outputs.

import numpy as np

# Probability the model assigns to the true class, from very confident to nearly wrong
correct_probs = np.array([0.9, 0.6, 0.33, 0.1])

# With a one-hot target, cross-entropy reduces to -log of the true-class probability
loss = -np.log(correct_probs)

for p, l in zip(correct_probs, loss):
    print(f"Predicted probability for true class = {p:.2f} → CE loss = {l:.3f}")

A simple numeric demo showing:

  • High confidence & correct → small loss;
  • Moderate confidence → moderate loss;
  • Confident but wrong (p very small) → huge loss.
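The demo above only looks at the probability assigned to the true class. To see the full formula in action, here is a minimal sketch (the label and probability vectors are illustrative, not taken from the lesson) that computes -∑_k y_k log p̂_k with an explicit one-hot label; because every y_k except the true class is 0, the sum collapses to a single -log term.

import numpy as np

# Illustrative one-hot true label: class at index 2 is correct
y = np.array([0.0, 0.0, 1.0, 0.0])

# Illustrative predicted probability distribution over the four classes
p_hat = np.array([0.05, 0.10, 0.80, 0.05])

# Full multi-class cross-entropy: -sum_k y_k * log(p_hat_k)
loss = -np.sum(y * np.log(p_hat))

# Because y is one-hot, this equals -log of the true-class probability
print("Cross-entropy loss:", loss)               # ≈ 0.223
print("-log(p_hat[2]):    ", -np.log(p_hat[2]))  # same value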
Note

Cross-entropy quantifies the difference between true and predicted class distributions. It measures how well the predicted probabilities match the actual class labels, assigning a higher loss when the model is confident but wrong.

The softmax transformation is critical in multi-class classification. It converts a vector of raw output scores (logits) from a model into a probability distribution over classes, ensuring that all predicted probabilities \hat{p}_k are between 0 and 1 and sum to 1. This is defined as:

\hat{p}_k = \frac{\exp(z_k)}{\sum_{j} \exp(z_j)}

where z_k is the raw score for class k. Softmax and cross-entropy are paired because softmax outputs interpretable probabilities, and cross-entropy penalizes the model based on how far these probabilities are from the true class distribution. When the model assigns a high probability to the wrong class, the loss increases sharply, guiding the model to improve its predictions.

import numpy as np

# Raw model outputs (logits) for three classes
logits = np.array([2.0, 1.0, 0.1])

# Softmax: exponentiate each logit, then normalize so the values sum to 1
exp_vals = np.exp(logits)
softmax = exp_vals / np.sum(exp_vals)

print("Logits:", logits)
print("Softmax probabilities:", softmax)

This shows how a single large logit can dominate the resulting distribution, and how softmax normalizes the raw scores into valid probabilities.
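To tie the two pieces together, the following sketch (the logit values and one-hot label are chosen purely for illustration) passes logits through softmax and then into cross-entropy, comparing a confidently correct prediction with a confidently wrong one.

import numpy as np

def softmax(z):
    # Subtract the max logit before exponentiating for numerical stability
    z = z - np.max(z)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

def cross_entropy(y, p_hat):
    # -sum_k y_k * log(p_hat_k); a small epsilon guards against log(0)
    return -np.sum(y * np.log(p_hat + 1e-12))

y_true = np.array([1.0, 0.0, 0.0])        # class 0 is the correct class

good_logits = np.array([4.0, 1.0, 0.5])   # confident and correct
bad_logits = np.array([0.5, 4.0, 1.0])    # confident but wrong

print("CE (confident, correct):", cross_entropy(y_true, softmax(good_logits)))
print("CE (confident, wrong):  ", cross_entropy(y_true, softmax(bad_logits)))

The confident-but-wrong case yields a loss of roughly 3.6 versus about 0.08 for the confident-and-correct case, which is exactly the sharp penalty described above.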


Which statement best describes the role of softmax in multi-class classification and the way cross-entropy penalizes predictions?


