Multi-class Cross-Entropy and the Softmax Connection
The multi-class cross-entropy loss is a fundamental tool for training classifiers when there are more than two possible classes. Its formula is:
$$
L_{CE}(y, \hat{p}) = -\sum_{k} y_k \log \hat{p}_k
$$

where $y_k$ is the true distribution for class $k$ (typically 1 for the correct class and 0 otherwise), and $\hat{p}_k$ is the predicted probability for class $k$, usually produced by applying the softmax function to the model's raw outputs.
import numpy as np

correct_probs = np.array([0.9, 0.6, 0.33, 0.1])
loss = -np.log(correct_probs)

for p, l in zip(correct_probs, loss):
    print(f"Predicted probability for true class = {p:.2f} → CE loss = {l:.3f}")
A simple numeric demo showing:
- High confidence & correct → small loss;
- Moderate confidence → moderate loss;
- Confident but wrong (p very small) → huge loss.
Cross-entropy quantifies the difference between true and predicted class distributions. It measures how well the predicted probabilities match the actual class labels, assigning a higher loss when the model is confident but wrong.
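To make the full formula concrete, here is a minimal sketch (the one-hot label and predicted distribution are made up for illustration) that sums $-y_k \log \hat{p}_k$ over all classes. Because $y_k$ is zero everywhere except the true class, the sum collapses to $-\log$ of the probability assigned to that class, matching the demo above.

import numpy as np

# Hypothetical example: 3 classes, true class is index 1 (one-hot encoded)
y_true = np.array([0.0, 1.0, 0.0])
p_hat = np.array([0.2, 0.7, 0.1])   # predicted probabilities (sum to 1)

eps = 1e-12                          # small constant to avoid log(0)
ce_loss = -np.sum(y_true * np.log(p_hat + eps))

print(f"Cross-entropy loss        = {ce_loss:.3f}")        # equals -log(0.7)
print(f"-log(p of the true class) = {-np.log(0.7):.3f}")

Keeping the labels as a full vector means the same computation also works for soft labels (for example with label smoothing), where the sum no longer reduces to a single term.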
The softmax transformation is critical in multi-class classification. It converts a vector of raw output scores (logits) from a model into a probability distribution over classes, ensuring that all predicted probabilities $\hat{p}_k$ are between 0 and 1 and sum to 1. This is defined as:
$$
\hat{p}_k = \frac{\exp(z_k)}{\sum_{j} \exp(z_j)}
$$

where $z_k$ is the raw score for class $k$. Softmax and cross-entropy are paired because softmax outputs interpretable probabilities, and cross-entropy penalizes the model based on how far these probabilities are from the true class distribution. When the model assigns a high probability to the wrong class, the loss increases sharply, guiding the model to improve its predictions.
import numpy as np

logits = np.array([2.0, 1.0, 0.1])
exp_vals = np.exp(logits)
softmax = exp_vals / np.sum(exp_vals)

print("Logits:", logits)
print("Softmax probabilities:", softmax)
Shows how a single large logit can dominate the distribution and how softmax normalizes everything into probabilities.
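The two pieces come together as in the sketch below, which reuses the same toy logits and a hypothetical true class index: softmax turns the logits into probabilities, and cross-entropy then rewards a confident correct prediction and punishes a confident wrong one. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick and does not change the softmax output.

import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability (the result is unchanged)
    shifted = z - np.max(z)
    exp_vals = np.exp(shifted)
    return exp_vals / np.sum(exp_vals)

def cross_entropy(probs, true_class):
    # One-hot labels collapse the sum to -log of the true-class probability
    return -np.log(probs[true_class])

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print("Softmax probabilities:", probs)
print("Loss if true class is 0:", cross_entropy(probs, 0))  # high probability → small loss
print("Loss if true class is 2:", cross_entropy(probs, 2))  # low probability  → large loss

In practice, deep learning libraries usually fuse the two steps into a single "softmax cross-entropy" (or log-softmax plus negative log-likelihood) loss for better numerical behavior, but the separated version above mirrors the formulas in this lesson.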