Multi-class Cross-Entropy and the Softmax Connection
The multi-class cross-entropy loss is a fundamental tool for training classifiers when there are more than two possible classes. Its formula is:
$$
L_{CE}(y, \hat{p}) = -\sum_{k} y_k \log \hat{p}_k
$$
where $y_k$ is the true distribution for class $k$ (typically 1 for the correct class and 0 otherwise), and $\hat{p}_k$ is the predicted probability for class $k$, usually produced by applying the softmax function to the model's raw outputs.
```python
import numpy as np

correct_probs = np.array([0.9, 0.6, 0.33, 0.1])
loss = -np.log(correct_probs)

for p, l in zip(correct_probs, loss):
    print(f"Predicted probability for true class = {p:.2f} → CE loss = {l:.3f}")
```
A simple numeric demo showing:
- High confidence & correct → small loss;
- Moderate confidence → moderate loss;
- Confident but wrong (p very small) → huge loss.
Cross-entropy quantifies the difference between true and predicted class distributions. It measures how well the predicted probabilities match the actual class labels, assigning a higher loss when the model is confident but wrong.
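To make the sum in the formula concrete, here is a minimal sketch that computes the cross-entropy between a one-hot label vector and a predicted distribution; the specific label and probability values are illustrative, not taken from the lesson. Because only the true class has $y_k = 1$, the sum collapses to $-\log$ of the true-class probability.

```python
import numpy as np

# One-hot true distribution: class 1 is the correct class (illustrative values)
y_true = np.array([0.0, 1.0, 0.0])

# Predicted probabilities from the model (must sum to 1)
y_pred = np.array([0.2, 0.7, 0.1])

# Full multi-class cross-entropy: -sum_k y_k * log(p_k)
ce_loss = -np.sum(y_true * np.log(y_pred))

# Because y_true is one-hot, this equals -log of the true-class probability
print("Cross-entropy loss:", ce_loss)
print("Check: -log(0.7) =", -np.log(0.7))
```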
The softmax transformation is critical in multi-class classification. It converts a vector of raw output scores (logits) from a model into a probability distribution over classes, ensuring that all predicted probabilities p^k are between 0 and 1 and sum to 1. This is defined as:
$$
\hat{p}_k = \frac{\exp(z_k)}{\sum_j \exp(z_j)}
$$
where $z_k$ is the raw score for class $k$. Softmax and cross-entropy are paired because softmax outputs interpretable probabilities, and cross-entropy penalizes the model based on how far these probabilities are from the true class distribution. When the model assigns a high probability to the wrong class, the loss increases sharply, guiding the model to improve its predictions.
```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])
exp_vals = np.exp(logits)
softmax = exp_vals / np.sum(exp_vals)

print("Logits:", logits)
print("Softmax probabilities:", softmax)
```
Shows how a single large logit can dominate the distribution and how softmax normalizes everything into probabilities.
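To illustrate the softmax–cross-entropy pairing described above, the sketch below (using made-up logits and a hypothetical one-hot label) runs two sets of logits through softmax and computes the cross-entropy against the same true class: one set favors the correct class, the other confidently favors a wrong class, and the loss jumps sharply for the second.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; the result is mathematically unchanged
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(y_true, p_pred):
    # -sum_k y_k * log(p_k)
    return -np.sum(y_true * np.log(p_pred))

y_true = np.array([1.0, 0.0, 0.0])        # class 0 is correct (illustrative label)

good_logits = np.array([3.0, 0.5, -1.0])  # model favors the correct class
bad_logits = np.array([-1.0, 4.0, 0.5])   # model confidently favors class 1

for name, logits in [("good", good_logits), ("bad", bad_logits)]:
    p = softmax(logits)
    print(f"{name} logits → probs {np.round(p, 3)}, CE loss = {cross_entropy(y_true, p):.3f}")
```

In practice, deep learning frameworks typically fuse softmax and cross-entropy into a single, numerically stable operation, but the separated version above mirrors the two formulas presented in this lesson.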