Multinomial Likelihood and the Softmax Loss Function

The Multinomial likelihood is central to modeling problems where each observation belongs to one of several discrete classes. In multi-class classification, you often encounter a scenario where, for a single observation, you observe one outcome among many possible categories. The Multinomial distribution (in its single-trial, categorical form) generalizes the Bernoulli distribution to more than two classes, assigning a probability to each possible class label. The likelihood of a single observation, given the predicted class probabilities, is simply the probability the model assigns to the observed class. For a dataset of independent observations, the total likelihood is the product of the predicted probabilities for the observed classes across all samples. For numerical stability and computational convenience, you typically work with the log-likelihood, which sums the log-probabilities of the observed classes. This log-likelihood forms the basis for training many multi-class classifiers, including logistic regression and neural networks.
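In symbols (this notation is added here for clarity and is not taken from the lesson): let $p_{i,k}$ denote the probability the model assigns to class $k$ for observation $i$, and let $y_i$ be the class actually observed. Then

\[
L_i = p_{i, y_i}, \qquad
L = \prod_{i=1}^{N} L_i = \prod_{i=1}^{N} p_{i, y_i}, \qquad
\log L = \sum_{i=1}^{N} \log p_{i, y_i},
\]

and maximizing the log-likelihood means pushing the predicted probability of each observed class toward 1.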

import numpy as np
import matplotlib.pyplot as plt

# Suppose there are 3 classes: 0, 1, 2
num_classes = 3

# Simulate predicted probabilities for class 0, varying from 0.01 to 0.98
p0 = np.linspace(0.01, 0.98, 100)

# The rest of the probability mass is split equally between class 1 and 2
p_rest = (1 - p0) / 2
probs = np.vstack([p0, p_rest, p_rest]).T

# Assume the true class is 0 for this example
true_class = 0

# Compute log-likelihood for each set of predicted probabilities
log_likelihood = np.log(probs[:, true_class])

plt.figure(figsize=(7, 4))
plt.plot(p0, log_likelihood, label="Log-Likelihood (True class 0)")
plt.xlabel("Predicted Probability for True Class (class 0)")
plt.ylabel("Log-Likelihood")
plt.title("Log-Likelihood vs. Predicted Probability for True Class")
plt.grid(True)
plt.legend()
plt.show()

The plot above shows how the log-likelihood changes as you vary the predicted probability for the true class, holding the probabilities for the other classes equal. The log-likelihood is highest when the model assigns a probability near 1 to the true class and decreases rapidly as the probability drops. This relationship underpins the softmax loss, also known as the categorical cross-entropy loss. In multi-class classification, the softmax function is used to convert raw model outputs into normalized class probabilities. The loss function then compares these probabilities to the actual observed class using the negative log-likelihood. Minimizing this loss is equivalent to maximizing the Multinomial likelihood of the observed data. This connection is fundamental to modern machine learning algorithms for multi-class problems, ensuring that models are directly optimized to assign high probability to the correct class labels.
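To make this connection concrete, here is a minimal sketch of the computation (it is not part of the lesson's code; the helper names softmax and cross_entropy_loss and the example logits are illustrative only). It converts raw model outputs into normalized probabilities with the softmax and then evaluates the negative log-likelihood of the true class:

import numpy as np

def softmax(logits):
    # Shift by the largest logit for numerical stability before exponentiating
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

def cross_entropy_loss(logits, true_class):
    # Negative log-probability that the softmax assigns to the observed class
    probs = softmax(logits)
    return -np.log(probs[true_class])

# Hypothetical raw outputs (logits) for one observation in a 3-class problem
logits = np.array([2.0, 0.5, -1.0])
true_class = 0

print("Softmax probabilities:", softmax(logits))
print("Cross-entropy loss for true class 0:", cross_entropy_loss(logits, true_class))

Because the loss is just the negative log-probability of the observed class, driving the loss toward 0 is the same as pushing that probability toward 1, which is exactly the maximum-likelihood objective described above.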
