Probability Distributions for Machine Learning

The Multinomial Distribution

When you face a machine learning problem with more than two possible outcomes, such as classifying emails as "spam", "work", or "personal", you need a probability model that can handle multiple categories. The Multinomial distribution is designed for just this scenario: it describes the probabilities of counts for each category in a fixed number of independent trials, where each trial results in exactly one of several possible outcomes. In a classification context, the Multinomial distribution models the likelihood of observing a particular combination of class assignments across multiple samples, making it fundamental to multi-class classification problems.

Note
Definition

The Multinomial probability mass function (PMF) gives the probability of observing a specific count for each category in a sequence of $n$ independent trials, where each trial can result in one of $k$ categories with probabilities $p_1, p_2, \ldots, p_k$ (where each $p_i \geq 0$ and $\sum_i p_i = 1$):

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

where:

  • $n$: total number of trials;
  • $k$: number of possible outcome categories;
  • $x_i$: number of times category $i$ occurs ($\sum_i x_i = n$);
  • $p_i$: probability of category $i$ in a single trial ($\sum_i p_i = 1$).
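As a quick check of the formula, the PMF for a small illustrative example (the values $n = 10$, $x = (5, 3, 2)$ are chosen here just for demonstration) can be computed directly with Python's standard library:

```python
import math

# Illustrative example: n = 10 trials over k = 3 categories
n = 10
p = [0.5, 0.3, 0.2]   # category probabilities, summing to 1
x = [5, 3, 2]         # observed counts, summing to n

# Multinomial coefficient: n! / (x_1! * x_2! * x_3!)
coef = math.factorial(n) // math.prod(math.factorial(xi) for xi in x)

# PMF: coefficient times the product of p_i ** x_i
pmf = coef * math.prod(pi ** xi for pi, xi in zip(p, x))
print(coef)  # 2520
print(pmf)   # ≈ 0.08505
```

If SciPy is available, `scipy.stats.multinomial.pmf(x, n, p)` returns the same value.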
import numpy as np
import matplotlib.pyplot as plt

# Suppose you have a 3-class classification problem (e.g., "cat", "dog", "rabbit")
categories = ["cat", "dog", "rabbit"]
probabilities = [0.5, 0.3, 0.2]  # Probability for each class
n_samples = 100  # Number of trials (e.g., 100 data points)

# Simulate one experiment: how many times does each class appear in 100 samples?
counts = np.random.multinomial(n_samples, probabilities)
print("Sampled counts:", dict(zip(categories, counts)))
# Repeat simulation to see variability
n_experiments = 1000
all_counts = np.random.multinomial(n_samples, probabilities, size=n_experiments)

# Plot histogram for each class
fig, ax = plt.subplots()
for idx, label in enumerate(categories):
    ax.hist(all_counts[:, idx], bins=30, alpha=0.6, label=label)
ax.set_xlabel("Count in 100 samples")
ax.set_ylabel("Frequency")
ax.legend()
plt.title("Distribution of class counts over 1000 experiments")
plt.show()

When you run this simulation, you see that the number of times each class appears in 100 samples varies from experiment to experiment, but the average count for each class across many experiments approaches $n p_i$, the trial count times that class's probability. This mirrors what happens in a multi-class classifier, such as a softmax classifier, where the predicted probabilities for each class sum to one and represent the expected frequencies of each category. The Multinomial distribution provides the mathematical foundation for modeling these outcomes and for evaluating how well your classifier's predictions align with observed data.
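To make the "averages match" claim concrete: the expected count for class $i$ is $n p_i$ and its variance is $n p_i (1 - p_i)$. A minimal sketch checking both, reusing the probabilities above and assuming a seeded NumPy generator for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
probabilities = np.array([0.5, 0.3, 0.2])
n_samples = 100

# Simulate many experiments and compare empirical moments to theory
all_counts = rng.multinomial(n_samples, probabilities, size=10_000)

empirical_mean = all_counts.mean(axis=0)
expected_mean = n_samples * probabilities                        # n * p_i
expected_var = n_samples * probabilities * (1 - probabilities)   # n * p_i * (1 - p_i)

print(empirical_mean)          # close to [50, 30, 20]
print(all_counts.var(axis=0))  # close to [25, 21, 16]
```

With 10,000 repetitions the empirical means land within a fraction of a count of the theoretical values, which is exactly the behavior the histograms above visualize.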



Section 3. Chapter 3
