Probability Distributions for Machine Learning

The Multinomial Distribution

When you face a machine learning problem with more than two possible outcomes, such as classifying emails as "spam", "work", or "personal", you need a probability model that can handle multiple categories. The Multinomial distribution is designed for just this scenario: it describes the probabilities of counts for each category in a fixed number of independent trials, where each trial results in exactly one of several possible outcomes. In a classification context, the Multinomial distribution models the likelihood of observing a particular combination of class assignments across multiple samples, making it fundamental to multi-class classification problems.

Note
Definition

The Multinomial probability mass function (PMF) gives the probability of observing a specific count for each category in a sequence of $n$ independent trials, where each trial can result in one of $k$ categories with probabilities $p_1, p_2, \ldots, p_k$ (where each $p_i \geq 0$ and $\sum_i p_i = 1$):

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

where:

  • $n$: total number of trials;
  • $k$: number of possible outcome categories;
  • $x_i$: number of times category $i$ occurs ($\sum_i x_i = n$);
  • $p_i$: probability of category $i$ in a single trial ($\sum_i p_i = 1$).
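As a quick check of the formula, the PMF for a small illustrative example (the values $n = 10$, $x = (5, 3, 2)$ are chosen here just for demonstration) can be computed directly with Python's standard library:

```python
import math

# Illustrative example: n = 10 trials over k = 3 categories
n = 10
p = [0.5, 0.3, 0.2]   # category probabilities, summing to 1
x = [5, 3, 2]         # observed counts, summing to n

# Multinomial coefficient: n! / (x_1! * x_2! * x_3!)
coef = math.factorial(n) // math.prod(math.factorial(xi) for xi in x)

# PMF: coefficient times the product of p_i ** x_i
pmf = coef * math.prod(pi ** xi for pi, xi in zip(p, x))
print(coef)  # 2520
print(pmf)   # ≈ 0.08505
```

If SciPy is available, `scipy.stats.multinomial.pmf(x, n, p)` returns the same value.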
import numpy as np
import matplotlib.pyplot as plt

# Suppose you have a 3-class classification problem (e.g., "cat", "dog", "rabbit")
categories = ["cat", "dog", "rabbit"]
probabilities = [0.5, 0.3, 0.2]  # Probability for each class
n_samples = 100  # Number of trials (e.g., 100 data points)

# Simulate one experiment: how many times does each class appear in 100 samples?
counts = np.random.multinomial(n_samples, probabilities)
print("Sampled counts:", dict(zip(categories, counts)))
# Repeat simulation to see variability
n_experiments = 1000
all_counts = np.random.multinomial(n_samples, probabilities, size=n_experiments)

# Plot histogram for each class
fig, ax = plt.subplots()
for idx, label in enumerate(categories):
    ax.hist(all_counts[:, idx], bins=30, alpha=0.6, label=label)
ax.set_xlabel("Count in 100 samples")
ax.set_ylabel("Frequency")
ax.legend()
plt.title("Distribution of class counts over 1000 experiments")
plt.show()

When you run this simulation, you see that the number of times each class appears in 100 samples varies from experiment to experiment, but the average count for each class across many experiments approaches $n p_i$, the trial count times that class's probability. This mirrors what happens in a multi-class classifier, such as a softmax classifier, where the predicted probabilities for each class sum to one and represent the expected frequencies of each category. The Multinomial distribution provides the mathematical foundation for modeling these outcomes and for evaluating how well your classifier's predictions align with observed data.
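To make the "averages match" claim concrete: the expected count for class $i$ is $n p_i$ and its variance is $n p_i (1 - p_i)$. A minimal sketch checking both, reusing the probabilities above and assuming a seeded NumPy generator for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
probabilities = np.array([0.5, 0.3, 0.2])
n_samples = 100

# Simulate many experiments and compare empirical moments to theory
all_counts = rng.multinomial(n_samples, probabilities, size=10_000)

empirical_mean = all_counts.mean(axis=0)
expected_mean = n_samples * probabilities                        # n * p_i
expected_var = n_samples * probabilities * (1 - probabilities)   # n * p_i * (1 - p_i)

print(empirical_mean)          # close to [50, 30, 20]
print(all_counts.var(axis=0))  # close to [25, 21, 16]
```

With 10,000 repetitions the empirical means land within a fraction of a count of the theoretical values, which is exactly the behavior the histograms above visualize.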



Section 3. Chapter 3
