Learn The Bernoulli Distribution | Bernoulli and Multinomial Distributions
Probability Distributions for Machine Learning

The Bernoulli Distribution

The Bernoulli distribution is one of the simplest yet most fundamental probability distributions in machine learning. It models binary eventsβ€”situations where there are only two possible outcomes, such as success/failure, yes/no, or 1/0. In the context of machine learning, the Bernoulli distribution is crucial for binary classification tasks, where you need to predict whether an instance belongs to one of two classes. For example, determining whether an email is spam or not spam, or whether a tumor is malignant or benign, can be naturally represented by a Bernoulli random variable.

A Bernoulli random variable takes the value 1 with probability p (the probability of "success") and 0 with probability 1 - p (the probability of "failure"). The probability mass function is given by:

P(X = x) = p^x (1-p)^{1-x}, \text{ where } x \in \{0,1\}

This means that if you know the probability of success, you can describe the entire distribution. The Bernoulli distribution forms the building block for more complex models, such as the binomial and multinomial distributions, and is at the heart of models like logistic regression.
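As a quick sanity check, the probability mass function can be written directly in Python. This is a minimal sketch (the function name `bernoulli_pmf` and the example value p = 0.3 are ours, chosen for illustration):

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
print(bernoulli_pmf(1, p))  # probability of success
print(bernoulli_pmf(0, p))  # probability of failure
```

Note that the two probabilities always sum to 1, since the variable has only two possible outcomes.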

```python
import numpy as np
import matplotlib.pyplot as plt

# Define different probabilities for the Bernoulli distribution
probabilities = [0.2, 0.5, 0.8]
num_samples = 1000

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for ax, p in zip(axes, probabilities):
    samples = np.random.binomial(n=1, p=p, size=num_samples)
    counts = np.bincount(samples, minlength=2)
    ax.bar([0, 1], counts, tick_label=['0', '1'], color=['#1f77b4', '#ff7f0e'])
    ax.set_title(f'p = {p}')
    ax.set_xlabel('Outcome')
    ax.set_ylabel('Count')
    ax.set_ylim(0, num_samples)

plt.tight_layout()
plt.show()
```

When you simulate samples from a Bernoulli distribution with different probability parameters, you observe that the proportion of 1s and 0s shifts according to the value of p. For example, with p = 0.2, you expect around 20% of the outcomes to be 1 (success) and 80% to be 0 (failure). With p = 0.5, the outcomes are roughly balanced, and with p = 0.8, most outcomes are 1. This directly illustrates how the Bernoulli distribution models the likelihood of a binary event, and how adjusting the probability parameter allows you to fit the distribution to the characteristics of your data. In machine learning, this property enables you to model the probability that a given input belongs to the positive class, which is essential for tasks like logistic regression.
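Because the sample proportion of 1s converges to p as the number of samples grows, you can also go in the other direction and estimate p from observed data: the sample mean is the maximum likelihood estimate of the Bernoulli parameter. A minimal sketch (the seed and sample size here are illustrative choices, not part of the lesson code):

```python
import numpy as np

rng = np.random.default_rng(42)
p_true = 0.8
# Draw 10,000 Bernoulli trials (binomial with n=1 is Bernoulli)
samples = rng.binomial(n=1, p=p_true, size=10_000)

# The MLE of p for a Bernoulli sample is simply the sample mean
p_hat = samples.mean()
print(f"true p = {p_true}, estimated p = {p_hat:.3f}")
```

With 10,000 samples the estimate lands very close to the true value; with fewer samples the estimate is noisier, which is exactly the sampling variability you see in the bar charts above.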


SectionΒ 3. ChapterΒ 1
