Learn The Bernoulli Distribution | Bernoulli and Multinomial Distributions
Probability Distributions for Machine Learning

The Bernoulli Distribution

The Bernoulli distribution is one of the simplest yet most fundamental probability distributions in machine learning. It models binary eventsβ€”situations where there are only two possible outcomes, such as success/failure, yes/no, or 1/0. In the context of machine learning, the Bernoulli distribution is crucial for binary classification tasks, where you need to predict whether an instance belongs to one of two classes. For example, determining whether an email is spam or not spam, or whether a tumor is malignant or benign, can be naturally represented by a Bernoulli random variable.

A Bernoulli random variable takes the value 1 with probability p (the probability of "success") and 0 with probability 1 - p (the probability of "failure"). The probability mass function is given by:

P(X = x) = p^x (1-p)^{1-x}, \text{ where } x \in \{0,1\}

This means that if you know the probability of success, you can describe the entire distribution. The Bernoulli distribution forms the building block for more complex models, such as the binomial and multinomial distributions, and is at the heart of models like logistic regression.
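As a quick sanity check, the probability mass function can be written directly in Python. This is a minimal sketch (the function name `bernoulli_pmf` and the example value p = 0.3 are ours, chosen for illustration):

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
print(bernoulli_pmf(1, p))  # probability of success
print(bernoulli_pmf(0, p))  # probability of failure
```

Note that the two probabilities always sum to 1, since the variable has only two possible outcomes.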

```python
import numpy as np
import matplotlib.pyplot as plt

# Define different probabilities for the Bernoulli distribution
probabilities = [0.2, 0.5, 0.8]
num_samples = 1000

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

for ax, p in zip(axes, probabilities):
    samples = np.random.binomial(n=1, p=p, size=num_samples)
    counts = np.bincount(samples, minlength=2)
    ax.bar([0, 1], counts, tick_label=['0', '1'], color=['#1f77b4', '#ff7f0e'])
    ax.set_title(f'p = {p}')
    ax.set_xlabel('Outcome')
    ax.set_ylabel('Count')
    ax.set_ylim(0, num_samples)

plt.tight_layout()
plt.show()
```

When you simulate samples from a Bernoulli distribution with different probability parameters, you observe that the proportion of 1s and 0s shifts according to the value of p. For example, with p = 0.2, you expect around 20% of the outcomes to be 1 (success) and 80% to be 0 (failure). With p = 0.5, the outcomes are roughly balanced, and with p = 0.8, most outcomes are 1. This directly illustrates how the Bernoulli distribution models the likelihood of a binary event, and how adjusting the probability parameter allows you to fit the distribution to the characteristics of your data. In machine learning, this property enables you to model the probability that a given input belongs to the positive class, which is essential for tasks like logistic regression.
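Because the sample proportion of 1s converges to p as the number of samples grows, you can also go in the other direction and estimate p from observed data: the sample mean is the maximum likelihood estimate of the Bernoulli parameter. A minimal sketch (the seed and sample size here are illustrative choices, not part of the lesson code):

```python
import numpy as np

rng = np.random.default_rng(42)
p_true = 0.8
# Draw 10,000 Bernoulli trials (binomial with n=1 is Bernoulli)
samples = rng.binomial(n=1, p=p_true, size=10_000)

# The MLE of p for a Bernoulli sample is simply the sample mean
p_hat = samples.mean()
print(f"true p = {p_true}, estimated p = {p_hat:.3f}")
```

With 10,000 samples the estimate lands very close to the true value; with fewer samples the estimate is noisier, which is exactly the sampling variability you see in the bar charts above.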


SectionΒ 3. ChapterΒ 1
