Learn Random Variables and Probability Distributions | Fundamentals of Probability in ML


When you build machine learning models, you are always working under uncertainty. You do not know the exact outcomes in advance, whether you are predicting tomorrow's weather, estimating the price of a house, or classifying an image. To capture and reason about this uncertainty, you use random variables and probability distributions. A random variable is a mathematical object that assigns a numerical value to each outcome of a chance process. It lets you describe outcomes numerically, whether they come from flipping a coin, rolling a die, or measuring the temperature.

Probability distributions are essential because they describe how likely each possible value of a random variable is. In machine learning, you rely on these distributions to model the noise in your data, to make predictions, and to quantify the confidence in your results. Whether you are dealing with discrete outcomes (like class labels) or continuous outcomes (like regression targets), probability distributions provide the mathematical framework for making sense of randomness and uncertainty in your models.

Discrete Random Variables

A discrete random variable takes on a countable set of possible values. For instance, the number of spam emails you receive in a day or the class label assigned by a classifier (such as cat, dog, or bird) are both discrete. In machine learning, classification tasks typically use discrete random variables to represent categories.
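As a minimal sketch of a discrete random variable, the snippet below draws class labels from a categorical distribution and compares the observed frequencies with the probabilities used to generate them. The label names follow the example above; the probabilities, the sample size, and the fixed seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical class labels and probabilities, for illustration only
labels = ["cat", "dog", "bird"]
probs = [0.5, 0.3, 0.2]

# Draw 10,000 labels from the categorical distribution
samples = rng.choice(labels, size=10_000, p=probs)

# Empirical probability mass function: relative frequency of each label
values, counts = np.unique(samples, return_counts=True)
empirical_pmf = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))

for label, p in zip(labels, probs):
    print(f"{label}: true={p:.2f}, empirical={empirical_pmf[label]:.3f}")
```

With enough samples, the empirical frequencies should land close to the probabilities that generated them, and they always sum to 1.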

Continuous Random Variables

A continuous random variable can take on any value within a range or interval. For example, the height of a person or the predicted price of a house can be modeled as continuous variables. In regression tasks, your model predicts continuous outcomes, so you use continuous random variables to represent possible results.
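For a continuous random variable, probability is attached to intervals rather than to single values. The sketch below, assuming a standard normal distribution, computes the probability of falling between -1 and 1 in two ways: exactly, from the cumulative distribution function, and approximately, from simulated samples. The helper `standard_normal_cdf`, the interval, and the seed are illustrative choices, not part of the lesson's code.

```python
import math
import numpy as np

def standard_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Exact probability that a standard normal value lies in [-1, 1]
exact = standard_normal_cdf(1.0) - standard_normal_cdf(-1.0)

# Monte Carlo estimate of the same probability from simulated samples
rng = np.random.default_rng(seed=0)  # fixed seed (an assumption) for repeatability
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)
estimate = float(np.mean((samples >= -1.0) & (samples <= 1.0)))

print(f"exact P(-1 <= X <= 1)    = {exact:.4f}")
print(f"estimated from samples   = {estimate:.4f}")
```

The two numbers should agree closely, which is the sense in which a histogram of samples approximates the underlying distribution.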


import numpy as np
import matplotlib.pyplot as plt

# Discrete random variable: simulate 1000 fair coin flips (0=heads, 1=tails).
# A binomial draw with n=1 is a single Bernoulli trial.
discrete_samples = np.random.binomial(n=1, p=0.5, size=1000)

# Continuous random variable: Simulate 1000 samples from a normal distribution
continuous_samples = np.random.normal(loc=0, scale=1, size=1000)

fig, axs = plt.subplots(1, 2, figsize=(12, 4))

# Plot discrete distribution (histogram)
axs[0].hist(discrete_samples, bins=[-0.5, 0.5, 1.5], edgecolor='black', rwidth=0.8)
axs[0].set_xticks([0, 1])
axs[0].set_xlabel('Value')
axs[0].set_ylabel('Frequency')
axs[0].set_title('Discrete: Coin Flip (Binomial)')

# Plot continuous distribution (histogram)
axs[1].hist(continuous_samples, bins=30, color='orange', edgecolor='black', alpha=0.7)
axs[1].set_xlabel('Value')
axs[1].set_ylabel('Frequency')
axs[1].set_title('Continuous: Normal Distribution')

plt.tight_layout()
plt.show()

The plots above show how randomness appears in both discrete and continuous settings. The left plot, representing the coin flip, has only two possible outcomes. Each flip is unpredictable, but over many trials a pattern emerges: heads and tails each occur about half the time. The right plot, representing samples from a normal distribution, shows a roughly bell-shaped histogram in which most values cluster around the center, yet any individual outcome is still uncertain. In machine learning, you use these distributions to model the possible predictions your model can make, and to understand the variability and confidence in those predictions.
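The pattern that appears over many coin flips can be made concrete by tracking the running proportion of 1s as the number of flips grows. This is a small sketch that assumes a fair coin (p=0.5) and a fixed seed; the checkpoints printed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed (an assumption) for repeatability

# Simulate 10,000 fair coin flips (0 or 1)
flips = rng.binomial(n=1, p=0.5, size=10_000)

# Running proportion of 1s after each flip
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} flips: proportion of 1s = {running_mean[n - 1]:.3f}")
```

Early proportions can sit far from 0.5, while later ones tend to settle near it, even though no single flip becomes any more predictable.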

Section 1. Chapter 2