Learn Random Variables and Probability Distributions | Fundamentals of Probability in ML
Probability Distributions for Machine Learning

Random Variables and Probability Distributions

When you build machine learning models, you are always working under uncertainty: you do not know the exact outcome in advance, whether you are predicting tomorrow's weather, estimating the price of a house, or classifying an image. To capture and reason about this uncertainty, you use random variables and probability distributions. A random variable assigns a number to each outcome of a chance process, so its value varies from one trial to the next. This lets you describe outcomes numerically, whether they come from flipping a coin, rolling a die, or measuring the temperature.

Probability distributions are essential because they describe how likely the possible values of a random variable are: a probability for each value in the discrete case, and a probability for each range of values in the continuous case. In machine learning, you rely on these distributions to model the noise in your data, to make predictions, and to quantify the confidence in your results. Whether you are dealing with discrete outcomes (like class labels) or continuous outcomes (like regression targets), probability distributions provide the mathematical framework for making sense of randomness and uncertainty in your models.

Discrete Random Variables

A discrete random variable takes on a countable set of possible values. For instance, the number of spam emails you receive in a day or the class label assigned by a classifier (such as cat, dog, or bird) are both discrete. In machine learning, classification tasks typically use discrete random variables to represent categories.
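One way to make this concrete is a probability mass function (PMF), which assigns a probability to each possible value. The sketch below assumes a fair six-sided die purely for illustration. The names `pmf`, `prob_even`, and `expected_value` are hypothetical and do not come from the lesson.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6 (assumed for illustration)
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF sums to exactly 1 over all possible values
total_probability = sum(pmf.values())

# The probability of an event is the sum of the PMF over the values in that event
prob_even = sum(p for face, p in pmf.items() if face % 2 == 0)

# Expected value: the probability-weighted average of the values
expected_value = sum(face * p for face, p in pmf.items())

print(total_probability)  # 1
print(prob_even)          # 1/2
print(expected_value)     # 7/2
```

Exact fractions are used here instead of floats so that the "sums to 1" check holds without rounding error. A classifier's predicted class probabilities play the same role as `pmf`, with class labels in place of die faces.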

Continuous Random Variables

A continuous random variable can take on any value within a range or interval. For example, the height of a person or the predicted price of a house can be modeled as continuous variables. In regression tasks, your model predicts continuous outcomes, so you use continuous random variables to represent possible results.
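For a continuous random variable, probabilities attach to intervals and not to single values. The sketch below illustrates this with Python's standard `statistics.NormalDist`. The height model (mean 170 cm, standard deviation 10 cm) is a made-up assumption for the example, not a figure from the lesson.

```python
from statistics import NormalDist

# Hypothetical model: height in cm as a normal distribution (mean 170, std 10)
height = NormalDist(mu=170, sigma=10)

# Probability of an interval is the difference of the CDF at its endpoints
prob_160_to_180 = height.cdf(180) - height.cdf(160)

# A single exact value is an interval of zero width, so its probability is zero
prob_exactly_170 = height.cdf(170) - height.cdf(170)

# A density (PDF) value is not a probability; it can exceed 1 for a narrow distribution
narrow = NormalDist(mu=0, sigma=0.1)
density_at_peak = narrow.pdf(0)

print(round(prob_160_to_180, 4))  # about 0.6827 (within one standard deviation)
print(prob_exactly_170)           # 0.0
print(density_at_peak > 1)        # True
```

This is why continuous distributions are described by a density that you integrate over a range, whereas discrete distributions list a probability per value.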

```python
import numpy as np
import matplotlib.pyplot as plt

# Discrete random variable: Simulate 1000 coin flips (0=heads, 1=tails)
discrete_samples = np.random.binomial(n=1, p=0.5, size=1000)

# Continuous random variable: Simulate 1000 samples from a normal distribution
continuous_samples = np.random.normal(loc=0, scale=1, size=1000)

fig, axs = plt.subplots(1, 2, figsize=(12, 4))

# Plot discrete distribution (histogram)
axs[0].hist(discrete_samples, bins=[-0.5, 0.5, 1.5], edgecolor='black', rwidth=0.8)
axs[0].set_xticks([0, 1])
axs[0].set_xlabel('Value')
axs[0].set_ylabel('Frequency')
axs[0].set_title('Discrete: Coin Flip (Binomial)')

# Plot continuous distribution (histogram)
axs[1].hist(continuous_samples, bins=30, color='orange', edgecolor='black', alpha=0.7)
axs[1].set_xlabel('Value')
axs[1].set_ylabel('Frequency')
axs[1].set_title('Continuous: Normal Distribution')

plt.tight_layout()
plt.show()
```

The plots above show how randomness appears in both discrete and continuous settings. The left plot, representing the coin flip, has only two possible outcomes. Each flip is unpredictable, but over many trials a pattern emerges: the two bars have roughly equal height. The right plot, representing samples from a normal distribution, shows a roughly bell-shaped histogram in which most values cluster around the center, yet any individual outcome is still uncertain. In machine learning, you use these distributions to model the possible predictions your model can make, and to understand the variability and confidence in those predictions.
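The idea that a pattern appears only over many trials can be checked numerically. The sketch below uses only the standard library. The helper `heads_frequency` and the fixed seed are illustrative assumptions, not part of the lesson's code.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

def heads_frequency(n_flips):
    """Simulate n_flips fair coin flips and return the fraction that came up heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed frequency tends toward 0.5 as the number of flips grows
for n in (10, 100, 1000, 100000):
    print(n, heads_frequency(n))

large_sample_frequency = heads_frequency(100000)
```

With only 10 flips the fraction of heads can land far from 0.5, while with 100,000 flips it typically sits very close to it. Each individual flip remains just as unpredictable in both cases.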

Which statement best describes the difference between a random variable and its probability distribution?

Section 1. Chapter 2
