Introduction to Reinforcement Learning
What are Monte Carlo Methods?

Monte Carlo (MC) methods are a class of computational algorithms that rely on random sampling to estimate numerical results. These methods are widely used in fields such as physics, finance, engineering, and machine learning.

Monte Carlo methods are used when deterministic solutions are difficult or impossible to obtain. They replace exact computations with approximations that improve with the number of random samples.

How They Work

Monte Carlo methods vary from one task to another, but all of them tend to follow a common pattern:

  1. Define a domain of possible inputs;
  2. Generate random inputs from a probability distribution;
  3. Evaluate a function on these inputs;
  4. Aggregate the results to produce an estimate.
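The four steps above can be sketched in a few lines of code. Here is a minimal illustration on a toy problem of my choosing (not from the course): estimating the expected value of f(x) = x² for x uniform on [0, 1], whose true value is 1/3.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Domain: x in [0, 1]
# 2. Generation: draw random inputs from the uniform distribution
samples = rng.uniform(0, 1, size=100_000)

# 3. Evaluation: apply the function to each sampled input
values = samples ** 2

# 4. Aggregation: average the results to produce the estimate
estimate = values.mean()
print(estimate)  # close to 1/3
```

With more samples, the estimate concentrates around the true expectation, which is exactly the "approximation improves with the number of samples" property described above.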

Examples

While the pattern described above may sound complex, these examples should help clarify the idea behind it.

Integral Computation

Computing integrals analytically is often non-trivial and may require combining several techniques to arrive at the exact result.

Let's apply the Monte Carlo method to estimate this integral:

\int_0^1 \int_0^1 \frac{1}{1 + (x + y)^2} \, dx \, dy
  1. Input domain: this double integral has two variables, x \in [0, 1] and y \in [0, 1];
  2. Generation: both variables are independent of each other and uniformly distributed;
  3. Evaluation: the function under the integral gives the value at each sampled point;
  4. Aggregation: the value of this integral is the volume under the surface. This volume is the product of the base area and the average height. The base area is 1 (the unit square), and the average height is the average of the values computed in the previous step.

Now, look at the implementation of this process in code:

```python
import numpy as np

result = 0
# Many samples are required for estimates to be precise
for i in range(100000):
    # Generation of random variables
    x, y = np.random.uniform(), np.random.uniform()
    # Computation of point value
    value = 1 / (1 + (x + y) ** 2)
    # Mean aggregation
    result += (value - result) / (i + 1)

true_result = 2*np.arctan(2) - np.pi/2 - (1/2)*np.log(5) + np.log(2)

print(f"Approximated result: {result}")
print(f"True result: {true_result}")
```

Approximation of \pi

Approximating \pi is one of the most iconic uses of the Monte Carlo method. It illustrates how random sampling can solve a geometric problem without any complex calculus.

Consider a unit square with a quarter circle inscribed in it:

  • The square spans [0, 1] \times [0, 1];
  • The quarter circle has radius 1 and is centered at the origin.

The area of the quarter circle is \frac{\pi r^2}{4}, which for r = 1 equals \frac{\pi}{4}, while the area of the square is 1. Now let's sample random points inside the square. With a big enough sample size:

\frac{\text{Points inside the quarter circle}}{\text{Total points}} \approx \frac{\pi}{4}

So the value of \pi can be computed as:

\pi \approx 4 \cdot \frac{\text{Points inside}}{\text{Total points}}

Now, look at the code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Lists for coordinates
inside = []
outside = []

# Many samples are required for estimates to be precise
for _ in range(100000):
    # Generation of random variables
    x, y = np.random.uniform(), np.random.uniform()
    # Splitting points inside and outside of the circle
    if x**2 + y**2 <= 1:
        inside.append((x, y))
    else:
        outside.append((x, y))

# Plotting points
plt.figure(figsize=(6, 6))
plt.scatter(*zip(*inside), color='blue', s=1, label='Inside')
plt.scatter(*zip(*outside), color='red', s=1, label='Outside')
plt.legend()
plt.xlabel("x")
plt.ylabel("y")
plt.show()

estimate = 4 * len(inside) / (len(inside) + len(outside))
print(f"Estimated value of pi: {estimate}")
print(f"True value of pi: {np.pi}")
```

Multi-Armed Bandits

In the multi-armed bandit setting, a key objective is to estimate the action value for each arm — that is, the expected reward of choosing a particular action. One common strategy is to estimate these values by averaging the observed rewards obtained from pulling each arm over time. This technique is, in fact, a Monte Carlo method.
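As a sketch of that idea, the snippet below estimates action values with incremental sample averages. The arm count, reward distributions, and uniformly random arm selection are illustrative assumptions, not prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 3-armed bandit: true mean rewards (unknown to the agent)
true_means = np.array([0.2, 0.5, 0.8])

Q = np.zeros(3)  # Monte Carlo action-value estimates (sample averages)
N = np.zeros(3)  # number of times each arm has been pulled

for _ in range(30_000):
    a = rng.integers(3)                      # pick an arm uniformly at random
    reward = rng.normal(true_means[a], 1.0)  # noisy reward from that arm
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]           # incremental mean update

print(Q)  # each Q[a] approaches true_means[a]
```

The incremental update is the same mean-aggregation trick used in the integral example above: it keeps a running average without storing all past rewards.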

Monte Carlo Methods for MDPs

Unlike dynamic programming methods, which rely on a complete and accurate model of the environment’s dynamics, Monte Carlo methods learn solely from experience — that is, from actual or simulated sequences of states, actions, and rewards.

This makes Monte Carlo approaches especially powerful: they don’t require any prior knowledge about how the environment works. Instead, they extract value estimates directly from what happens during interaction. In many real-world scenarios, where modeling the environment is impractical or impossible, this ability to learn from raw experience is a major advantage.

When direct interaction with the environment is costly, risky, or slow, Monte Carlo methods can also learn from simulated experience, provided a reliable simulation exists. This allows for exploration and learning in a controlled, repeatable setting — though it does assume access to a model capable of generating plausible transitions.
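A minimal sketch of learning from simulated experience is first-visit Monte Carlo value estimation; the two-state episodic environment, reward distributions, and discount factor below are invented purely for illustration.

```python
import random
from collections import defaultdict

random.seed(0)
GAMMA = 0.9  # discount factor (assumed for this toy example)

def generate_episode():
    """Hypothetical toy environment: the agent passes through state 0,
    then state 1, then the episode terminates. Rewards are noisy."""
    return [
        (0, random.gauss(1.0, 0.1)),  # reward received after state 0
        (1, random.gauss(2.0, 0.1)),  # reward received after state 1
    ]

# First-visit Monte Carlo: average the return observed after the
# first visit to each state, over many simulated episodes
returns = defaultdict(list)
for _ in range(5000):
    episode = generate_episode()
    G = 0.0
    # Walk the episode backwards, accumulating discounted returns
    for t in reversed(range(len(episode))):
        state, reward = episode[t]
        G = reward + GAMMA * G
        # record G only for the earliest occurrence of this state
        if state not in (s for s, _ in episode[:t]):
            returns[state].append(G)

V = {s: sum(g) / len(g) for s, g in returns.items()}
print(V)  # V[1] near 2.0, V[0] near 1.0 + 0.9 * 2.0 = 2.8
```

Note that nothing here uses the environment's transition probabilities: the value estimates come entirely from sampled episodes, which is exactly the model-free property described above.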

Question: What is a primary advantage of using Monte Carlo methods over dynamic programming methods in solving MDPs?

Section 4, Chapter 1