Introduction to Reinforcement Learning
What are Monte Carlo Methods?
Monte Carlo (MC) methods are a class of computational algorithms that rely on random sampling to estimate numerical results. These methods are widely used in fields such as physics, finance, engineering, and machine learning.
Monte Carlo methods are used when deterministic solutions are difficult or impossible to obtain. They replace exact computations with approximations that improve with the number of random samples.
How Do They Work?
Monte Carlo methods vary from one task to another, but most of them follow the same general pattern (a minimal sketch follows this list):
- Define a domain of possible inputs;
- Generate random inputs from a probability distribution;
- Evaluate a function on these inputs;
- Aggregate the results to produce an estimate.
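As a minimal illustration of this pattern (the function and sample count here are arbitrary choices, not tied to any specific task), consider estimating the expected value of a function over uniformly random inputs:

```python
import numpy as np

# 1. Input domain: the interval [0, 1]
# 2. Generation: draw random inputs from a uniform distribution
samples = np.random.uniform(0, 1, 10000)

# 3. Evaluation: apply a function to each input
#    (exp(-x^2) is an arbitrary example function)
values = np.exp(-samples ** 2)

# 4. Aggregation: average the results to estimate E[f(X)]
estimate = values.mean()
print(f"Monte Carlo estimate: {estimate}")
```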
Examples
While the pattern described above may sound complex, these examples should help clarify the idea behind it.
Integral Computation
Computing integrals analytically is a non-trivial task that often requires combining several techniques to arrive at the correct result.
Let's try to apply the Monte Carlo method to solve this integral:

$$\int_0^1 \int_0^1 \frac{1}{1 + (x + y)^2} \, dx \, dy$$
- Input domain: this double integral has two variables, $x$ and $y$, each ranging over $[0, 1]$;
- Generation: both variables are independent of each other and uniformly distributed on $[0, 1]$;
- Evaluation: to get a point value, the integrand $\frac{1}{1 + (x + y)^2}$ is evaluated at each sampled point;
- Aggregation: the value of this integral equals the volume under the surface. This volume can be computed as the product of the base area and the average height. The base area is 1 (the unit square), and the average height is the average of the values obtained in the previous step.
Now, look at the implementation of this process in code:
```python
import numpy as np

result = 0

# Many samples are required for estimates to be precise
for i in range(100000):
    # Generation of random variables
    x, y = np.random.uniform(), np.random.uniform()
    # Computation of point value
    value = 1 / (1 + (x + y) ** 2)
    # Mean aggregation
    result += (value - result) / (i + 1)

true_result = 2*np.arctan(2) - np.pi/2 - (1/2)*np.log(5) + np.log(2)

print(f"Approximated result: {result}")
print(f"True result: {true_result}")
```
Approximation of $\pi$
Approximating $\pi$ is one of the most iconic uses of the Monte Carlo method. It illustrates how random sampling can solve a geometric problem without any complex calculus.
Consider a unit square with a quarter circle inscribed in it:
- The square spans $[0, 1] \times [0, 1]$;
- The quarter circle has radius 1 and is centered at the origin.
The area of the quarter circle is $\frac{\pi r^2}{4}$, or $\frac{\pi}{4}$ since $r = 1$, and the area of the square is 1. Now let's sample random points inside the square. With a big enough sample size, the fraction of points that land inside the quarter circle approaches the ratio of the two areas:

$$\frac{\text{points inside}}{\text{total points}} \approx \frac{\pi}{4}$$

So the value of $\pi$ can be computed as

$$\pi \approx 4 \cdot \frac{\text{points inside}}{\text{total points}}$$
Now, look at the code:
```python
import numpy as np
import matplotlib.pyplot as plt

# Lists for coordinates
inside = []
outside = []

# Many samples are required for estimates to be precise
for _ in range(100000):
    # Generation of random variables
    x, y = np.random.uniform(), np.random.uniform()
    # Splitting points inside and outside of the circle
    if x**2 + y**2 <= 1:
        inside.append((x, y))
    else:
        outside.append((x, y))

# Plotting points
plt.figure(figsize=(6, 6))
plt.scatter(*zip(*inside), color='blue', s=1, label='Inside')
plt.scatter(*zip(*outside), color='red', s=1, label='Outside')
plt.legend()
plt.xlabel("x")
plt.ylabel("y")
plt.show()

estimate = 4 * len(inside) / (len(inside) + len(outside))
print(f"Estimated value of pi: {estimate}")
print(f"True value of pi: {np.pi}")
```
Multi-Armed Bandits
In the multi-armed bandit setting, a key objective is to estimate the action value for each arm — that is, the expected reward of choosing a particular action. One common strategy is to estimate these values by averaging the observed rewards obtained from pulling each arm over time. This technique is, in fact, a Monte Carlo method.
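A minimal sketch of this idea for a single simulated arm is shown below; the reward distribution is an assumption chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical arm: rewards are normally distributed with mean 1.5
# (unknown to the agent); this distribution is assumed for the demo
def pull_arm():
    return rng.normal(loc=1.5, scale=1.0)

q_estimate = 0.0  # running Monte Carlo estimate of the action value

for n in range(1, 10001):
    reward = pull_arm()
    # Incremental average: the same aggregation used in the
    # integral example above
    q_estimate += (reward - q_estimate) / n

print(f"Estimated action value: {q_estimate:.3f}")
```

The incremental update avoids storing all past rewards while producing exactly the same result as averaging them directly.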
Monte Carlo Methods for MDPs
Unlike dynamic programming methods, which rely on a complete and accurate model of the environment’s dynamics, Monte Carlo methods learn solely from experience — that is, from actual or simulated sequences of states, actions, and rewards.
This makes Monte Carlo approaches especially powerful: they don’t require any prior knowledge about how the environment works. Instead, they extract value estimates directly from what happens during interaction. In many real-world scenarios, where modeling the environment is impractical or impossible, this ability to learn from raw experience is a major advantage.
When direct interaction with the environment is costly, risky, or slow, Monte Carlo methods can also learn from simulated experience, provided a reliable simulation exists. This allows for exploration and learning in a controlled, repeatable setting — though it does assume access to a model capable of generating plausible transitions.
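As a hedged sketch of how this looks in practice, the snippet below implements first-visit Monte Carlo prediction on a toy episode generator; the environment, discount factor, and episode count are all assumptions made purely for demonstration:

```python
import random
from collections import defaultdict

GAMMA = 0.9          # discount factor (assumed value)
NUM_EPISODES = 5000  # number of sampled episodes (assumed value)

def generate_episode():
    """Toy stand-in for actual or simulated experience:
    returns a list of (state, reward) pairs ending at a terminal state."""
    episode, state = [], 0
    while state < 3:
        reward = random.choice([0, 1])
        episode.append((state, reward))
        state += 1 if random.random() < 0.5 else 2
    return episode

returns = defaultdict(list)

for _ in range(NUM_EPISODES):
    episode = generate_episode()
    g = 0.0
    # Walk the episode backwards, accumulating the discounted return
    for t in reversed(range(len(episode))):
        state, reward = episode[t]
        g = reward + GAMMA * g
        # First-visit check: record the return only for the earliest
        # occurrence of this state in the episode
        if state not in (s for s, _ in episode[:t]):
            returns[state].append(g)

# Value estimate: the average of observed returns per state
values = {state: sum(gs) / len(gs) for state, gs in returns.items()}
print(values)
```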