Learn Exploration vs Exploitation | RL Core Theory
Introduction to Reinforcement Learning

Exploration vs Exploitation

The exploration vs exploitation problem is a fundamental dilemma in reinforcement learning. It arises when an agent must choose between two competing strategies:

  1. Exploration: trying new options to gather more information, even if the immediate reward is uncertain;
  2. Exploitation: choosing the best-known option based on past experiences to maximize immediate rewards.

The Trade-Off

This problem occurs in scenarios where decisions influence future outcomes. If an agent only exploits what it knows, it may miss out on better opportunities. On the other hand, excessive exploration can lead to unnecessary risks or wasted resources without guaranteeing better results.
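The first half of this trade-off can be illustrated with a minimal sketch. It assumes a made-up setting that is not part of the lesson: two options with fixed payoffs stored in a hypothetical `REWARDS` list, value estimates that start at zero, and ties broken in favor of the first option. Under those assumptions, an agent that only exploits settles on the first option it tries and never finds the better one, while a purely random agent samples both but does not use what it learns.

```python
import random

# Hypothetical fixed payoffs for two options; option 1 is the better one.
REWARDS = [0.5, 1.0]


def run_greedy(steps):
    """Pure exploitation: always pick the option with the highest estimate."""
    estimates = [0.0] * len(REWARDS)
    counts = [0] * len(REWARDS)
    total = 0.0
    for _ in range(steps):
        # max() returns the first maximal index, so ties go to option 0.
        action = max(range(len(REWARDS)), key=lambda a: estimates[a])
        reward = REWARDS[action]
        counts[action] += 1
        # Incremental sample-average update of the chosen option's estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return total, counts


def run_random(steps, seed=0):
    """Pure exploration: pick an option uniformly at random every step."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        total += REWARDS[rng.randrange(len(REWARDS))]
    return total
```

Calling `run_greedy(100)` in this toy setting shows every pull going to option 0, because its estimate becomes positive after the first pull and option 1 is never tried.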

Real-World Examples

  • Online recommendations: a streaming service can either recommend a popular movie (exploitation) or suggest a less-known film to learn about a user's preferences (exploration);
  • Product development: a company may focus on improving a popular product that has been consistently successful in the market (exploitation) or invest in developing entirely new products or features (exploration);
  • Investment strategies: a stock trader must decide whether to invest in well-performing stocks (exploitation) or experiment with new investments that might yield higher returns (exploration).

The Challenge

The difficulty lies in balancing these two strategies effectively. Too much exploitation can lock the agent into a suboptimal choice and reduce long-term gains, while excessive exploration spends time and resources on options that are already known to be poor. The goal is a balance that maximizes long-term reward while keeping the cost of exploration low.
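One common way to strike this balance is the epsilon-greedy rule: with a small probability `epsilon` the agent explores a random option, and otherwise it exploits its current best estimate. The sketch below is only an illustration under stated assumptions. The function name `epsilon_greedy_bandit`, the Bernoulli (0 or 1) reward model, and the success probabilities passed in `probs` are all hypothetical and do not come from the lesson.

```python
import random


def epsilon_greedy_bandit(probs, epsilon, steps, seed=0):
    """Simulate epsilon-greedy action selection on a toy Bernoulli bandit.

    probs   -- assumed success probability of each option (hypothetical data)
    epsilon -- probability of exploring on any given step
    """
    rng = random.Random(seed)
    n = len(probs)
    estimates = [0.0] * n  # estimated value of each option
    counts = [0] * n       # how often each option was chosen
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: random option
        else:
            # exploit: best-known option (ties go to the lowest index)
            action = max(range(n), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < probs[action] else 0.0
        counts[action] += 1
        # Incremental sample-average update of the chosen option's estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return total, counts, estimates
```

For example, `epsilon_greedy_bandit([0.2, 0.5, 0.8], epsilon=0.1, steps=5000)` returns the total reward, the pull counts, and the learned estimates. Setting `epsilon=0.0` removes exploration entirely, so the agent keeps choosing the first option; setting `epsilon=1.0` makes every choice random.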

You are training a reinforcement learning agent to navigate through a maze. After a very long time, it learned to reliably exit the maze, but the path it takes is far from optimal. What would you do?

Section 1. Chapter 6