Learn Policy Iteration | Dynamic Programming
Introduction to Reinforcement Learning
Policy Iteration

The idea behind policy iteration is simple:

  1. Take some initial policy π and value function v.
  2. Use policy evaluation to update v until it's consistent with π.
  3. Use policy improvement to update π so that it's greedy with respect to v.
  4. Repeat steps 2-3 until the policy no longer changes.

In this method, there are no partial updates:

  • During policy evaluation, the values are updated for every state until they are consistent with the current policy;
  • During policy improvement, the policy is made greedy with respect to the value function in every state.
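
The two full updates can be written out explicitly. The notation below is an assumption, since the lesson text does not define it: p(s', r | s, a) is the transition model of the environment, γ is the discount factor, and π is treated as a deterministic policy. Under those assumptions, one evaluation sweep applies the first rule to every state, and improvement applies the second rule to every state:

```latex
% Policy evaluation: repeated for all states s until v stops changing
v(s) \leftarrow \sum_{s', r} p(s', r \mid s, \pi(s)) \, \bigl[ r + \gamma \, v(s') \bigr]

% Policy improvement: applied once to all states s
\pi(s) \leftarrow \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \, \bigl[ r + \gamma \, v(s') \bigr]
```

Both rules use the same one-step lookahead term; evaluation follows the action chosen by π, while improvement picks the action that maximizes it.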

Pseudocode
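
The loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's own implementation: the dictionary format `P[s][a] = [(prob, next_state, reward), ...]`, the three-state toy MDP, the function names, and the parameters `gamma` and `theta` are all hypothetical choices made for this example.

```python
def policy_evaluation(P, policy, v, gamma=0.9, theta=1e-8):
    """Sweep over all states until v is numerically consistent with policy."""
    while True:
        delta = 0.0
        for s in P:
            # Expected one-step return when following the current policy in s.
            new_v = sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < theta:  # largest change in this sweep is below tolerance
            return v


def policy_improvement(P, v, gamma=0.9):
    """Return the policy that is greedy with respect to v in every state."""
    def q(s, a):
        return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
    # Ties are broken in favor of the first action listed for the state.
    return {s: max(P[s], key=lambda a: q(s, a)) for s in P}


def policy_iteration(P, gamma=0.9):
    """Alternate full evaluation and full improvement until the policy is stable."""
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary start: first action
    v = {s: 0.0 for s in P}
    while True:
        v = policy_evaluation(P, policy, v, gamma)
        new_policy = policy_improvement(P, v, gamma)
        if new_policy == policy:  # no state changed its action
            return policy, v
        policy = new_policy


# Hypothetical toy MDP: states 0 and 1 are ordinary, state 2 is absorbing.
# Action 0 moves left (or stays), action 1 moves right; entering state 2 pays +1.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}

policy, v = policy_iteration(P)
print(policy, v)
```

On this toy MDP the loop should settle on action 1 (move right) in states 0 and 1, with values close to 0.9 and 1.0. The stopping test compares whole policies, which relies on the deterministic tie-breaking in `policy_improvement`; with random tie-breaking the loop could keep switching between equally good actions.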

Section 3. Chapter 7