Course Content
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Policy Iteration
The idea behind policy iteration is simple:
- Take some initial and .
- Use policy evaluation to update until it's consistent with .
- Use policy improvement to update until it's greedy with respect to .
- Repeat steps 2-3 until convergence.
In this method, there are no partial updates:
- During policy evaluation, values are updated for each state, until they are consistent with current policy;
- During policy improvement, policy is made greedy with respect to value function.
Pseudocode
Everything was clear?
Thanks for your feedback!
SectionΒ 3. ChapterΒ 7