Introduction to Reinforcement Learning
Policy Iteration
The idea behind policy iteration is simple:
1. Take some initial policy π and value function v;
2. Use policy evaluation to update v until it's consistent with π;
3. Use policy improvement to update π until it's greedy with respect to v;
4. Repeat steps 2-3 until convergence.
In this method, there are no partial updates:
During policy evaluation, the value of every state is updated until the values are consistent with the current policy;
During policy improvement, the policy is made greedy with respect to the value function.
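To make "consistent" and "greedy" concrete, the two conditions can be written out as equations. This is a sketch under assumptions the text above does not state: a deterministic policy, known environment dynamics p(s', r | s, a), and a discount factor γ.

```latex
% Policy evaluation stops when v satisfies the Bellman expectation
% equation for the current policy \pi in every state s:
v(s) = \sum_{s', r} p(s', r \mid s, \pi(s)) \bigl[ r + \gamma \, v(s') \bigr]

% Policy improvement then makes \pi greedy with respect to v in every state s:
\pi(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v(s') \bigr]
```

If the improvement step leaves the policy unchanged, both conditions hold at once, which is the convergence test in the last step.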
Pseudocode
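A minimal Python sketch of the loop described above, not a definitive implementation. The array layout is an assumption made for illustration: `P[s, a, s2]` holds transition probabilities, `R[s, a]` holds the expected immediate reward, and the names `policy_evaluation`, `policy_improvement`, `policy_iteration`, `gamma`, and `theta` are hypothetical.

```python
import numpy as np


def policy_evaluation(P, R, policy, v, gamma=0.9, theta=1e-8):
    """Sweep over all states until v is consistent with `policy` (within theta)."""
    n_states = P.shape[0]
    while True:
        delta = 0.0
        for s in range(n_states):
            a = policy[s]
            # Expected reward plus discounted value of the next state.
            new_v = R[s, a] + gamma * (P[s, a] @ v)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < theta:
            return v


def policy_improvement(P, R, v, gamma=0.9):
    """Return the policy that is greedy with respect to v."""
    q = R + gamma * (P @ v)  # action values, shape (n_states, n_actions)
    return np.argmax(q, axis=1)


def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    n_states, _ = R.shape
    policy = np.zeros(n_states, dtype=int)  # step 1: arbitrary initial policy
    v = np.zeros(n_states)                  # step 1: arbitrary initial values
    while True:
        v = policy_evaluation(P, R, policy, v, gamma, theta)  # step 2
        new_policy = policy_improvement(P, R, v, gamma)       # step 3
        if np.array_equal(new_policy, policy):                # step 4: stable policy
            return policy, v
        policy = new_policy


# Toy MDP (made up for this example): 2 states, 2 actions.
# Action 0 always leads to state 0, action 1 always leads to state 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])

policy, v = policy_iteration(P, R)
print(policy, v)
```

Both steps run to completion on every pass, which matches the "no partial updates" description: evaluation loops until the largest value change falls below `theta`, and improvement recomputes the greedy action for every state.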
Section 3. Chapter 7