Introduction to Reinforcement Learning
Value Iteration
While policy iteration is an effective approach for solving MDPs, it has a significant drawback: each iteration includes a separate policy evaluation step. Because policy evaluation is itself iterative, it requires multiple sweeps over the entire state space, adding considerable computational overhead.
A good alternative is value iteration, which merges policy evaluation and policy improvement into a single step. It updates the value function directly until it converges to the optimal value function, from which the optimal policy can then be derived.
How Does It Work?
Value iteration works by performing only a single backup of policy evaluation before each policy improvement. This results in the following update rule:

$$
V(s) \leftarrow \max_{a} \sum_{s', r} p(s', r \mid s, a)\bigl[r + \gamma V(s')\bigr]
$$

By turning the Bellman optimality equation into an update rule in this way, policy evaluation and policy improvement are merged into a single step.
Pseudocode
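As a sketch of what such pseudocode typically looks like in practice, here is a minimal Python implementation of tabular value iteration. The transition-model format `P[s][a] = [(prob, next_state, reward, done), ...]` follows the convention of Gymnasium's toy-text environments; the function name and the tiny example MDP at the end are illustrative assumptions, not part of the course material.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Tabular value iteration: repeated Bellman optimality backups,
    followed by greedy policy extraction from the converged values."""

    def action_values(s, V):
        # Expected return of each action in state s under the current V.
        q = np.zeros(n_actions)
        for a in range(n_actions):
            for prob, s_next, reward, done in P[s][a]:
                q[a] += prob * (reward + gamma * V[s_next] * (not done))
        return q

    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Single backup: take the max over actions directly,
            # instead of evaluating a fixed policy to convergence.
            best = action_values(s, V).max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # value function has (numerically) converged
            break

    # The optimal policy is greedy with respect to the optimal value function.
    policy = np.array([int(np.argmax(action_values(s, V))) for s in range(n_states)])
    return V, policy


# Hypothetical 2-state, 2-action MDP, included only to show the expected input format.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
}
V, policy = value_iteration(P, n_states=2, n_actions=2, gamma=0.9)
print(V, policy)
```

Note that the value function is updated in place (a new estimate for one state is used immediately when backing up the next), which is a common and valid variant of value iteration.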