Introduction to Reinforcement Learning
Generalized Policy Iteration
In previous chapters, you learned about policy evaluation and policy improvement. These two processes complement each other and naturally combine into a framework known as generalized policy iteration (GPI).
Most reinforcement learning methods can be described within the framework of GPI. The key differences among these methods stem from the specific implementations of policy evaluation and policy improvement, as well as the nature of their interactions.
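To make this concrete, here is a minimal sketch of a GPI loop for a tabular MDP. The arrays `P` (transition probabilities, shaped states × actions × states) and `R` (expected rewards, shaped states × actions) and the function name are illustrative assumptions, not from a specific library:

```python
import numpy as np

def generalized_policy_iteration(P, R, gamma=0.9, theta=1e-6):
    """Alternate evaluation and improvement until the policy is stable."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                  # current value estimates
    policy = np.zeros(n_states, dtype=int)  # deterministic policy: state -> action

    while True:
        # Policy evaluation: sweep until V is (nearly) consistent with policy
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v_new = R[s, a] + gamma * P[s, a] @ V
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break

        # Policy improvement: make the policy greedy w.r.t. the updated V
        stable = True
        for s in range(n_states):
            q = R[s] + gamma * P[s] @ V     # action values q(s, a) for all a
            best = int(np.argmax(q))
            if best != policy[s]:
                stable = False
                policy[s] = best

        if stable:  # greedy policy unchanged: V and policy have converged
            return policy, V
```

This sketch runs evaluation to near-convergence before each improvement step; GPI also covers variants that interleave coarser, partial versions of the two updates, which is where methods like value iteration fit in.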
Interaction Between Two Processes
Policy evaluation and policy improvement can be seen as both cooperative and competitive processes, depending on the perspective:
- Cooperative: both processes work toward a common goal of finding the optimal policy and value function. Policy evaluation estimates the value function for a given policy, while policy improvement refines the policy based on those estimates;
- Competitive: each process works against the other's objective. Policy evaluation aims to make the value estimates consistent with the current policy, which typically causes the policy to no longer be greedy with respect to the updated estimates. Conversely, policy improvement makes the policy greedy with respect to the current value estimates, typically rendering those estimates incorrect for the new policy. This constant push and pull continues until both the policy and the value function converge to their optimal forms.
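In common textbook notation, this interplay can be summarized as two alternating updates, where $v_\pi$ denotes the true value function of the current policy:

$$
V \xrightarrow{\ \text{evaluation}\ } v_\pi,
\qquad
\pi \xrightarrow{\ \text{improvement}\ } \operatorname{greedy}(V)
$$

Stability under both updates means the policy is greedy with respect to its own value function, which is exactly the Bellman optimality condition.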
Summary
Generalized policy iteration is a useful framework for understanding how different reinforcement learning methods approach solving MDPs. In the upcoming chapters, you will explore how these ideas can be applied to create two classic dynamic programming (DP) methods: policy iteration and value iteration.