Introduction to Reinforcement Learning

Generalized Policy Iteration

In previous chapters, you learned about policy evaluation and policy improvement. These processes complement each other and combine naturally into a framework known as generalized policy iteration (GPI).

Most reinforcement learning methods can be described within the GPI framework. They differ mainly in how they implement policy evaluation and policy improvement, and in how the two processes interact, for example how much evaluation is performed before each improvement step.
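As an illustration of that last point, here is a minimal sketch of a GPI loop in Python. It assumes a hypothetical 5-state corridor MDP with a discount factor of 0.9; the MDP and the helper names `evaluate`, `improve`, and `gpi` are illustrative choices, not course code. The parameter `eval_sweeps` controls how the two processes interleave: a single evaluation sweep per improvement behaves like value iteration, while many sweeps per improvement approximate full policy iteration.

```python
import numpy as np

# Hypothetical toy MDP (illustration only): a 5-state corridor.
# States 0..3 are non-terminal, state 4 is terminal with V = 0.
# Action 0 moves left, action 1 moves right; entering state 4 pays +1.
GAMMA, TERMINAL, N_ACTIONS = 0.9, 4, 2

def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, TERMINAL)
    return s_next, (1.0 if s_next == TERMINAL else 0.0)

def q_value(V, s, a):
    """One-step lookahead value of taking action a in state s."""
    s_next, r = step(s, a)
    return r + GAMMA * V[s_next]

def evaluate(policy, V, sweeps):
    """Policy evaluation, possibly truncated: `sweeps` in-place Bellman
    expectation sweeps. Returns the largest value change in the last sweep."""
    delta = 0.0
    for _ in range(sweeps):
        delta = 0.0
        for s in range(TERMINAL):
            old = V[s]
            V[s] = q_value(V, s, policy[s])
            delta = max(delta, abs(V[s] - old))
    return delta

def improve(V):
    """Policy improvement: act greedily with respect to the current V."""
    return np.array([
        int(np.argmax([q_value(V, s, a) for a in range(N_ACTIONS)]))
        for s in range(TERMINAL)
    ])

def gpi(eval_sweeps, tol=1e-10, max_iters=1000):
    """Generic GPI loop; `eval_sweeps` sets how much evaluation precedes each improvement."""
    V = np.zeros(TERMINAL + 1)
    policy = np.zeros(TERMINAL, dtype=int)  # start with "always move left"
    for _ in range(max_iters):
        delta = evaluate(policy, V, eval_sweeps)
        new_policy = improve(V)
        if np.array_equal(new_policy, policy) and delta < tol:
            break  # neither process changes anything any more
        policy = new_policy
    return policy, V

# One sweep per improvement behaves like value iteration;
# many sweeps per improvement approximate policy iteration.
policy_vi, V_vi = gpi(eval_sweeps=1)
policy_pi, V_pi = gpi(eval_sweeps=100)
print(policy_vi, policy_pi)
```

On this toy corridor both settings should end with the same "always move right" policy; they differ only in how much evaluation work is done between improvement steps.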

Interaction Between the Two Processes

Policy evaluation and policy improvement can be seen as both cooperative and competitive processes, depending on the perspective:

  • Cooperative: both processes work toward the same goal of finding the optimal policy and the optimal value function. Policy evaluation estimates the value function of a given policy, while policy improvement refines the policy based on these estimates;
  • Competitive: the two processes pull in opposite directions. Policy evaluation makes the value function consistent with the current policy, which usually means the policy is no longer greedy with respect to the updated values. Conversely, policy improvement makes the policy greedy with respect to the current value function, which usually makes that value function incorrect for the new policy. This push-and-pull continues until neither process changes anything: the policy is greedy with respect to its own value function, which happens only when both the policy and the value function are optimal.
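The competitive side of this interaction can be observed directly in a small sketch. It again assumes a hypothetical 5-state corridor MDP (discount 0.9, reward +1 for entering the terminal state) and performs one in-place evaluation sweep per iteration; the names `greedy` and `bellman_residual` are illustrative helpers, not course code. After each evaluation step it checks whether the policy is still greedy, and after each improvement step it measures how far the value function is from satisfying the Bellman equation for the new policy.

```python
import numpy as np

# Hypothetical 5-state corridor MDP used only for illustration:
# states 0..3 are non-terminal, state 4 is terminal (V = 0),
# action 0 moves left, action 1 moves right, entering state 4 pays +1.
GAMMA, TERMINAL = 0.9, 4

def step(s, a):
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, TERMINAL)
    return s_next, (1.0 if s_next == TERMINAL else 0.0)

def q(V, s, a):
    s_next, r = step(s, a)
    return r + GAMMA * V[s_next]

def greedy(V):
    """Greedy policy w.r.t. V (ties go to the lower action index)."""
    return np.array([int(np.argmax([q(V, s, a) for a in (0, 1)]))
                     for s in range(TERMINAL)])

def bellman_residual(V, policy):
    """Largest violation of the Bellman expectation equation for `policy`."""
    return max(abs(V[s] - q(V, s, policy[s])) for s in range(TERMINAL))

V = np.zeros(TERMINAL + 1)
policy = np.zeros(TERMINAL, dtype=int)  # start with "always move left"
history = []
for it in range(100):
    # Evaluation: one in-place sweep moves V toward the value of `policy`...
    for s in range(TERMINAL):
        V[s] = q(V, s, policy[s])
    # ...which can leave `policy` no longer greedy w.r.t. the new V.
    still_greedy = np.array_equal(greedy(V), policy)
    # Improvement: make the policy greedy w.r.t. V...
    policy = greedy(V)
    # ...which can leave V inconsistent with the new policy.
    residual = bellman_residual(V, policy)
    history.append((it, still_greedy, residual))
    if still_greedy and residual < 1e-10:
        break  # fixed point: the policy is greedy w.r.t. its own value function

for it, still_greedy, residual in history:
    print(f"iter {it}: greedy after evaluation={still_greedy}, "
          f"residual after improvement={residual:.3f}")
```

On this toy corridor, every iteration except the last should show both effects (the evaluated policy stops being greedy, and the improved policy leaves a non-zero residual). In the final iteration both conditions hold at once, which is the fixed point described above.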

Summary

Generalized policy iteration is a useful framework for understanding how different reinforcement learning methods approach solving MDPs. In the upcoming chapters, you will see how these ideas are applied in two well-known DP methods: policy iteration and value iteration.


Select the two processes that work together in the generalized policy iteration framework.


Section 3, Chapter 6