Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
学ぶ Policy Iteration | Dynamic Programming
/
Introduction to Reinforcement Learning with Python

bookPolicy Iteration

メニューを表示するにはスワイプしてください

The idea behind policy iteration is simple:

  1. Take some initial π\pi and vv;
  2. Use policy evaluation to update vv until it's consistent with π\pi;
  3. Use policy improvement to update π\pi until it's greedy with respect to vv;
  4. Repeat steps 2-3 until convergence.

In this method, there are no partial updates:

  • During policy evaluation, values are updated for each state, until they are consistent with current policy;
  • During policy improvement, policy is made greedy with respect to value function.

Pseudocode

question mark

Based on the pseudocode, what condition causes the outer loop of policy iteration to stop?

正しい答えを選んでください

すべて明確でしたか?

どのように改善できますか?

フィードバックありがとうございます!

セクション 3.  7

AIに質問する

expand

AIに質問する

ChatGPT

何でも質問するか、提案された質問の1つを試してチャットを始めてください

セクション 3.  7
some-alt