Introduction to Reinforcement Learning
Model, Policy, and Values
Model
A model represents the environment's dynamics and helps the agent predict how the environment will respond to its actions.
Reinforcement learning algorithms can be divided into two categories, contrasted in the sketch after this list:
- Model-based: in this approach, the agent learns or has access to a model of the environment, which allows it to simulate future states and rewards before taking actions. This enables the agent to plan and make more informed decisions;
- Model-free: in this approach, the agent does not have a direct model of the environment. It learns solely through interaction with the environment, relying on trial and error to discover the best actions.
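To make the distinction concrete, here is a minimal Python sketch (not part of the original lesson) on a made-up two-state, two-action environment; the tables P and R and the helper names plan_one_step and q_learning_update are purely illustrative assumptions:

```python
import numpy as np

# Hypothetical 2-state, 2-action environment used only for illustration.
# P[s, a] is the (deterministic) next state, R[s, a] is the reward.
P = np.array([[1, 0], [0, 1]])          # transition table (assumed known)
R = np.array([[0.0, 1.0], [1.0, 0.0]])  # reward table (assumed known)
gamma = 0.9                              # discount factor

# Model-based: the agent queries the model (P, R) to look ahead before acting.
def plan_one_step(s, V):
    """Pick the action with the best predicted reward plus discounted next-state value."""
    return int(np.argmax([R[s, a] + gamma * V[P[s, a]] for a in range(2)]))

# Model-free: the agent only updates its estimates from experienced transitions.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1):
    """One-step Q-learning update from a single observed transition."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

V = np.zeros(2)        # state-value estimates
Q = np.zeros((2, 2))   # action-value estimates

print(plan_one_step(0, V))                           # plans with the model
print(q_learning_update(Q, 0, 1, R[0, 1], P[0, 1]))  # learns from one interaction
```

In the model-based case the agent can simulate the outcome of each action before choosing one; in the model-free case it can only adjust its estimates after actually observing a transition.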
Policy
An agent determines its actions by evaluating the current state of its environment. To accurately model an agent's behavior, we introduce a concept known as a policy.
A policy is usually denoted as $\pi$.
There are two types of policies:
- Deterministic policy: the agent always selects the same action for a given state;
- Stochastic policy: the agent samples its action from a probability distribution over the actions available in a state, as illustrated in the sketch below.
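As a rough illustration (the states "s0"/"s1" and the actions "left"/"right" are made up, not part of the lesson), a deterministic policy can be stored as a simple state-to-action mapping, while a stochastic policy stores a probability for each action and samples from it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic policy: a fixed mapping from state to action.
deterministic_policy = {"s0": "left", "s1": "right"}
action = deterministic_policy["s0"]  # always "left" in state "s0"

# Stochastic policy: a probability distribution over actions for each state.
stochastic_policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.1, "right": 0.9},
}
probs = stochastic_policy["s0"]
action = rng.choice(list(probs), p=list(probs.values()))  # sampled; may differ between calls

print(action)
```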
During the learning process, the agent's goal is to find an optimal policy. An optimal policy is one that maximizes the expected return, guiding the agent to make the best possible decisions in any given state.
Value Functions
Value functions are crucial to understanding how an agent evaluates the potential of a particular state or state-action pair. They estimate the expected future rewards, helping the agent make informed decisions.
State Value Function
The state value function is usually denoted as $v(s)$ or $v_\pi(s)$. It is also called a V-function.
The value of a state can be expressed mathematically like this:

$$v_\pi(s) = \mathbb{E}_\pi\bigl[G_t \mid S_t = s\bigr]$$

i.e., the expected return $G_t$ when starting in state $s$ and following policy $\pi$ thereafter.
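One way to read this expectation is sketched below: estimate $v_\pi(s)$ by averaging the discounted returns observed over many episodes that start from state $s$ while following $\pi$. The reward sequences here are randomly generated placeholders, not data from any real environment:

```python
import numpy as np

gamma = 0.9
rng = np.random.default_rng(1)

def discounted_return(rewards):
    """G_t = sum over k of gamma^k * r_{t+k} for one sampled episode."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Placeholder data: reward sequences "observed" after visiting the same state s
# while following policy pi (randomly generated, purely for illustration).
episodes = [rng.normal(1.0, 0.5, size=5) for _ in range(1000)]

# Monte Carlo estimate of v_pi(s): the empirical mean of the sampled returns.
v_estimate = np.mean([discounted_return(ep) for ep in episodes])
print(round(v_estimate, 3))
```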
State-Action Value Function
The state-action value function is usually denoted as $q(s, a)$ or $q_\pi(s, a)$. It is also called an action value function or Q-function.
The value of an action can be expressed mathematically like this:

$$q_\pi(s, a) = \mathbb{E}_\pi\bigl[G_t \mid S_t = s, A_t = a\bigr]$$

i.e., the expected return when starting in state $s$, taking action $a$, and following policy $\pi$ thereafter.
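Although the lesson does not state it explicitly, the two value functions are linked in a standard way: the value of a state under a policy is the probability-weighted average of the values of the actions available in that state:

$$v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s, a)$$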
Relationship Between Model, Policy, and Value Functions
The concepts of model, policy, and value functions are intricately linked, forming a comprehensive framework for categorizing RL algorithms. This framework is defined by two primary axes:
- Learning target: this axis represents the spectrum of RL algorithms based on their reliance on value functions, policy functions, or a combination of both;
- Model application: this axis distinguishes algorithms based on whether they utilize a model of the environment or learn solely through interaction.
By combining these dimensions, we can classify RL algorithms into distinct categories, each with its own set of characteristics and ideal use cases. Understanding these relationships helps in selecting the appropriate algorithm for specific tasks, ensuring efficient learning and decision-making processes.