Introduction to Reinforcement Learning
Model, Policy, and Values
Model
A model is a representation of the environment that defines the transition probabilities between states and the expected rewards for actions taken.
Reinforcement learning algorithms can be divided into two categories:
Model-based: in this approach, the agent learns or has access to a model of the environment, which allows it to simulate future states and rewards before taking actions. This enables the agent to plan and make more informed decisions;
Model-free: in this approach, the agent does not have a direct model of the environment. It learns solely through interaction with the environment, relying on trial and error to discover the best actions.
In practice, an explicit and accurate model of the environment is rarely available, which makes it difficult for agents to rely on model-based strategies. As a result, model-free approaches are more prevalent and more extensively studied in reinforcement learning research and applications.
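To make the idea of a model concrete, below is a minimal sketch of a tabular model for a hypothetical two-state environment; the state names, actions, and numbers are invented purely for illustration. A model-based agent can query such a model to simulate transitions and rewards before acting, while a model-free agent has no such object and must sample everything from the real environment.

```python
import random

# Hypothetical tabular model of a tiny two-state environment.
# Every state name, action name, and number below is an illustrative assumption.
# (state, action) -> list of (next_state, transition_probability, expected_reward)
MODEL = {
    ("s0", "left"):  [("s0", 0.9, 0.0), ("s1", 0.1, 1.0)],
    ("s0", "right"): [("s1", 0.8, 1.0), ("s0", 0.2, 0.0)],
    ("s1", "left"):  [("s0", 1.0, 0.0)],
    ("s1", "right"): [("s1", 1.0, 0.5)],
}

def simulate_step(state, action):
    """Sample a next state and reward from the model instead of the real environment."""
    outcomes = MODEL[(state, action)]
    next_states, probs, rewards = zip(*outcomes)
    i = random.choices(range(len(outcomes)), weights=probs)[0]
    return next_states[i], rewards[i]

# A model-based agent can "imagine" a trajectory like this before acting for real:
state = "s0"
for _ in range(3):
    state, reward = simulate_step(state, "right")
    print(state, reward)
```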
Policy
A policy is the strategy an agent follows to decide its actions based on the current state of the environment.
There are two types of policies:
Deterministic policy: the agent always selects the same action for a given state;
Stochastic policy: the agent selects actions based on probability distributions.
During the learning process, the agent's goal is to find an optimal policy. An optimal policy is one that maximizes the expected return, guiding the agent to make the best possible decisions in any given state.
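As a small sketch of the two policy types, the following hypothetical example defines a deterministic and a stochastic policy over two made-up states and actions.

```python
import random

ACTIONS = ["left", "right"]

def deterministic_policy(state):
    """Deterministic policy: the same state always maps to the same action."""
    return "right" if state == "s0" else "left"

def stochastic_policy(state):
    """Stochastic policy: the action is sampled from a state-dependent distribution."""
    action_probs = {"s0": [0.2, 0.8], "s1": [0.7, 0.3]}  # P(action | state), made-up numbers
    return random.choices(ACTIONS, weights=action_probs[state])[0]

print(deterministic_policy("s0"))  # always "right"
print(stochastic_policy("s0"))     # "right" roughly 80% of the time
```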
Value Functions
Value functions are crucial for understanding how an agent evaluates the potential of a particular state or state-action pair. They estimate the expected future return, helping the agent make informed decisions.
State Value Function
State value function (or $v_\pi(s)$) is a function that provides the expected return of being in a particular state $s$ and following a specific policy $\pi$. It helps in evaluating the desirability of states.
The value of a state can be expressed mathematically like this:
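$$v_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right] = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s\right]$$

Here $G_t$ denotes the return from time step $t$, $R_{t+k+1}$ are the future rewards, and $\gamma \in [0, 1]$ is the discount factor; this is the standard definition of the state value under a policy $\pi$.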
State-Action Value Function
State-action value function (or $q_\pi(s, a)$) is a function that provides the expected return of taking a particular action $a$ in a given state $s$ and following a specific policy $\pi$ thereafter. It helps in evaluating the desirability of actions in states.
The state-action value function is often called the action value function.
The value of an action can be expressed mathematically like this:
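$$q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right] = \mathbb{E}_\pi\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a\right]$$

Here the expectation is taken over trajectories that start by taking action $a$ in state $s$ and then follow the policy $\pi$; as above, $G_t$ is the return and $\gamma$ the discount factor.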
Relationship Between Model, Policy, and Value Functions
The concepts of model, policy, and value functions are intricately linked, forming a comprehensive framework for categorizing RL algorithms. This framework is defined by two primary axes:
Learning target: this axis represents the spectrum of RL algorithms based on their reliance on value functions, policy functions, or a combination of both;
Model application: this axis distinguishes algorithms based on whether they utilize a model of the environment or learn solely through interaction.
By combining these dimensions, we can classify RL algorithms into distinct categories, each with its own set of characteristics and ideal use cases. Understanding these relationships helps in selecting the appropriate algorithm for specific tasks, ensuring efficient learning and decision-making processes.