Introduction to Reinforcement Learning
Markov Decision Process
Reinforcement learning problems are often framed as Markov decision processes (MDPs), which provide a structured way to define the problem. MDPs describe the environment using four key components: states, actions, transitions, and rewards. These components work together under the Markov property, which ensures that the future state depends only on the current state and action, not on past states.
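To make the pieces concrete, here is a minimal sketch of an MDP written out as plain Python data. The two-state "cool"/"hot" environment, the action names, and all of the numbers are made up for illustration; they are not part of any standard library or benchmark.

```python
import random

# A toy MDP: two states, two actions, tabular transitions and rewards.
# Everything here is illustrative, not a real environment.
states = ["cool", "hot"]
actions = ["slow", "fast"]

# transition[(state, action)] -> list of (next_state, probability)
transition = {
    ("cool", "slow"): [("cool", 1.0)],
    ("cool", "fast"): [("cool", 0.5), ("hot", 0.5)],
    ("hot", "slow"):  [("cool", 0.5), ("hot", 0.5)],
    ("hot", "fast"):  [("hot", 1.0)],
}

# reward[(state, action)] -> immediate reward
reward = {
    ("cool", "slow"): 1.0,
    ("cool", "fast"): 2.0,
    ("hot", "slow"):  1.0,
    ("hot", "fast"): -10.0,  # overheating is penalized
}

def step(state, action):
    """Sample the next state and return (next_state, reward)."""
    next_states, probs = zip(*transition[(state, action)])
    next_state = random.choices(next_states, weights=probs)[0]
    return next_state, reward[(state, action)]

print(step("cool", "fast"))
```

Note how the four components map directly onto the data above: the state and action lists, the transition table, and the reward table.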
The Four Components
State
A state is usually denoted as $s$, and the state space as $S$.
A state is typically represented by a set of parameters that capture the relevant features of the environment, such as position, velocity, or rotation.
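As a small illustration, a state for a cart-balancing task might bundle such parameters into a single object. The fields below are hypothetical and only show the idea.

```python
from dataclasses import dataclass

# Hypothetical state for a simple cart-pole setup; the field names
# are illustrative examples of what a state might capture.
@dataclass(frozen=True)
class CartState:
    position: float   # cart position along the track, in metres
    velocity: float   # cart velocity, in metres per second
    rotation: float   # pole angle, in radians

s = CartState(position=0.0, velocity=0.1, rotation=0.02)
```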
Action
An action is usually denoted as $a$, and the action space as $A$.
The set of possible actions usually depends on the current state.
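For example, in a gridworld an agent standing in a corner cannot move outside the grid, so its action set shrinks there. The sketch below assumes a simple grid of a given width and height; the function name and action labels are made up for the example.

```python
# State-dependent action space for a hypothetical gridworld.
def available_actions(x, y, width=4, height=4):
    actions = []
    if x > 0:
        actions.append("left")
    if x < width - 1:
        actions.append("right")
    if y > 0:
        actions.append("down")
    if y < height - 1:
        actions.append("up")
    return actions

print(available_actions(0, 0))  # corner cell: only ['right', 'up']
```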
Transition
The transition function is usually denoted as $P(s' \mid s, a)$: the probability of reaching state $s'$ after taking action $a$ in state $s$.
Environments can be either deterministic or stochastic: a transition may lead to a single predictable next state, or it may involve some degree of randomness.
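The difference can be sketched with two tiny transition functions for a "move right" action on a line of cells; the slip probability below is an arbitrary illustrative value.

```python
import random

def deterministic_step(position):
    # Deterministic: the outcome is fully predictable.
    return position + 1

def stochastic_step(position, slip_prob=0.2):
    # Stochastic: with probability slip_prob the agent slips
    # and stays in place instead of moving right.
    if random.random() < slip_prob:
        return position
    return position + 1
```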
Reward
A reward is usually denoted as $r$, and the reward function as $R(s, a)$.
Rewards steer the agent toward desirable behavior and can be either positive or negative. Reward engineering is complex, because the agent may exploit a poorly designed reward signal, maximizing the reward without achieving the behavior the designer intended.
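As a hedged illustration, the sketch below rewards reaching a goal cell and applies a small step cost; the goal position and the exact values are arbitrary, and the comment notes one way a badly shaped reward could be exploited.

```python
def reward(next_state, goal=10):
    # Positive reward for the desirable outcome.
    if next_state == goal:
        return 1.0
    # Small negative step cost. If this were instead a positive
    # "survival" bonus, the agent could exploit it by stalling
    # forever rather than ever reaching the goal.
    return -0.01
```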
Markov Property
The Markov property in a Markov decision process states that the next state and reward depend only on the current state and action, not on past information. This ensures a memoryless framework, simplifying the learning process.
Mathematically, this property can be described by the following formula:

$$P(s_{t+1}, r_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = P(s_{t+1}, r_{t+1} \mid s_t, a_t)$$