Gymnasium Basics | RL Core Theory
Introduction to Reinforcement Learning
Gymnasium Basics

Gymnasium is an open-source toolkit designed for developing and evaluating reinforcement learning (RL) agents. It provides a collection of standard environments for testing algorithms and training agents efficiently.

Key Features

  • Standardized API: ensures compatibility across different environments;
  • Variety of environments: supports classic control problems, Atari games, and robotics simulations;
  • Easy integration: compatible with deep learning frameworks like TensorFlow and PyTorch.

Workflow

A typical workflow in Gymnasium looks like this:

1. Importing a Library

```python
import gymnasium as gym
```

gym is a common alias for this library.

2. Creating an Environment

```python
# "CartPole-v1" is one of the classic control environments
env = gym.make("CartPole-v1")
```

gym.make() takes an environment ID or specification and can accept additional keyword arguments that further configure the environment.

3. Resetting the Environment

```python
observation, info = env.reset()
```

env.reset() must be called before taking the first step. It resets the environment to its initial state and returns the initial observation along with an info dictionary.

4. Taking an Action

```python
action = env.action_space.sample()
observation, reward, terminated, truncated, info = env.step(action)
```

In the first line, a random action is sampled from the action space using env.action_space.sample(). The action space defines the set of all possible actions the agent can take in the environment. The environment also exposes an observation space, accessible via env.observation_space, which represents the set of all possible observations (states) the agent can encounter.

In the second line, the chosen action is passed to env.step(action), which executes the action and returns the following:

  • observation: the agent's new state after taking the action;
  • reward: the reward received for the action taken;
  • terminated: a boolean indicating whether the episode has ended (i.e., the task is complete);
  • truncated: a boolean indicating whether the episode was prematurely stopped (due to time or other constraints);
  • info: additional diagnostic information, often used for debugging or logging purposes.

5. Closing the Environment

```python
env.close()
```

Use env.close() when the environment is no longer needed, so that any resources it holds are released.

Section 1. Chapter 7