Episodes and Returns

The Length of a Task

RL tasks are typically categorized as episodic or continuous, depending on how the learning process is structured over time.

Episodic tasks consist of a finite sequence of states, actions, and rewards, where the agent's interaction with the environment is divided into distinct episodes, each ending in a terminal state.

In contrast, continuous tasks do not have a clear end to each interaction cycle. The agent continually interacts with the environment without resetting to an initial state, and the learning process is ongoing, often without a distinct terminal point.
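
To make the distinction concrete, here is a minimal Python sketch of the two interaction loops. The `env` object (with Gymnasium-style `reset` and `step` methods) and the `policy` callable are assumptions made for illustration, not code from this course:

```python
# Hypothetical environment interface: env.reset() returns a state,
# env.step(action) returns (next_state, reward, done).

def run_episodic(env, policy, num_episodes):
    """Episodic task: interaction is split into distinct episodes,
    each ending in a terminal state, after which the environment resets."""
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:                        # episode ends at a terminal state
            action = policy(state)
            state, reward, done = env.step(action)

def run_continuous(env, policy):
    """Continuous task: one unbroken stream of interaction,
    with no resets and no distinct terminal point."""
    state = env.reset()                        # initial state, obtained only once
    while True:                                # interaction never ends
        action = policy(state)
        state, reward, _ = env.step(action)
```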

Return

You already know that the agent's main goal is to maximize cumulative rewards. While the reward function provides instantaneous rewards, it doesn't account for future outcomes, which can be problematic: an agent trained solely to maximize immediate rewards may overlook long-term benefits. To address this issue, let's introduce the concept of the return.

The return is usually denoted as $G$.

The return is a better representation of how good a particular state or action is in the long run. The goal of reinforcement learning can now be defined as maximizing the return.

If $T$ is the final time step, the formula for the return looks like this:

$$G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots + R_T$$
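
As a quick sanity check, here is a short Python sketch that computes this sum for a finished episode; the reward values are made up for illustration:

```python
def undiscounted_return(rewards):
    """Return G_t for an episodic task: rewards[k] holds R_{t+k+1},
    so G_t is simply the sum of all rewards up to the final step T."""
    return sum(rewards)

rewards = [1.0, 0.0, -2.0, 5.0]      # R_{t+1}, ..., R_T of a finished episode
print(undiscounted_return(rewards))  # 4.0
```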

Discounting

While the simple return serves as a good target in episodic tasks, a problem arises in continuous tasks: if the number of time steps is infinite, the return itself can become infinite. To handle this, a discount factor is used to give future rewards progressively less weight, preventing the return from growing without bound.

The discount factor is usually denoted as $\gamma$ and satisfies $0 \le \gamma < 1$.

The return combined with a discount factor is called the discounted return.

The formula for the discounted return looks like this:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
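
To see why this sum stays finite, suppose every reward is bounded by some value $R_{\max}$. Since $0 \le \gamma < 1$, the geometric series gives a finite upper bound:

$$|G_t| \le \sum_{k=0}^{\infty} \gamma^k R_{\max} = \frac{R_{\max}}{1 - \gamma}$$

Here is a minimal Python sketch that computes the discounted return for a finite reward sequence and illustrates this bound; the reward values and $\gamma = 0.9$ are chosen only for illustration:

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum over k of gamma^k * R_{t+k+1}
    for a finite sequence of rewards."""
    g = 0.0
    for k, reward in enumerate(rewards):
        g += (gamma ** k) * reward
    return g

rewards = [1.0, 0.0, -2.0, 5.0]
print(discounted_return(rewards, gamma=0.9))       # 1.0 - 1.62 + 3.645 = 3.025

# With a constant reward of 1, the return approaches R_max / (1 - gamma) = 10
# instead of growing without bound as the number of steps increases:
print(discounted_return([1.0] * 1000, gamma=0.9))  # ≈ 10.0
```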

What does the discount factor $\gamma$ represent?

