Reinforcement Learning

Publication Mitchell/97b: Machine Learning

Name Reinforcement Learning

Description
In the reinforcement learning scenario, the task is to find, by experimentation, a policy that maximizes the reward an agent receives.

Definition
Given is an agent's environment, represented by:

a discrete and usually finite set S of states. The agent is always in exactly one of the states of S.

a set A of actions. Each action takes the agent from one state to another, as described by a transition function δ: S × A → S.

a reward function r: S × A → ℝ, giving the reward the agent receives for performing a given action in a given state. The reward may be negative.

The task is to find a policy π : S → A that maximizes the agent's reward. A policy describes the agent's behaviour as a function of its current state.
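The definition above can be made concrete with a small sketch. The three-state chain, the action names, the reward values and the helper rollout are hypothetical and chosen only for illustration; they do not come from the cited publication. The sketch assumes a deterministic environment, as the transition function δ above implies.

```python
from typing import Dict, List

State = str
Action = str

# Hypothetical environment: a chain s0 -> s1 -> goal.
STATES: List[State] = ["s0", "s1", "goal"]
ACTIONS: List[Action] = ["stay", "forward"]


def delta(s: State, a: Action) -> State:
    """Transition function delta: S x A -> S (deterministic)."""
    if a == "forward" and s != "goal":
        return STATES[STATES.index(s) + 1]
    return s  # "stay", or any action taken in the goal state


def r(s: State, a: Action) -> float:
    """Reward function r: S x A -> R; values are illustrative and may be negative."""
    if s == "s1" and a == "forward":
        return 10.0  # the step that enters the goal
    if s == "goal":
        return 0.0   # nothing more to gain once in the goal
    return -1.0      # every other step has a small cost


# A policy pi: S -> A, here simply a lookup table.
policy: Dict[State, Action] = {"s0": "forward", "s1": "forward", "goal": "stay"}


def rollout(pi: Dict[State, Action], start: State, steps: int) -> List[float]:
    """Follow policy pi for a fixed number of steps and collect the rewards."""
    rewards: List[float] = []
    s = start
    for _ in range(steps):
        a = pi[s]
        rewards.append(r(s, a))
        s = delta(s, a)
    return rewards


print(rollout(policy, "s0", 3))  # [-1.0, 10.0, 0.0]
```

Representing the policy as a table is only possible because S is finite here; for larger state sets a function or a learned approximation would take its place.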

More precisely, there are several measures that one could try to maximize:

If the lifetime of an agent is known and limited, it makes sense to try to maximize the sum over all rewards.

If the lifetime is unlimited, the average reward per time step can be maximized instead.

Another common measure is the discounted cumulative reward, which weights early rewards more heavily than later ones.
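The three measures can be sketched as functions over a sequence of rewards r_0, r_1, ..., r_{n-1}. The function names and the discount factor gamma (assumed to satisfy 0 <= gamma < 1) are introduced here for illustration and are not part of the entry itself. For an unlimited lifetime the average reward is properly a limit over ever longer sequences; the function below only evaluates it on a finite prefix.

```python
from typing import Sequence


def finite_horizon_return(rewards: Sequence[float]) -> float:
    """Sum over all rewards, for an agent with a known, limited lifetime."""
    return sum(rewards)


def average_reward(rewards: Sequence[float]) -> float:
    """Average reward per time step over the observed prefix."""
    return sum(rewards) / len(rewards)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted cumulative reward: sum of gamma**t * r_t.

    With gamma < 1 a reward received at step t counts less the larger t is.
    """
    return sum(gamma ** t * r_t for t, r_t in enumerate(rewards))


rewards = [1.0, 1.0, 1.0, 1.0]
print(finite_horizon_return(rewards))   # 4.0
print(average_reward(rewards))          # 1.0
print(discounted_return(rewards, 0.5))  # 1.875
```

With gamma = 0.5 the four equal rewards contribute 1, 0.5, 0.25 and 0.125, which shows how the same reward counts for less the later it arrives.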

Methods Q Learning

Theories Learning as Optimization