Reinforcement Learning

Publication Mitchell/97b: Machine Learning
Name Reinforcement Learning
Description

In the reinforcement learning scenario, the task is to find, by experimentation, a policy that maximizes an agent's reward.

Definition

Given is an agent's environment, represented by

  • a discrete and usually finite set S of states. The agent is always in exactly one of the states of S.
  • a set A of actions. Actions take the agent from one state to another, as described by a transition function δ: S × A → S.
  • a reward function r: S × A → ℝ, giving the reward the agent receives for performing a given action in a given state. The reward may be negative.
The task is to find a policy π: S → A that maximizes the agent's reward. A policy describes the agent's behaviour as a function of its current state.
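
The definition above can be made concrete with a small sketch. The three-state chain, the state and action names, the reward values, and the helper rollout are all hypothetical and chosen only for illustration; they do not come from the cited publication.

```python
# Hypothetical toy environment: a chain s0 -> s1 -> goal.
S = ["s0", "s1", "goal"]   # finite set of states
A = ["left", "right"]      # set of actions


def delta(s, a):
    """Transition function delta: S x A -> S (deterministic)."""
    if s == "goal":
        return "goal"  # assumption: the goal state is absorbing
    i = S.index(s)
    j = min(i + 1, len(S) - 1) if a == "right" else max(i - 1, 0)
    return S[j]


def r(s, a):
    """Reward function r: S x A -> R; rewards may be negative."""
    if s == "goal":
        return 0.0
    # Assumed values: +10 for entering the goal, -1 for any other step.
    return 10.0 if delta(s, a) == "goal" else -1.0


# A policy pi: S -> A, stored as a lookup table.
policy = {"s0": "right", "s1": "right", "goal": "right"}


def rollout(pi, start, steps):
    """Follow policy pi for a fixed number of steps; return the rewards."""
    s, rewards = start, []
    for _ in range(steps):
        a = pi[s]
        rewards.append(r(s, a))
        s = delta(s, a)
    return rewards


print(rollout(policy, "s0", 3))
```

In this sketch the policy is fixed by hand. A learning method would instead have to discover such a table by trying actions and observing the rewards.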

More precisely, there are several measures that one could try to maximize:

  • If the lifetime of an agent is known and limited, it makes sense to try to maximize the sum over all rewards.
  • If the lifetime is unlimited, the average reward can be regarded instead.
  • Another common measure is the discounted cumulative reward, which weights early rewards more heavily than later ones.
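
The three measures can be compared on a single reward sequence. The following is a minimal sketch with hypothetical function names. The discount factor gamma, with 0 ≤ gamma < 1, is an assumed parameter that the text above does not name. For an unlimited lifetime the average reward is a limit over ever longer sequences, so the function here only gives a finite-sample estimate.

```python
def total_reward(rewards):
    """Sum over all rewards (known, limited lifetime)."""
    return sum(rewards)


def average_reward(rewards):
    """Mean reward per step; a finite estimate of the long-run average."""
    return sum(rewards) / len(rewards)


def discounted_reward(rewards, gamma):
    """Discounted cumulative reward: the i-th reward is weighted by gamma**i,
    so earlier rewards count more than later ones when gamma < 1."""
    return sum(gamma ** i * x for i, x in enumerate(rewards))


rewards = [1.0, 0.0, 2.0]  # hypothetical reward sequence
print(total_reward(rewards))            # 3.0
print(average_reward(rewards))          # 1.0
print(discounted_reward(rewards, 0.5))  # 1.0 + 0.0 + 0.25 * 2.0 = 1.5
```

With gamma = 0.5 the final reward of 2.0 contributes only 0.5 to the total, which shows how discounting shifts weight toward early rewards.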

Methods Q Learning
Theories Learning as Optimization