Reinforcement Learning
Publication |
Mitchell/97b: Machine Learning
Name |
Reinforcement Learning |
Description |
In the reinforcement learning scenario, the task is to find a
policy that maximizes the reward of an agent by …
Definition |
The agent's environment is represented by:
- a discrete and usually finite set S of states. The agent is always in exactly one of the states of S.
- a set A of actions. Actions take the agent from one state to another, as described by a transition function δ: S × A → S.
- a reward function r: S × A → ℝ, describing the reward the agent receives for performing a given action in a given state. The reward may be negative.
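As a minimal sketch of this definition, assuming a hypothetical corridor of four cells with the goal in cell 3, the set S, the set A, and the functions δ and r can be written down directly. The names STATES, ACTIONS, delta and reward are illustrative only and do not come from the source.

```python
# Hypothetical toy environment: a corridor of four cells 0..3, goal in cell 3.
STATES = [0, 1, 2, 3]          # the finite set S
ACTIONS = ["left", "right"]    # the set A


def delta(s: int, a: str) -> int:
    """Deterministic transition function delta: S x A -> S.

    Moving off either end of the corridor leaves the agent where it is.
    """
    if a == "right":
        return min(s + 1, STATES[-1])
    return max(s - 1, STATES[0])


def reward(s: int, a: str) -> float:
    """Reward function r: S x A -> R.

    Stepping into the goal from another cell pays 100; every other
    move costs 1, illustrating that rewards may be negative.
    """
    return 100.0 if s != 3 and delta(s, a) == 3 else -1.0
```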
The task is to find a policy π: S → A that maximizes the agent's reward.
A policy describes the agent's behaviour as a function of its current state.
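For a finite S, a policy can be stored as a lookup table from states to actions. The sketch below restates the same hypothetical corridor (so the block runs on its own), defines an illustrative policy pi that moves right everywhere except in the goal cell, and follows it to produce a sequence of rewards. The names pi and rollout are assumptions for illustration.

```python
from typing import Dict, List


# Hypothetical corridor again: states 0..3, goal in cell 3.
def delta(s: int, a: str) -> int:
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)


def reward(s: int, a: str) -> float:
    return 100.0 if s != 3 and delta(s, a) == 3 else -1.0


# A deterministic policy pi: S -> A, stored as a lookup table.
pi: Dict[int, str] = {0: "right", 1: "right", 2: "right", 3: "left"}


def rollout(policy: Dict[int, str], start: int, steps: int) -> List[float]:
    """Follow `policy` from state `start` for `steps` actions and
    return the rewards r_t, r_{t+1}, ... that the agent receives."""
    s, rewards = start, []
    for _ in range(steps):
        a = policy[s]              # the policy picks the action for state s
        rewards.append(reward(s, a))
        s = delta(s, a)            # the transition function gives the next state
    return rewards
```

For example, rollout(pi, 0, 5) returns [-1.0, -1.0, 100.0, -1.0, 100.0]: this policy steps out of the goal and back in, collecting the goal reward repeatedly.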
More precisely, there are several measures
one could try to maximize:
If the lifetime of an agent is known and limited,
it makes sense to try to maximize the sum over
all rewards.
If the lifetime is unlimited, the average
reward can be maximized instead.
Another common measure is the discounted
cumulative reward, which weights early
rewards more heavily than later rewards.
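In the usual notation (for example in Mitchell's textbook), with rewards r_t, r_{t+1}, … received from time t on, the three measures are the finite-horizon reward Σ_{i=0}^{h} r_{t+i}, the average reward lim_{h→∞} (1/h) Σ_{i=0}^{h} r_{t+i}, and the discounted cumulative reward Σ_{i=0}^{∞} γ^i r_{t+i} with a discount factor 0 ≤ γ < 1. The Python sketch below evaluates each measure on a finite list of observed rewards; the function names are hypothetical, and the average and discounted sums are necessarily truncated to the rewards actually given.

```python
from typing import Sequence


def finite_horizon_reward(rewards: Sequence[float], h: int) -> float:
    """Sum of the rewards r_t, ..., r_{t+h} (h + 1 terms)."""
    return sum(rewards[: h + 1])


def average_reward(rewards: Sequence[float]) -> float:
    """Mean of the observed rewards; for long sequences this
    approximates lim_{h->inf} (1/h) * sum_{i=0}^{h} r_{t+i}."""
    return sum(rewards) / len(rewards)


def discounted_reward(rewards: Sequence[float], gamma: float) -> float:
    """Discounted cumulative reward sum_i gamma^i * r_{t+i}, truncated
    to the observed rewards; requires 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return sum((gamma ** i) * r for i, r in enumerate(rewards))
```

With γ = 0.5, discounted_reward([10.0, 0.0], 0.5) gives 10.0 while discounted_reward([0.0, 10.0], 0.5) gives 5.0: the same reward is worth less when it arrives later.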
Methods |
Q Learning
Theories |
Learning as Optimization