Reinforcement Learning
Publication
Mitchell/97b: Machine Learning

Name
Reinforcement Learning

Description
In the reinforcement learning scenario, the task is to find a policy that maximizes an agent's reward through experimentation.

Definition
Given an agent's environment, represented by:
- a discrete and usually finite set S of states. The agent is always in exactly one of the states of S.
- a set A of actions. Actions take the agent from one state to another, as described by a transition function δ: S × A → S.
- a reward function r: S × A → ℝ, giving the reward an agent receives for performing a given action in a given state. The reward may be negative.
The task is to find a policy π: S → A that maximizes the agent's reward. A policy describes the agent's behaviour as a function of its current state.
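The definition can be illustrated with a small, hypothetical example. The sketch below assumes a three-state world (s0, s1 and an absorbing goal state), two actions, a deterministic transition function `delta` and a reward function `reward`; none of these names or values come from the source. A policy π is represented as a plain dictionary from states to actions, and the illustrative helper `rollout` follows it for a fixed number of steps.

```python
from typing import Dict, List

State = str
Action = str

# Hypothetical toy environment (not from the source): three states in a row,
# with an absorbing "goal" state at the right end.
STATES: List[State] = ["s0", "s1", "goal"]
ACTIONS: List[Action] = ["left", "right"]


def delta(s: State, a: Action) -> State:
    """Deterministic transition function delta: S x A -> S."""
    if s == "goal":
        return s  # the goal is absorbing: every action stays there
    i = STATES.index(s)
    step = 1 if a == "right" else -1
    # Clamp to the ends of the row, so "left" in s0 stays in s0.
    return STATES[min(max(i + step, 0), len(STATES) - 1)]


def reward(s: State, a: Action) -> float:
    """Reward function r: S x A -> R; rewards may be negative."""
    if s == "goal":
        return 0.0
    # +10 for the action that reaches the goal, -1 for every other step.
    return 10.0 if delta(s, a) == "goal" else -1.0


def rollout(policy: Dict[State, Action], start: State, steps: int) -> List[float]:
    """Follow a policy pi: S -> A for a fixed number of steps; return the rewards."""
    s = start
    rewards: List[float] = []
    for _ in range(steps):
        a = policy[s]  # the policy chooses an action from the current state only
        rewards.append(reward(s, a))
        s = delta(s, a)  # the environment moves the agent to the next state
    return rewards


go_right = {"s0": "right", "s1": "right", "goal": "right"}
go_left = {"s0": "left", "s1": "left", "goal": "left"}

print(rollout(go_right, "s0", 4))  # [-1.0, 10.0, 0.0, 0.0]
print(rollout(go_left, "s0", 4))   # [-1.0, -1.0, -1.0, -1.0]
```

Summing the rewards over these four steps, `go_right` collects 9.0 while `go_left` collects -4.0, so `go_right` is the better of the two policies in this toy world.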
There are several measures one could try to maximize:
- If the lifetime of the agent is known and limited, it makes sense to maximize the sum of all rewards.
- If the lifetime is unlimited, the average reward can be considered instead.
- Another common measure is the discounted cumulative reward, which weights earlier rewards more heavily than later ones.
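A minimal sketch of the three measures, applied to a finite list of observed rewards r_t, r_{t+1}, r_{t+2}, ... In the usual textbook formulation they are Σ_{i=0}^{h} r_{t+i} for a lifetime of h steps, lim_{h→∞} (1/h) Σ_{i=0}^{h} r_{t+i} for the average reward, and Σ_{i=0}^{∞} γ^i r_{t+i} with a discount factor 0 ≤ γ < 1 for the discounted cumulative reward. The function names and the choice γ = 0.5 are illustrative assumptions; because only finitely many rewards are available, the average is estimated by the sample mean and the discounted sum is truncated.

```python
from typing import Sequence


def total_reward(rewards: Sequence[float]) -> float:
    """Known, limited lifetime: the plain sum of all rewards."""
    return float(sum(rewards))


def average_reward(rewards: Sequence[float]) -> float:
    """Unlimited lifetime: the average reward, estimated from a finite prefix."""
    if not rewards:
        raise ValueError("need at least one reward")
    return sum(rewards) / len(rewards)


def discounted_reward(rewards: Sequence[float], gamma: float) -> float:
    """Discounted cumulative reward sum_i gamma**i * r_{t+i}, truncated to the given rewards."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return sum(gamma ** i * r for i, r in enumerate(rewards))


early = [10.0, 0.0, 0.0, 0.0]  # the large reward arrives first
late = [0.0, 0.0, 0.0, 10.0]   # the same reward arrives last

print(total_reward(early), total_reward(late))                      # 10.0 10.0
print(average_reward(early), average_reward(late))                  # 2.5 2.5
print(discounted_reward(early, 0.5), discounted_reward(late, 0.5))  # 10.0 1.25
```

Both sequences earn the same total and the same average, but with γ = 0.5 the early reward is worth 10.0 and the late one only 1.25, which is exactly the preference for early rewards that the discounted measure expresses.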

Methods
Q Learning

Theories
Learning as Optimization