A value function V is represented by a set of vectors if it equals the value function induced by the set. When a value function is representable by a finite set of vectors, there is a unique parsimonious set of vectors that represents the function (Littman et al. 1995a).
Sondik (1971) has shown that
if a value function V is representable by a finite
set of vectors, then so is the value function
TV.
The process of obtaining the parsimonious representation
for TV from the parsimonious representation of V is usually
referred to as
dynamic programming (DP) update.
Let be the parsimonious set of vectors that represents V.
For convenience, we use
to denote the
parsimonious set of vectors that represents TV.
In practice, value iteration for POMDPs is
not carried out directly in terms of value functions
themselves.
Rather, it is carried out in terms
of sets of vectors that represent the value functions
(Figure 2).
One begins with an initial set of vectors .
At each iteration, one performs a DP update
on the previous parsimonious set
of vectors
and obtains a new parsimonious set of vectors
.
One continues until the Bellman residual
, which is determined by solving a
sequence of linear programs,
falls below a threshold.
Figure 2: Value Iteration for POMDPs.