Suppose V and U are two value functions. We say
that U dominates V and write
if
for every belief state b.
A value function V is said to be uniformly improvable
if
.
A set
of vectors dominates another
set
of vectors if the value function induced
by
dominates that induced by
. A set of vectors is unformly improvable
if the value function it induces is.
This lemma is obvious and is well known in the MDP community
(Puterman 1990). Nonetheless, it enables us to
explain the intuition behind the term ``uniformly improvable".
Suppose V is a uniformly improvable value function and suppose
value iteration starts with V. Then the sequence of
value functions generated is monotonically increasing and converges
to the optimal value function .
This implies
.
That is, TV(b) is closer to
than V(b) for all
belief states b.
The following lemma will be used later to address the issues listed at the end of the previous section.
Proof: Since , we have
by Lemma 1.
We also have the condition
.
Consequently,
. That is, U is
uniformly improvable.