Let and
be sets of vectors respectively generated
by VI (Figure 1) and VI1 (Figure 2) at line 3 in iteration n.
Suppose the initial set is uniformly improvable.
Using Lemma 2 and Corollary 1,
one can prove by induction that
and
are uniformly
improvable for all n and their induced value functions
increase with n.
Moreover,
dominates
and
is dominated by the optimal value function. It is well known that
converges to the optimal value function. Therefore,
must also converge to the optimal value function.
The question now is how to make sure that the initial set is uniformly improvable. The following lemma answers this question.
Proof: Use V to denote the value function induced by the singleton set. For any belief state b, we have
Therefore the value function, and hence the singleton set, is
uniformly improvable.
Experiments (Section 6) have shown that VI1
is more efficient VI on a number of
test problems. It should be noted, however, that we have not proved this
is always the case.
Moreover,
complexity results by Papadimitriou and Tsitsiklis (1987)
implies that the task of finding -optimal policies
for POMDPs is PSPACE-complete.
Hence, the worst-case complexity should remain
the same.