
Convergence of Modified Value Iteration

Let $\mathcal{V}_n$ and $\mathcal{V}'_n$ be the sets of vectors generated at line 3 in iteration $n$ by VI (Figure 1) and VI1 (Figure 2) respectively. Suppose the initial set is uniformly improvable. Using Lemma 2 and Corollary 1, one can prove by induction that $\mathcal{V}_n$ and $\mathcal{V}'_n$ are uniformly improvable for all $n$ and that their induced value functions increase with $n$. Moreover, the value function induced by $\mathcal{V}'_n$ dominates that induced by $\mathcal{V}_n$ and is itself dominated by the optimal value function. It is well known that the value function induced by $\mathcal{V}_n$ converges to the optimal value function. Therefore, the value function induced by $\mathcal{V}'_n$ must converge to the optimal value function as well.
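In symbols, the argument is a squeeze. Writing $V_n$ and $V'_n$ for the value functions induced by $\mathcal{V}_n$ and $\mathcal{V}'_n$, and $V^*$ for the optimal value function (notation introduced here only to summarize the argument above):

\[
V_n \;\leq\; V'_n \;\leq\; V^* \quad \text{for all } n,
\qquad
\lim_{n\to\infty} V_n = V^*
\;\;\Longrightarrow\;\;
\lim_{n\to\infty} V'_n = V^*.
\]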

The question now is how to make sure that the initial set is uniformly improvable. The following lemma answers this question.

Lemma: Let $v$ be the vector whose components all equal $\min_{s,a} r(s,a)/(1-\lambda)$, where $\lambda$ is the discount factor. Then the singleton set $\{v\}$ is uniformly improvable.

Proof: Use $V$ to denote the value function induced by the singleton set. Since every component of $v$ equals $\min_{s,a} r(s,a)/(1-\lambda)$, we have $V(b) = \sum_s b(s) v(s) = \min_{s,a} r(s,a)/(1-\lambda)$ for every belief state $b$. For any belief state $b$, we therefore have

\begin{eqnarray*}
TV(b) &=& \max_a \Big[\, r(b,a) + \lambda \sum_z P(z|b,a)\, V(\tau(b,a,z)) \Big] \\
&\geq& \min_{s,a} r(s,a) + \lambda\, \frac{\min_{s,a} r(s,a)}{1-\lambda} \\
&=& \frac{\min_{s,a} r(s,a)}{1-\lambda} \;=\; V(b).
\end{eqnarray*}

Therefore the value function, and hence the singleton set, is uniformly improvable. $\Box$
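To make the lemma concrete, here is a minimal Python sketch of the initialization it licenses. The names (build_initial_vector, rewards, discount) are illustrative and not from the paper; the sketch assumes rewards are given as a (state, action) -> r(s, a) mapping and that the discount factor satisfies 0 <= lambda < 1.

def build_initial_vector(rewards, discount):
    """Build the single vector of a uniformly improvable initial set.

    rewards: dict mapping (state, action) pairs to immediate rewards r(s, a)
    discount: the discount factor lambda, with 0 <= lambda < 1

    Every component is set to the constant min_{s,a} r(s,a) / (1 - lambda),
    so the induced value function is a constant lower bound that the DP
    update can only improve (this is what the lemma establishes).
    """
    min_reward = min(rewards.values())
    constant = min_reward / (1.0 - discount)
    states = {s for (s, _a) in rewards}
    return {s: constant for s in states}

# Hypothetical 2-state, 2-action example with discount factor 0.9:
rewards = {(0, 'a'): 1.0, (0, 'b'): -1.0,
           (1, 'a'): 0.5, (1, 'b'): 2.0}
v0 = build_initial_vector(rewards, discount=0.9)
# Every component of v0 is min r / (1 - lambda) = -1.0 / 0.1 = -10.0
# (up to floating-point rounding).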

Experiments (Section 6) show that VI1 is more efficient than VI on a number of test problems. It should be noted, however, that we have not proved that this is always the case. Moreover, the complexity results of Papadimitriou and Tsitsiklis (1987) imply that finding $\epsilon$-optimal policies for POMDPs is PSPACE-complete. Hence, the worst-case complexity of VI1 remains the same as that of VI.


