Value iteration is a popular algorithm for finding
$\epsilon$-optimal policies for POMDPs.
It typically performs a large
number of DP updates before convergence,
and DP updates are notoriously expensive.
In this paper,
we have developed a technique called point-based DP update
for reducing the number of standard DP updates.
The technique is conceptually simple and clean.
It can easily be incorporated into most
existing POMDP value iteration algorithms.
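As a sketch of the operation in standard notation (the symbols $\Gamma$, $T$, $O$, $R$, and $\gamma$ below follow common POMDP usage and are assumptions of this illustration, not this paper's exact formulation), a point-based DP update backs up the current set of vectors $\Gamma$ only at a chosen belief point $b$:
\[
\alpha_b^{a,z} = \arg\max_{\alpha \in \Gamma}\; \sum_{s'} \alpha(s')\, O(z \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s),
\]
\[
\alpha_b^{a}(s) = R(s,a) + \gamma \sum_{z} \sum_{s'} T(s' \mid s, a)\, O(z \mid s', a)\, \alpha_b^{a,z}(s'),
\qquad
\mathrm{backup}(b) = \arg\max_{a}\; b \cdot \alpha_b^{a}.
\]
Each such backup takes time polynomial in $|\Gamma|$ and the sizes of the state, action, and observation sets, which is why it is cheap relative to a standard DP update over the whole simplex.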
Empirical studies have shown that point-based DP update can
drastically cut down the
number of standard DP updates and hence significantly
speed up value iteration.
Moreover, point-based DP update compares favorably with
the more complex variants of it that we have considered.
It also compares favorably with policy iteration.
The algorithm presented in this paper still requires standard DP updates. This limits its ability to solve large POMDPs. One future direction is to investigate the properties of point-based value iteration as an approximation algorithm in its own right. Another direction is to design efficient algorithms for standard DP updates in special models. We are currently exploring the latter direction.
This research was supported by Hong Kong Research Grants Council Grant HKUST6125/98E. The authors thank Tony Cassandra and Eric Hansen for sharing their programs with us. We are also grateful to the three anonymous reviewers, who provided insightful comments and suggestions on an earlier version of this paper.