POMDPs model sequential decision-making problems in which the effects of actions are nondeterministic and the state of the world is not known with certainty. They have attracted many researchers in Operations Research and Artificial Intelligence because of their potential applications in a wide range of areas (Monahan 1982, Cassandra 1998b), one of which is planning under uncertainty. Unfortunately, there is still a significant gap between this potential and actual applications, primarily due to the lack of effective solution methods. For this reason, much recent effort has been devoted to finding efficient algorithms for POMDPs (e.g., Parr and Russell 1995, Hauskrecht 1997b, Cassandra 1998a, Hansen 1998, Kaelbling et al. 1998, Zhang et al. 1999).
Value iteration is a well-known algorithm for solving POMDPs (Smallwood and Sondik 1973, Puterman 1990). It starts with an initial value function and iteratively performs dynamic programming (DP) updates, generating a sequence of value functions that converges to the optimal value function. Value iteration terminates when a predetermined convergence condition is met.
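To make the structure of the algorithm concrete, the following is a minimal Python sketch of this loop. It is an illustration only, not an implementation from this paper: the names value_iteration, dp_update, distance, v_init, and epsilon are hypothetical placeholders for, respectively, the overall procedure, a standard DP update routine, a measure of the difference between successive value functions, an initial value function, and a convergence threshold.

    # A minimal sketch of value iteration. The callables dp_update and
    # distance are assumed to be supplied by the caller; they stand in
    # for a standard DP update and a stopping measure such as the
    # residual between successive value functions.
    def value_iteration(v_init, dp_update, distance, epsilon):
        v = v_init
        while True:
            v_next = dp_update(v)             # one standard DP update
            if distance(v, v_next) < epsilon:
                return v_next                 # convergence condition met
            v = v_next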
Value iteration typically performs a large number of DP updates before it converges, and each DP update is notoriously expensive. In this paper, we develop a technique for reducing the number of DP updates.
A DP update takes (the finite representation of) a value function as input and returns (the finite representation of) another value function. The output value function is closer to the optimal value function than the input; in this sense, we say that a DP update improves its input. We propose an approximation to DP update called point-based DP update. A point-based DP update also improves its input, though possibly to a lesser degree than a standard DP update; on the other hand, it is computationally much cheaper. During value iteration, we perform a number of point-based DP updates in between two consecutive standard DP updates, as sketched below. Since point-based DP updates improve their input, the number of standard DP updates is reduced, and the reduction comes at little cost since point-based DP updates take little time.
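The interleaving scheme can be pictured with another hypothetical Python sketch, using the same placeholder conventions as above: standard_dp_update and point_based_dp_update stand in for the two kinds of update, and k is an assumed number of point-based updates performed between consecutive standard updates.

    # A minimal sketch of interleaved value iteration: k cheap point-based
    # DP updates are performed between every two expensive standard DP
    # updates; convergence is still tested on the standard updates.
    def interleaved_value_iteration(v_init, standard_dp_update,
                                    point_based_dp_update, distance,
                                    epsilon, k):
        v = v_init
        while True:
            for _ in range(k):
                v = point_based_dp_update(v)  # cheap partial improvement
            v_next = standard_dp_update(v)    # expensive full improvement
            if distance(v, v_next) < epsilon:
                return v_next
            v = v_next

Because each point-based update already moves the value function toward the optimum, fewer passes of the outer loop, and hence fewer standard DP updates, should be needed before the convergence test succeeds.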
The rest of this paper is organized as follows. In the next section we shall give a brief review of POMDPs and value iteration. The basic idea behind point-based DP update will be explained in Section 3. After some theoretical preparations in Section 4, we shall work out the details of point-based DP update in Section 5. Empirical results will be reported in Section 6 and possible variations evaluated in Section 7. Finally, we shall discuss related work in Section 8 and provide some concluding remarks in Section 9.