Computing the Bellman Residual

Next: Empirical Results and Discussions Up: Point-Based DP Update: The Previous: Convergence of Modified Value

Computing the Bellman Residual

In the modified value iteration algorithm, the input to standard DP update is always uniformly improvable. As such, its output dominates its input. This fact can be used to simplify the computation of the Bellman residual. As a matter of fact, the Bellman residual reduces .

To compute the latter quantity, one goes through the vectors in one by one. For each vector, one solves the linear program LP . The quantity is simply the maximum of the optimal values of the objective functions of the linear programs. Without uniformly improvability, we would have to repeat the process one more time with the roles of and exchanged.

Dr. Lian Wen Zhang
Thu Feb 15 14:47:09 HKT 2001