In the modified value iteration algorithm, the input to
standard DP update is always uniformly improvable. As such,
its output
dominates its input. This fact can be
used to simplify the computation of the Bellman residual.
As a matter of fact, the Bellman residual
reduces
.
To compute the latter quantity, one goes through the
vectors in one by one. For each vector, one solves
the linear program LP
.
The quantity is simply the maximum of the optimal values of
the objective functions of the linear programs.
Without uniformly improvability, we would have to repeat the process
one more time with the roles of
and
exchanged.