Solving linear programs is the most expensive operation in
point-based update. An obvious way to speed up is to
avoid linear programs. Point-based update solves linear
programs and backs up on the belief points found so as to
guarantee uniform improvability. If the linear programs
are to be skipped, there must be some other way to guarantee
uniform improvability. There is an easy solution to this problem.
Suppose is the set of vectors that we try to update and
it is uniformly improvable. Let
be the set obtained
from
by backing up only on the witness points, which can be done
without solving linear programs. The set
might
or might not be uniformly improvable. However, the
union
is guaranteed to be uniformly improvable.
Therefore we can reprogram point-based update to return
the union in hope to reduce complexity.
The resulting variation will be called non-LP point-based DP
update.
Another way to reduce complexity is to simplify the backup operator
(Section 3.1) using the idea behind modified policy iteration
(e.g., Puterman 1990). When backing up from a set of vectors
at a belief point, the operator considers all possible actions
and picks the one that is optimal according to the
-improving
policy. To speed up, one can simply use the action found for
the belief point by the previous standard update.
The resulting operator will be called the MPI backup operator, where
MPI stands for modified policy iteration.
If
is the output of the previous standard update, the two actions
often are the same. However, they are usually different
if
is the result of several
point-based updates following the standard update.
Table 5 shows, for each test problem, the number of standard updates and the amount of time that VI1 took when non-LP point-based update was used (together with the standard backup operator). Comparing the statistics with those for point-based update (Tables 1 and 2), we see that the number of standard updates is increased on all test problems and the amount of time is also increased except for the first three problems. Here are the plausible reasons. First, it is clear that non-LP point-based update does not improve a set of vectors as much as point-based update. Consequently, it is less effective in reducing the number of standard updates. Second, although it does not solve linear programs, non-LP point-based update produces extraneous vectors. This means that it might need to deal with a large number of vectors at later iterations and hence might not be as efficient as point-based update after all.
4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network | Aircraft |
4 | 5 | 8 | 4 | 4 | 7 | 10 | 8 |
2.38 | 2.38 | 3.4 | .75 | .88 | 44 | 599 | 32,281 |
Extraneous vectors can be pruned. As a matter of fact, we did prune vectors that are pointwise-dominated by others (hence extraneous) in our experiments. This is inexpensive. Pruning of other extraneous vectors, however, requires the solution of linear programs and is expensive. In Zhang et al. (1999), we have discussed how this can be done the most efficient way. Still the results were not as good as those in Table 5. In that paper, we have also explored the combination of non-LP point-based update with the MPI backup operator. Once again, the results were not as good as those in Table 5. The reason is that the MPI backup operator further compromises the quality of point-based update.
The quality of
non-LP point-based update can be improved by using
the Gauss-Seidel asynchronous update (Denardo 1982).
Suppose we are updating a set . The idea is to, after
a vector is created by backup, add a copy of the vector to the set
right away. The hope is to increase the components of
later vectors.
We have tested this idea when preparing Zhang et al.
(1999) and found that the costs almost always
exceed the benefits.
A reason is that asynchronous update introduces
many more extraneous vectors than synchronous update.
In conclusion, point-based is conceptually simple and clean. When compared to its more complex variations, it seems to be the most effective in accelerating value iteration.