Experiments have been conducted to empirically determine the effectiveness of point-based DP update in speeding up value iteration. Eight problems are used in the experiments. In the literature, the problems are commonly referred to as 4x3CO, Cheese, 4x4, Part Painting, Tiger, Shuttle, Network, and Aircraft ID. We obtained the problem files from Tony Cassandra. Information about their sizes is summarized in the following table.
Problem
Problem
4x3CO 11 4 11
Cheese 11 4 7
4x4 16 2 4
Painting 4 4 2
Tiger 2 2 3
Shuttle 8 2 3
Network 7 2 4
Aircraft ID 12 5 6
The effectiveness of point-based DP update is determined
by comparing the standard value iteration
algorithm VI and the modified
value iteration algorithm VI1.
The implementation of standard value iteration used
in our experiments is
borrowed from
Hansen. Modified value iteration is implemented on top of Hansen's
code.
The discount factor is set at 0.95 and
round-off precision is set at
.
All experiments are
conducted on an UltraSparc II machine.
Table 1 shows the amounts of time VI and VI1 took to compute 0.01-optimal policies for the test problems. We see that VI1 is consistently more efficient than VI, especially on the larger problems. It is about 1.3, 2.8, 5, 62, 141, 173, and 49 times faster than VI on the first seven problems respectively. For the Aircraft ID problem, VI1 was able to compute a 0.01-optimal policy in less than 8 hours, while VI was only able to produce a 33-optimal policy after 40 hours.
4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network | Aircraft | |
VI | 3.2 | 13.9 | 27.15 | 37.84 | 79.14 | 5,199 | 12,478 | - |
VI1 | 2.4 | 5.0 | 5.30 | .61 | .56 | 30 | 253 | 27,676 |
Various other statistics are given in Table 2 to highlight
computational properties of VI1 and to explain its superior performance.
The numbers of standard DP updates
carried out by VI and VI1 are shown at rows 1 and 3.
We see that
VI1 performed no more than 5 standard updates
on the test problems, while
VI performed more than 125.
This indicates that point-based update is very effective
in cutting down the number of standard updates required to
reach convergence.
As a consequence, VI1 spent much less time than VI
in standard updates (row 2 and 4).
Problem | 4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network | |
DPU # | 125 | 129 | 130 | 127 | 163 | 174 | 214 | |
1.5ex[0cm][0cm]VI | Time | 2.00 | 7.63 | 17.83 | 33.39 | 70.44 | 3,198 | 8,738 |
DPU # | 4 | 4 | 3 | 3 | 3 | 5 | 5 | |
Time | .05 | .09 | .15 | .21 | .09 | 13 | 82 | |
1.5ex[0cm][0cm]VI1 | PBDPU # | 377 | 219 | 173 | 244 | 515 | 455 | 670 |
Time | 2.32 | 4.86 | 5.09 | .37 | .45 | 10 | 139 | |
Quality Ratio | .33 | .58 | .74 | .51 | .31 | 0.31 | .32 | |
Complexity Ratio | .38 | .37 | .21 | .0057 | .002 | .0012 | .005 |
Row 5 shows the numbers of point-based updates carried out by
VI1. We see that those numbers are actually larger than the numbers of
standard updates performed by VI. This is expected.
To see why, recall that point-based update is an approximation of
standard update. Let be a set of vectors that is
uniformly improvable.
Use
to
denote the sets of vectors resulted from performing
point-based update on
.
For any belief state b,
we have
.
This means that point-based update improves
but not as
much as standard update. Consequently, the use of
point-based update increases the total number of
iterations, i.e the number of standard updates plus the number of
point-based updates.
Intuitively, the better point-based update is as an approximation of standard update, the less the difference between the total number iterations that VI1 and VI need take. So, the ratio between those two numbers in a problem can be used, to certain extent, as a measurement of the quality of point-based update in that problem. We shall refer to it as the quality ratio of point-based update. Row 7 shows the quality ratios in the seven test problems. We see that the quality of point-based update is fairly good and stable across all the problems.
Row 8 shows, for each test problem, the ratio between the average time of a standard update performed by VI and that of a point-based update performed by VI1. Those ratios measure, to certain extent, the complexity of point-based update relative to standard update and hence will be referred to as the complexity ratios of point-based update. We see that, as predicted by the analysis in Section 5.3, point-based update is consistently less expensive than standard update. The differences are more than 200 times in the last four problems.
In summary, the statistics suggest that the quality of point-based update relative to standard update is fairly good and stable and its complexity is much lower. Together with the fact that point-based update can drastically reduces the number of standard updates, those explain the superior performance of VI1.
To close this section, let us note
that while VI finds policies with quality very close
to the predetermined criterion, VI1 usually finds much better
ones (Table 3). This is because VI checks policy quality
after each (standard) update, while VI1 does not do this
after point-based updates.
Problem | 4x3CO | Cheese | 4x4 | Paint | Tiger | Shuttle | Network |
VI | .0095 | .0099 | .0099 | .01 | .0098 | .0097 | .0098 |
VI1 | .0008 | .0008 | .0009 | .0007 | .0007 | .00015 | .001 |