
Experimental Results

The results of the experiments are shown in Tables 1 and 2.
 
Table 1: Performance statistics in $ \theta _2 D^m S^1$ and the Logistics Transportation domain (average solution length is shown in parentheses next to %Solved for the logistics domain only).

                     $ \theta _2 D^m S^1$              Logistics
Phase            Learning  Static  Scratch   Learning     Static       Scratch
(1) Two Goal
  %Solved        100%      100%    100%      100% (6.0)   100% (6.0)   100% (6.0)
  nodes          90        240     300       1773         1773         2735
  time (sec)     1         4       2         30           34           56
(2) Three Goal
  %Solved        100%      100%    100%      100% (8.2)   100% (8.2)   100% (8.2)
  nodes          120       810     990       6924         13842        20677
  time (sec)     2         15      8         146          290          402
(3) Four Goal
  %Solved        100%      100%    100%      100% (10.3)  100% (10.3)  100% (10.3)
  nodes          150       2340    2533      290          38456        127237
  time (sec)     3         41      21        32           916          2967



Each table entry represents cumulative results over the sequence of 30 problems corresponding to one phase of the experiment. The first row of each phase in Table 1 shows the percentage of problems correctly solved within the time limit (550 seconds). The average solution length is shown in parentheses for the logistics domain (solution length is omitted for $ \theta _2 D^m S^1$, since all of the problems generated within a phase have the same solution length). The second and third rows of each phase give, respectively, the total number of search nodes visited over all 30 test problems and the total CPU time (including case retrieval time).
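To make the aggregation concrete, the following minimal sketch shows how one table entry could be computed from the 30 per-problem runs of a phase. The record fields (solved, solution_length, nodes, cpu_seconds) and the constant TIME_LIMIT are hypothetical names introduced for illustration; they are not taken from the DERSNLP+EBL implementation.

# Minimal sketch: aggregate one phase (30 problems) into the quantities of Table 1.
# Field names and TIME_LIMIT are illustrative assumptions, not the planner's own code.
from dataclasses import dataclass
from typing import List, Optional

TIME_LIMIT = 550  # seconds, as in the experiments

@dataclass
class RunResult:
    solved: bool                    # solved within the time limit?
    solution_length: Optional[int]  # None if unsolved
    nodes: int                      # search nodes visited
    cpu_seconds: float              # CPU time, including case retrieval

def phase_entry(runs: List[RunResult]):
    """Return (%Solved, average solution length, total nodes, total time)."""
    solved = [r for r in runs if r.solved and r.cpu_seconds <= TIME_LIMIT]
    pct_solved = 100.0 * len(solved) / len(runs)
    avg_length = (sum(r.solution_length for r in solved) / len(solved)
                  if solved else None)
    total_nodes = sum(r.nodes for r in runs)
    total_time = sum(r.cpu_seconds for r in runs)
    return pct_solved, avg_length, total_nodes, total_time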

These results are also summarized in Figure 10.

  
Figure 10: Replay performance in the $ \theta _2 D^m S^1$ and Logistics Transportation domains.

DERSNLP+EBL in learning mode solved as many of the multi-goal problems as in the other two modes, and did so in substantially less time. Case retrieval based on case failure produced performance improvements that increased with problem size. Comparable improvements were not found when retrieval was based on the static similarity metric alone. This is not surprising, since the cases retrieved had experienced at least one earlier failure. This meant that testing was done on cases that had some likelihood of failing when retrieval was based on the static metric alone.


 
Table 2: Measures of effectiveness of replay.

                 $ \theta _2 D^m S^1$      Logistics
Phase            Learning  Static    Learning  Static
Two Goal
  % Seq          100%      0%        53%       53%
  % Der          60%       0%        48%       48%
  % Rep          100%      0%        85%       85%
Three Goal
  % Seq          100%      0%        80%       47%
  % Der          70%       0%        63%       50%
  % Rep          100%      0%        89%       72%
Four Goal
  % Seq          100%      0%        100%      70%
  % Der          94%       0%        79%       62%
  % Rep          100%      0%        100%      81%

 


Table 2 records three different measures that reflect the effectiveness of replay. The first is the percentage of sequenced replay. Recall that replay of a trace is considered sequenced here if the skeletal plan is further refined to reach a solution to the new problem. The results point to the greater efficiency of replay in learning mode. In the $ \theta _2 D^m S^1$ domain, replay was entirely sequenced in this mode. In the transportation domain, retrieval based on failure did not always result in sequenced replay, but did so more often than in static mode.

The greater effectiveness of replay in learning mode is also indicated by the two other measures, contained in the subsequent two rows of Table 2. These are, respectively, the percentage of plan refinements on the final derivation path that were formed through guidance from replay (% Der), and the percentage of the total number of plans created through replay that remain on the final derivation path (% Rep). On these measures, the case-based planner in learning mode showed improvements as large as or larger than those in static mode, demonstrating the relative effectiveness of guiding retrieval through a learning component based on replay failures. These results indicate that DERSNLP+EBL's integration of CBP and EBL is a promising approach when extra interacting goals hinder the success of replay.
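As an illustration of how these three measures relate to one another, the sketch below computes % Seq, % Der, and % Rep from hypothetical per-problem replay traces. The trace fields (sequenced, replayed_refinements, final_path_refinements) are assumed names introduced for this example and do not reproduce DERSNLP+EBL's actual data structures.

# Minimal sketch of the replay-effectiveness measures of Table 2, computed from
# hypothetical per-problem traces; field names are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Set

@dataclass
class ReplayTrace:
    sequenced: bool                   # was the skeletal plan refined to a solution?
    replayed_refinements: Set[int]    # ids of plans created under replay guidance
    final_path_refinements: Set[int]  # ids of plans on the final derivation path

def replay_measures(traces: List[ReplayTrace]):
    """Return (% Seq, % Der, % Rep) aggregated over one phase."""
    pct_seq = 100.0 * sum(t.sequenced for t in traces) / len(traces)
    on_path_from_replay = sum(
        len(t.replayed_refinements & t.final_path_refinements) for t in traces)
    total_final = sum(len(t.final_path_refinements) for t in traces)
    total_replayed = sum(len(t.replayed_refinements) for t in traces)
    pct_der = 100.0 * on_path_from_replay / total_final     # replay-guided share of final path
    pct_rep = 100.0 * on_path_from_replay / total_replayed  # replayed plans that survived
    return pct_seq, pct_der, pct_rep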

In Section 4 we report on a more thorough evaluation of DERSNLP+EBL's learning component, conducted to investigate whether learning from case failure benefits a planner solving random problems in a complex domain. For this evaluation we implemented the full case-based planning system along with novel case storage and adaptation strategies. In the next section, we describe the storage strategy that was developed for this evaluation.

