We use top-down AO* search [24] in the planner to
generate conformant and conditional plans. In the search graph, the
nodes are belief states and the hyper-edges are actions. We need AO*
because applying an action with observations to a belief state
divides the belief state into observational classes. We use
hyper-edges for actions because actions with observations have
several possible successor belief states, all of which must be
included in a solution.
The AO* search consists of two repeated steps: expand the current partial solution, and then revise the current partial solution. Search ends when every leaf node of the current solution is a belief state that satisfies the goal and no better solution exists (given our heuristic function). Expansion involves following the current solution to an unexpanded leaf node and generating its children. Revision is a dynamic programming update at each node in the current solution that selects a best hyper-edge (action). The update assigns the action with minimum cost to start the best solution rooted at the given node. The cost of a node is the cost of its best action plus the average cost of its children (the nodes connected through the best action). When expanding a leaf node, the children of all applied actions are given a heuristic value to indicate their estimated cost.
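A minimal sketch of this expand-and-revise loop in Python follows; the class and function names (Node, expand, revise, unexpanded_leaf, ao_star), the unit action costs, and the omission of cycle checking are our own simplifying assumptions for exposition, not the planner's implementation.

```python
# Illustrative sketch of the AO* expand/revise loop (names and structure are
# assumptions, not the planner's code). Nodes are belief states; hyper-edges
# are actions with one child per possible successor belief state. Cycle
# checking (discarding successors already on the path) is omitted for brevity.

class Node:
    def __init__(self, belief, parent=None):
        self.belief = belief          # e.g., a frozenset of states
        self.parent = parent
        self.hyper_edges = []         # list of (action, [child nodes])
        self.best = None              # best hyper-edge chosen by revision
        self.cost = 0.0
        self.expanded = False

def unexpanded_leaf(node, goal_test):
    """Follow best hyper-edges to an unexpanded leaf that does not satisfy the goal."""
    if not node.expanded:
        return None if goal_test(node.belief) else node
    if node.best is None:
        return None
    for child in node.best[1]:
        leaf = unexpanded_leaf(child, goal_test)
        if leaf is not None:
            return leaf
    return None

def expand(node, applicable, progress, goal_test, heuristic, w):
    """Generate the children of a leaf under every applicable action."""
    node.expanded = True
    for action in applicable(node.belief):
        children = [Node(b, node) for b in progress(node.belief, action)]
        for c in children:
            # new leaves get a (weighted) heuristic estimate of future cost
            c.cost = 0.0 if goal_test(c.belief) else w * heuristic(c.belief)
        node.hyper_edges.append((action, children))

def revise(node):
    """Dynamic-programming update: at each ancestor, pick the action whose cost
    (1 for the action plus the average cost of its children) is minimal."""
    def edge_cost(edge):
        _, children = edge
        return 1.0 + sum(c.cost for c in children) / len(children)
    while node is not None:
        if node.hyper_edges:
            node.best = min(node.hyper_edges, key=edge_cost)
            node.cost = edge_cost(node.best)
        else:
            node.cost = float("inf")  # expanded dead end: no applicable actions
        node = node.parent

def ao_star(init_belief, applicable, progress, goal_test, heuristic, w=1.0):
    root = Node(init_belief)
    root.cost = w * heuristic(init_belief)
    while True:
        leaf = unexpanded_leaf(root, goal_test)
        if leaf is None:              # every leaf of the best solution satisfies the goal
            return root
        expand(leaf, applicable, progress, goal_test, heuristic, w)
        revise(leaf)
```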
The main differences between our formulation of AO* and that of [24] are that we do not allow cycles in the search graph, we update the costs of nodes with an average rather than a summation, and we use a weighted estimate of future cost. The first difference ensures that plans are strong (there is a finite number of steps to the goal), the second guides search toward plans with lower average path cost, and the third biases our search to trust the heuristic function. We define our plan quality metric (maximum plan path length) differently than the metric our search minimizes, for two reasons. First, it is easier to compare with competing planners because they measure the same plan quality metric. Second, search tends to be more efficient using the average instead of the maximum cost of an action's children. By using the average instead of the maximum, the measured cost of a plan is lower, which means we are likely to search a shallower graph to prove that a candidate solution is not the best solution.
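In symbols (our notation, not that of [24]), the revision step assigns each node n the value f(n) below, where A(n) is the set of hyper-edges at n, succ(n, a) the successor belief states reached through action a, c(a) the action cost, h a heuristic estimate, and w ≥ 1 the weight that biases search toward the heuristic:

```latex
f(n) =
\begin{cases}
  0, & \text{if } n \text{ satisfies the goal,}\\[2pt]
  w \cdot h(n), & \text{if } n \text{ is an unexpanded leaf,}\\[2pt]
  \displaystyle\min_{a \in A(n)} \Big( c(a) + \tfrac{1}{|\mathrm{succ}(n,a)|}
      \sum_{n' \in \mathrm{succ}(n,a)} f(n') \Big), & \text{otherwise.}
\end{cases}
```

Here c(a) = 1 for the unit-cost actions used in the examples below; replacing the average with a sum recovers the additive update of [24].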
Conformant planning, which uses only actions without observations, is a special case of AO* search that resembles A* search: the hyper-edges that represent actions are singletons, each leading to a single successor belief state. Consider the BTC problem (BTCS without the DetectMetal action) with the future cost (heuristic value) set to zero for every search node. We show the search graph in Figure 2 for this conformant example as well as for a conditional example, described shortly. We can expand the initial belief state b_I by progressing it over all applicable actions. We get:
b_1 = Progress(b_I, DunkP1) = {(inP1 ∧ ¬inP2 ∧ clog ∧ ¬arm), (¬inP1 ∧ inP2 ∧ clog ∧ arm)}

and

b_2 = Progress(b_I, DunkP2) = {(inP1 ∧ ¬inP2 ∧ clog ∧ arm), (¬inP1 ∧ inP2 ∧ clog ∧ ¬arm)}.
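To make this progression concrete, here is one way the BTC belief states and the Dunk and Flush actions could be encoded; the fluent names, the frozenset-of-states representation, and the function names are our own illustration of the domain rather than the planner's encoding.

```python
# Illustrative encoding of BTC (assumed fluents: inP1, inP2, clog, arm).
# A state is a sorted tuple of (fluent, value) pairs; a belief state is a
# frozenset of such states. Not the planner's actual representation.

def state(**fluents):
    return tuple(sorted(fluents.items()))

# Initial belief state: the bomb is armed, the toilet is unclogged, and the
# bomb is in exactly one of the two packages.
b_I = frozenset([
    state(inP1=True,  inP2=False, clog=False, arm=True),
    state(inP1=False, inP2=True,  clog=False, arm=True),
])

def progress_dunk(belief, pkg):
    """DunkPi: requires an unclogged toilet, clogs it, and disarms the bomb
    if the bomb is in package pkg ('inP1' or 'inP2')."""
    out = set()
    for st in belief:
        s = dict(st)
        assert not s["clog"], "Dunk is only applicable when the toilet is unclogged"
        s["clog"] = True
        if s[pkg]:
            s["arm"] = False
        out.add(state(**s))
    return frozenset(out)

def progress_flush(belief):
    """Flush: unclogs the toilet."""
    out = set()
    for st in belief:
        s = dict(st)
        s["clog"] = False
        out.add(state(**s))
    return frozenset(out)

b_1 = progress_dunk(b_I, "inP1")   # the two states listed above for DunkP1
b_2 = progress_dunk(b_I, "inP2")   # the two states listed above for DunkP2
assert progress_flush(b_I) == b_I  # Flush at b_I only recreates b_I (the cycle noted below)
```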
Since ¬clog already holds in every state of the initial belief state, applying Flush to b_I leads back to b_I, creating a cycle. Hence, a hyper-edge for Flush is not added to the search graph for b_I. We assign a cost of zero to b_1 and b_2, update the internal nodes of our best solution, and add DunkP1 to the best solution rooted at b_I (whose cost is now one).
[Figure 2: AO* search graphs for the conformant (BTC) and conditional (BTCS) examples; the conformant plan is shown in bold with dashed lines and the conditional plan in bold with solid lines.]
We expand the leaf nodes of our best solution, a single node b_1, with all applicable actions. The only applicable action is Flush, so we get:
b_3 = Progress(b_1, Flush) = {(inP1 ∧ ¬inP2 ∧ ¬clog ∧ ¬arm), (¬inP1 ∧ inP2 ∧ ¬clog ∧ arm)}.
We assign a cost of zero to b_3 and update our best solution. We choose Flush as the best action for b_1 (whose cost is now one), and choose DunkP2 as the best action for b_I (whose cost is now one). DunkP2 is chosen for b_I because its successor b_2 has a cost of zero, as opposed to b_1, which now has a cost of one.
Expanding the leaf node b_2 with the only applicable action, Flush, we get:
b_4 = Progress(b_2, Flush) = {(¬inP1 ∧ inP2 ∧ ¬clog ∧ ¬arm), (inP1 ∧ ¬inP2 ∧ ¬clog ∧ arm)}.
We update b_4 (to have cost zero) and b_2 (to have a cost of one), and choose Flush as the best action for b_2. The root node b_I has two children, each with cost one, so we arbitrarily choose DunkP1 as its best action.
We expand b_3 with the relevant actions, reaching b_G with the DunkP2 action. DunkP1 creates a cycle back to b_1, so it is not added to the search graph. We now have a solution in which all leaf nodes are terminal. While it is only required that a terminal belief state contain a subset of the states in b_G, in this case the terminal belief state contains exactly the states in b_G. The cost of this solution is three because, through revision, b_3 has a cost of one, which sets b_1 to a cost of two. This in turn means that b_I has a cost of three if its best action is DunkP1. Instead, revision sets the best action for b_I to DunkP2, because its cost is currently two.
We then expand b_4 with DunkP1 to find that its successor is b_G. DunkP2 creates a cycle back to b_2, so it is not added to the search graph. We now have our second valid solution because it contains no unexpanded leaf nodes. Revision sets the cost of b_4 to one, b_2 to two, and b_I to three. Since all solutions starting at b_I have equal cost (meaning there are no cheaper solutions), we can terminate with the plan DunkP2, Flush, DunkP1, shown in bold with dashed lines in Figure 2.
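Assuming the illustrative BTC encoding sketched earlier (b_I, progress_dunk, progress_flush), this conformant plan can be checked by progressing b_I through its three steps and confirming that ¬arm holds in every resulting state:

```python
# Check the conformant plan DunkP2, Flush, DunkP1 with the functions sketched earlier.
b = progress_dunk(b_I, "inP2")             # DunkP2
b = progress_flush(b)                      # Flush
b = progress_dunk(b, "inP1")               # DunkP1
assert all(not dict(s)["arm"] for s in b)  # the goal ¬arm holds in every state
```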
As an example of search for a conditional plan, consider the BTCS example, whose search graph is also shown in Figure 2. Expanding the initial belief state, we get:
b_1 = Progress(b_I, DunkP1), b_2 = Progress(b_I, DunkP2), and {b_5, b_6} = Progress(b_I, DetectMetal) = {{(inP1 ∧ ¬inP2 ∧ ¬clog ∧ arm)}, {(¬inP1 ∧ inP2 ∧ ¬clog ∧ arm)}}.
Each of the leaf nodes is assigned a cost of zero, and DunkP1 is chosen arbitrarily for the best solution rooted at b_I because the cost of each solution is identical. The cost of including each hyper-edge is the average cost of its children plus its own cost, so the cost of using DetectMetal is (0+0)/2 + 1 = 1. Thus, our root b_I has a cost of one.
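Under that same illustrative encoding, the DetectMetal hyper-edge can be sketched as an action that leaves each state unchanged but partitions the belief state by the observed value of inP1; the function below and the assignment of the names b_5 and b_6 are our own illustration:

```python
def progress_detect_metal(belief):
    """DetectMetal: no physical effect, but observing whether the metal (bomb)
    is in P1 splits the belief state into observational classes."""
    classes = {}
    for st in belief:
        classes.setdefault(dict(st)["inP1"], set()).add(st)
    # return the class where the metal is observed in P1 first, then the other
    return [frozenset(classes[k]) for k in sorted(classes, reverse=True)]

b_5, b_6 = progress_detect_metal(b_I)   # one singleton belief state per observation
```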
As in the conformant problem, we expand b_1, giving its child a cost of zero and b_1 a cost of one. This changes our best solution at b_I to use DunkP2, so we expand b_2, giving its child a cost of zero and b_2 a cost of one. Then we choose DetectMetal to start the best solution at b_I because it gives b_I a cost of one, whereas using either Dunk action would give b_I a cost of two.
We expand the first child of DetectMetal, b_5, with DunkP1 to get:

{(inP1 ∧ ¬inP2 ∧ clog ∧ ¬arm)},

which is a goal state, and DunkP2 to get:

b_7 = Progress(b_5, DunkP2) = {(inP1 ∧ ¬inP2 ∧ clog ∧ arm)}.
We then expand the second child, b_6, with DunkP2 to get:

{(¬inP1 ∧ inP2 ∧ clog ∧ ¬arm)},

which is also a goal state, and DunkP1 to get:

b_8 = Progress(b_6, DunkP1) = {(¬inP1 ∧ inP2 ∧ clog ∧ arm)}.
While none of these new belief states is equivalent to b_G, two of them entail b_G, so we can treat them as terminal by connecting the hyper-edges for these actions to b_G.
We choose DunkP1 and DunkP2 as the best actions for b_5 and b_6, respectively, and set the cost of each node to one. This in turn sets the cost of using DetectMetal for b_I to (1+1)/2 + 1 = 2. We terminate here because this plan has cost equal to the other possible plans starting at b_I and all of its leaf nodes satisfy the goal. The plan is shown in bold with solid lines in Figure 2.
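Written as a simple branching structure (the representation and observation labels below are ours, for illustration), the conditional plan senses which package holds the metal and then dunks that package:

```python
# The BTCS conditional plan found above, as an illustrative branching structure.
conditional_plan = ("DetectMetal", {
    "metal detected in P1": ["DunkP1"],
    "metal detected in P2": ["DunkP2"],
})
```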