This section discusses how we can use planning graph heuristics to measure belief state distances. We cover several types of planning graphs and the extent to which they can be used to compute various heuristics. We begin with a brief background on planning graphs.
Planning Graphs: Planning graphs serve as the basis for our
belief state distance estimation. Planning graphs were initially
introduced in GraphPlan [4] for representing an
optimistic, compressed version of the state space progression tree.
The compression lies in unioning the literals from every state at
subsequent steps from the initial state. The optimism relates to
underestimating the number of steps it takes to support sets of
literals (by tracking only a subset of the infeasible tuples of
literals). GraphPlan searches the compressed progression (or
planning graph) once it achieves the goal literals in a level with
no two goal literals marked infeasible. The search tries to find
actions to support the top level goal literals, then find actions to
support the chosen actions and so on until reaching the first graph
level. The basic idea behind using planning graphs for search
heuristics is that we can find the first level of a planning graph
where a literal in a state appears; the index of this level is a
lower bound on the number of actions that are needed to achieve a
state with the literal. There are also techniques for estimating the
number of actions required to achieve sets of literals. The
planning graphs serve as a way to estimate the reachability of state
literals and discriminate between the ``goodness'' of different
search states. This work generalizes such literal estimations to
belief space search by considering both GraphPlan and CGP style
planning graphs plus a new generalization of planning graphs, called
the
.
Planners such as CGP [30] and SGP [31] adapt the GraphPlan idea of compressing the search space with a planning graph by using multiple planning graphs, one for each possible world in the initial belief state. CGP and SGP search on these planning graphs, similar to GraphPlan, to find conformant and conditional plans. The work in this paper seeks to apply the idea of extracting search heuristics from planning graphs, previously used in state space search [23,18,5] to belief space search.
Planning Graphs for Belief Space: This section proceeds by
describing four classes of heuristics to estimate belief state
distance
and
.
heuristics are techniques
existing in the literature that are not based on planning graphs,
heuristics are techniques based on a single classical planning
graph,
heuristics are techniques based on multiple planning
graphs (similar to those used in CGP) and
heuristics use a new
labelled planning graph. The
combines the advantages of
and
to reduce the representation size and maintain
informedness. Note that we do not include observations in any of
the planning graph structures as SGP [31] would, however we
do include this feature for future work. The conditional planning
formulation directly uses the planning graph heuristics by ignoring
observations, and our results show that this still gives good
performance.
![]() |
In Figure 4 we present a taxonomy of distance measures
for belief space. The taxonomy also includes related planners,
whose distance measures will be characterized in this section. All
of the related planners are listed in the
group, despite the
fact that some actually use planning graphs, because they do not
clearly fall into one of our planning graph categories. The figure
shows how different substrates (horizontal axis) can be used to
compute belief state distance by aggregating state to state
distances under various assumptions (vertical axis). Some of the
combinations are not considered because they do not make sense or
are impossible. The reasons for these omissions will be discussed
in subsequent sections. While there are a wealth of different
heuristics one can compute using planning graphs, we concentrate on
relaxed plans because they have proven to be the most effective in
classical planning and in our previous studies [9].
We provide additional descriptions of other heuristics like max,
sum, and level in Appendix A.
Example: To illustrate the computation of each heuristic, we
use an example derived from BTC called Courteous BTC (CBTC) where a
courteous package dunker has to disarm the bomb and leave the toilet
unclogged, but some discourteous person has left the toilet clogged.
The initial belief state of CBTC in clausal representation is:
arm
clog
(inP1
inP2)
(
inP1
inP2),
and the goal is:
clog
arm.
The optimal action sequences to reach
from
are:
Flush, DunkP1, Flush, DunkP2, Flush,
and
Flush, DunkP2, Flush, DunkP1, Flush.
Thus the optimal heuristic estimate for the distance
between
and
, in regression, is
= 5
because in either plan there are five actions.
We use planning graphs for both progression and regression search.
In regression search the heuristic estimates the cost of the current
belief state w.r.t. the initial belief state and in progression
search the heuristic estimates the cost of the goal belief state
w.r.t. the current belief state. Thus, in regression search the
planning graph(s) are built (projected) once from the possible
worlds of the initial belief state, but in progression search they
need to be built at each search node. We introduce a notation
to denote the belief state for which we find a heuristic
measure, and
to denote the belief state that is used to
construct the initial layer of the planning graph(s). In the
following subsections we describe computing heuristics for
regression, but they are generalized for progression by changing
and
appropriately.
In the previous section we discussed two important issues involved
in heuristic computation: sampling states to include in the
computation and using mutexes to capture negative interactions in
the heuristics. We will not directly address these issues in this
section, deferring them to discussion in the respective empirical
evaluation sections, 6.4 and 6.2. The heuristics below are computed
once we have decided on a set of states to use, whether by sampling
or not. Also, as previously mentioned, we only consider sampling
states from the belief state
because we can implicitly find
closest states from
without sampling. We only explore
computing mutexes on the planning graphs in regression search. We
use mutexes to determine the first level of the planning graph where
the goal belief state is reachable (via the level heuristic
described in Appendix A) and then extract a relaxed plan starting at
that level. If the level heuristic is
because there is no
level where a belief state is reachable, then we can prune the
regressed belief state.
We proceed by describing the various substrates used for computing belief space distance estimates. Within each we describe the prospects for various types of world aggregation. In addition to our heuristics, we mention related work in the relevant areas.