All three translations into an MDP (PLTLSIM, PLTLMIN, and
PLTLSTR) rely on the equivalence
, with which we can
decompose temporal modalities into a requirement about the last state
of a sequence , and a requirement about the
prefix of the sequence. More precisely, given state
and a formula , one can compute in2
a new formula
called the regression of through . Regression has the property
that, for , is true of a finite sequence ending
with iff
is true of the prefix
. That is,
represents what must have been
true previously for to be true now. is defined as follows:
For instance, take a state in which holds and does not,
and take
, meaning that
must have been false 1 step ago, but that it must have held at some
point in the past and that must have held since last did.
, that is, for to hold
now, then at the previous stage, had to be false and the
requirement still had to hold. When and are both false in
, then
, indicating that cannot be
satisfied, regardless of what came earlier in the sequence.
For notational convenience, where is a set of
formulae we write for
.
Now the translations exploit the PLTL representation of rewards as
follows. Each expanded state (e-state)
in the generated MDP can be seen as labelled
with a set
of subformulae of the reward
formulae in (and their negations). The
subformulae in must be (1) true of the paths leading to the
e-state, and (2) sufficient to determine the current truth of all
reward formulae in , as this is needed to compute the current
reward. Ideally the s should also be (3) small enough to enable
just that, i.e. they should not contain subformulae which draw history
distinctions that are irrelevant to determining the reward at one
point or another. Note however that in the worst-case, the number of
distinctions needed, even in the minimal equivalent MDP, may be
exponential in . This happens for instance with the formula
, which requires additional bits of information
memorising the truth of over the last steps.
Sylvie Thiebaux
2006-01-20