Common framework

The common framework underlying NMRDPP exploits the fact that NMRDP solution methods can, in general, be divided into three distinct phases: preprocessing, expansion, and solving, the first two of which are optional.

For PLTLSIM, preprocessing simply computes the set $\overline{\mbox{Sub}(F)}$ of subformulae of the reward formulae. For PLTLMIN, it additionally computes the label $l(s)$ of each state $s$. For PLTLSTR, preprocessing computes the set $T$ of temporal variables, as well as the ADDs for their dynamics and for the rewards. FLTL requires no preprocessing.

Expansion is the optional generation of the entire equivalent MDP prior to solving. Whether off-line expansion is sensible depends on the MDP solution method used: if state-based value or policy iteration is used, the MDP needs to be expanded anyway, whereas with an anytime search algorithm or a structured method, off-line expansion is counterproductive. In our experiments, we often used expansion solely to measure the size of the generated MDP.

Solving the MDP can be done with a number of methods. Currently, NMRDPP provides implementations of the classical dynamic programming methods, namely state-based value and policy iteration [32]; of a heuristic search method, namely state-based LAO* [28], using either value or policy iteration as a subroutine; and of one structured method, namely SPUDD [29]. Prime candidates for future development are (L)RTDP [9], symbolic LAO* [21], and symbolic RTDP [22].
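To make the division into phases concrete, the following sketch shows one way a solution method could be organised behind a common three-phase interface. It is only an illustration under assumed names: the class and type names (SolutionMethod, MDP, Policy) are placeholders and do not reflect NMRDPP's actual code.

// Minimal sketch of the preprocessing / expansion / solving phases.
// All names here are illustrative placeholders, not NMRDPP's interface.
#include <optional>

struct MDP {};     // placeholder for the expanded equivalent MDP
struct Policy {};  // placeholder for the computed policy

class SolutionMethod {
public:
    virtual ~SolutionMethod() = default;

    // Phase 1 (optional). PLTLSIM computes the subformulae of the reward
    // formulae; PLTLMIN additionally computes the state labels; PLTLSTR
    // builds the temporal variables and the ADDs for their dynamics and
    // rewards; FLTL does nothing here.
    virtual void preprocess() {}

    // Phase 2 (optional). Generate the entire equivalent MDP before
    // solving. Worthwhile for state-based value/policy iteration,
    // counterproductive for anytime search or structured solvers.
    virtual std::optional<MDP> expand() { return std::nullopt; }

    // Phase 3. Solve with value iteration, policy iteration, LAO*,
    // SPUDD, or another MDP algorithm.
    virtual Policy solve() = 0;
};

Under this reading, a concrete method overrides preprocess and expand only when its corresponding phase is non-trivial, which mirrors how FLTL can skip preprocessing and how structured or search-based solvers can skip expansion.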