Reward Normality

$FLTL is therefore quite expressive. Unfortunately, it is rather too expressive, in that it contains formulae which describe ``unnatural'' allocations of rewards. For instance, they may make rewards depend on future behaviours rather than on the past, or they may leave open a choice as to which of several behaviours is to be rewarded.⁵ An example of dependence on the future is $\raisebox{0.6mm}{$\scriptstyle \bigcirc$}p \rightarrow \mbox{\$}$, which stipulates a reward now if $p$ is going to hold next. We call such a formula reward-unstable. What reward-stability of a formula $f$ amounts to is that whether a particular prefix needs to be rewarded in order to make $f$ true does not depend on the future of the sequence. An example of an open choice of which behaviour to reward is $\mbox{$\Box$}(p \rightarrow \mbox{\$}) \vee \mbox{$\Box$}(\neg p \rightarrow \mbox{\$})$, which says we should either reward all achievements of the goal $p$ or reward achievements of $\neg p$, but does not determine which. We call such a formula reward-indeterminate. What reward-determinacy of $f$ amounts to is that the set of behaviours modelling $f$, i.e. $\{B \mid \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f\}$, has a unique minimum. If it does not, $B_f$ is insufficient (too small) to make $f$ true.

In investigating $FLTL [41], we examine the notions of reward-stability and reward-determinacy in depth, and motivate the claim that formulae that are both reward-stable and reward-determinate (we call them reward-normal) are precisely those that capture the notion of ``no funny business''. This is the intuition that we ask the reader to note, as it will be needed in the rest of the paper. Just for reference then, we define:

Definition 3 $f$ is reward-normal iff for every $\Gamma\in S^{\omega}$ and every $B \subseteq S^{*}$, $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ iff for every $i$, if $\Gamma(i) \in B_f$ then $\Gamma(i)\in B$.
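The ``unique minimum'' reading of reward-determinacy can be checked by brute force on a toy model. The sketch below is only an illustration under strong assumptions that are not part of the paper: sequences are truncated to length 2 over two states (`"p"` where $p$ holds, `"n"` where it does not), a behaviour is any set of prefixes of those sequences, and the two formulae are hand-coded as Python predicates rather than evaluated by a real $FLTL interpreter. All names (`always_p`, `either`, `least_model`) are hypothetical. The intersection of all modelling behaviours plays the role of $B_f$.

```python
from itertools import chain, combinations, product

STATES = ("p", "n")  # "p": a state where p holds; "n": a state where it does not

# Toy assumption: sequences have length 2 (a finite stand-in for S^omega).
SEQUENCES = list(product(STATES, repeat=2))
# All non-empty prefixes of those sequences (a finite stand-in for S^*).
PREFIXES = [pre for k in (1, 2) for pre in product(STATES, repeat=k)]


def all_behaviours():
    """Enumerate every subset of PREFIXES as a frozenset (2**6 = 64 of them)."""
    sizes = range(len(PREFIXES) + 1)
    subsets = chain.from_iterable(combinations(PREFIXES, r) for r in sizes)
    return [frozenset(s) for s in subsets]


def rewards_whenever(state, seq, behaviour):
    """True if every prefix of seq that ends in `state` belongs to behaviour."""
    return all(seq[:k] in behaviour
               for k in range(1, len(seq) + 1) if seq[k - 1] == state)


def always_p(behaviour):
    """Hand-coded stand-in for box(p -> $), required on every toy sequence."""
    return all(rewards_whenever("p", seq, behaviour) for seq in SEQUENCES)


def either(behaviour):
    """Hand-coded stand-in for box(p -> $) or box(not p -> $), per sequence."""
    return all(rewards_whenever("p", seq, behaviour)
               or rewards_whenever("n", seq, behaviour)
               for seq in SEQUENCES)


def least_model(models_formula):
    """Return the least modelling behaviour, or None if no least one exists.

    Assumes at least one behaviour satisfies the predicate (the full set of
    prefixes does for both predicates above).
    """
    models = [b for b in all_behaviours() if models_formula(b)]
    candidate = frozenset.intersection(*models)  # plays the role of B_f
    return candidate if candidate in models else None
```

In this toy setting, `least_model(always_p)` returns exactly the three prefixes ending in `"p"`, whereas `least_model(either)` returns `None`: the intersection of the modelling behaviours is empty, which is too small to satisfy the disjunction.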

The property of reward-normality is decidable [41]. In Appendix A we give some simple syntactic constructions guaranteed to result in reward-normal formulae. While reward-abnormal formulae may be interesting, for present purposes we restrict attention to reward-normal ones. Indeed, we stipulate as part of our method that only reward-normal formulae should be used to represent behaviours. Naturally, all formulae in Section 3.3 are reward-normal.

Sylvie Thiebaux 2006-01-20