Reward Normality

$FLTL is therefore quite expressive. Unfortunately, it is rather too expressive, in that it contains formulae which describe ``unnatural'' allocations of rewards. For instance, they may make rewards depend on future behaviours rather than on the past, or they may leave open a choice as to which of several behaviours is to be rewarded.5An example of dependence on the future is $\raisebox{0.6mm}{$\scriptstyle \bigcirc$}p \rightarrow \mbox{\$}$, which stipulates a reward now if $p$ is going to hold next. We call such formula reward-unstable. What a reward-stable $f$ amounts to is that whether a particular prefix needs to be rewarded in order to make $f$ true does not depend on the future of the sequence. An example of an open choice of which behavior to reward is $\mbox{$\Box$}(p \rightarrow \mbox{\$}) \vee \mbox{$\Box$}(\neg p \rightarrow \mbox{\$})$ which says we should either reward all achievements of the goal $p$ or reward achievements of $\neg p$ but does not determine which. We call such formula reward-indeterminate. What a reward-determinate $f$ amounts to is that the set of behaviours modelling $f$, i.e. $\{B \mid \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f\}$, has a unique minimum. If it does not, $B_f$ is insufficient (too small) to make $f$ true. In investigating $FLTL [41], we examine the notions of reward-stability and reward-determinacy in depth, and motivate the claim that formulae that are both reward-stable and reward-determinate - we call them reward-normal - are precisely those that capture the notion of ``no funny business''. This is the intuition that we ask the reader to note, as it will be needed in the rest of the paper. Just for reference then, we define:

Definition 3   $f$ is reward-normal iff for every $\Gamma\in S^{\omega}$ and every $B \subseteq S^{*}$, $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ iff for every $i$, if $\Gamma(i) \in B_f$ then $\Gamma(i)\in B$.

The property of reward-normality is decidable [41]. In Appendix A we give some simple syntactic constructions guaranteed to result in reward-normal formulae. While reward-abnormal formulae may be interesting, for present purposes we restrict attention to reward-normal ones. Indeed, we stipulate as part of our method that only reward-normal formulae should be used to represent behaviours. Naturally, all formulae in Section 3.3 are normal.
Sylvie Thiebaux 2006-01-20