The simplest procedure to estimate the value of the actions is a brute-force approach in which each action is evaluated independently. In simple cases this approach would be sufficient but, when the number of valid combinations of elementary actions (i.e., of actions) is large, evaluating each action separately would take a long time, increasing the time of each robot decision and reducing the reactivity of the controller. To avoid this, Appendix B presents a more efficient procedure to obtain the value of any action.
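To make the cost of the brute-force approach concrete, the following sketch (with hypothetical names; none of these identifiers come from the text) evaluates every combination of elementary actions independently and keeps the best one. The number of evaluations grows as the product of the options per actuator, which is what motivates the more efficient procedure of Appendix B.

```python
from itertools import product

def brute_force_best_action(elementary_actions, value_estimate):
    """Evaluate every valid combination of elementary actions independently.

    elementary_actions: list of lists, one list of options per actuator.
    value_estimate: callable mapping a full action (a tuple) to its
        estimated value; stands in for the rule-based estimation.
    Cost: the product of the number of options of each actuator.
    """
    best_action, best_value = None, float("-inf")
    for action in product(*elementary_actions):  # all valid combinations
        v = value_estimate(action)
        if v > best_value:
            best_action, best_value = action, v
    return best_action, best_value

# Example: 3 actuators with 4 elementary actions each -> 4**3 = 64 evaluations.
actions = [[0, 1, 2, 3]] * 3
best, value = brute_force_best_action(actions, value_estimate=sum)
```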
Figure 13 summarizes the action-evaluation procedure using partial rules. The reward for each action is estimated using the most relevant rule for that action (i.e., the winner rule). This winner rule is computed as
*[equation image not recovered: definition of the winner rule]*
The reward estimate provided by the winner rule is drawn uniformly at random from the interval
*[equation images not recovered: reward-estimation interval]*
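The procedure described above can be sketched as follows. This is a hedged illustration only: the exact relevance and interval formulas are in the equations not recovered here, so the attributes `q` (value estimate), `e` (error of the estimate), and `relevance`, as well as the assumption that the interval is centred on `q` with half-width `e`, are placeholders for the original definitions.

```python
import random

class PartialRule:
    """Hypothetical container for a partial rule's statistics."""
    def __init__(self, covers, q, e, relevance):
        self.covers = covers        # set of actions the rule applies to
        self.q = q                  # reward (value) estimate of the rule
        self.e = e                  # error of the estimate (assumed half-width)
        self.relevance = relevance  # relevance used to select the winner

def guess_action_value(action, rules, rng=random):
    """Estimate an action's value via its most relevant (winner) rule."""
    applicable = [r for r in rules if action in r.covers]
    # The winner rule is the most relevant rule covering the action.
    winner = max(applicable, key=lambda r: r.relevance)
    # Sample uniformly from an interval around the winner's estimate whose
    # width reflects the winner's error (assumed interval: [q - e, q + e]).
    return rng.uniform(winner.q - winner.e, winner.q + winner.e)
```

Sampling from an interval rather than returning `q` directly keeps exploration alive: rules with large error produce more variable guesses, so poorly known actions still get tried.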
Josep M Porta 2005-02-01