In this appendix, we describe in detail the partial-rule learning approach outlined in the main body of the paper.
The partial-rule learning algorithm (whose top-level form is shown in Figure 11) stores a set of statistics for each partial rule. To estimate the confidence on these statistics, we use a confidence index that, roughly speaking, keeps track of the number of times the partial rule is used. The confidence is derived from this index by means of a confidence_function.
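The paper's exact confidence_function is not reproduced above. As a purely illustrative sketch (our assumption, not the authors' definition), one simple choice is a saturating ratio that grows with the number of uses of the rule and levels off at 1:

```python
def confidence_function(index, max_index=10):
    """Hypothetical confidence_function: maps a rule's confidence
    index (roughly, the number of times the rule has been used)
    to a confidence value in [0, 1], saturating at max_index.
    Both the functional form and max_index are assumptions."""
    return min(index, max_index) / max_index
```

Any monotone mapping from the index to [0, 1] would exhibit the qualitative behavior the text relies on: rarely-used rules get low confidence, frequently-used rules approach full confidence.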
Additionally, the confidence index is used to define the learning rate (i.e., the weight of newly observed rewards in the statistics update). For this purpose, we implement a MAM function [Venturini, 1994] for each rule.
With a MAM-based updating rule, the lower the confidence, the higher the effect of the last observed rewards on the statistics, and thus the faster the adaptation of the statistics. This adaptive learning-rate strategy is related to those presented by [Sutton, 1991] and [Kaelbling, 1993], and contrasts with traditional reinforcement-learning algorithms, where a constant learning rate is used.
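The MAM function itself is not shown above. The following sketch illustrates the behavior described, under the assumption (ours, not the paper's) that the rate is the maximum of a constant floor `beta` and a running-average rate `1/(i+1)`:

```python
def mam_learning_rate(confidence_index, beta=0.1):
    """Sketch of a MAM-style adaptive learning rate: behaves like a
    running-average rate 1/(i+1) while the rule has been used few
    times, but is bounded below by the constant beta so the rule
    never stops adapting. Parameter names are assumptions."""
    return max(beta, 1.0 / (confidence_index + 1))

def update_statistic(old_value, reward, confidence_index, beta=0.1):
    """Update one rule statistic with the adaptive rate: a low
    confidence index yields a large rate, so the last observed
    rewards dominate the statistic."""
    m = mam_learning_rate(confidence_index, beta)
    return (1.0 - m) * old_value + m * reward
```

Note how a brand-new rule (index 0) gets rate 1.0 and simply adopts the first observed reward, while a well-used rule settles at the floor rate `beta`, matching the qualitative description in the text.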
After the initialization phase, the algorithm enters a continuous loop in which each task episode consists of estimating the possible effects of all actions, executing the most promising one, and updating the system so that its performance improves in the future. The system update includes the statistics update and the partial-rule management.
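The loop just described can be sketched as follows; every function passed in is a hypothetical placeholder for the corresponding procedure of the paper, not its actual implementation:

```python
def run_episode(rules, actions, execute, guess_effect,
                update_stats, manage_rules):
    """One iteration of the top-level loop (all arguments are
    hypothetical stand-ins for the paper's procedures): estimate
    the effect of each action from the partial rules, execute the
    most promising action, then perform the system update, i.e.,
    the statistics update followed by partial-rule management."""
    # Estimate the expected effect of every action from the
    # currently relevant partial rules.
    estimates = {a: guess_effect(rules, a) for a in actions}
    # Execute the most promising action and observe the reward.
    best = max(actions, key=lambda a: estimates[a])
    reward = execute(best)
    # System update: statistics first, then rule management.
    update_stats(rules, best, reward)
    manage_rules(rules)
    return best, reward
```

Calling `run_episode` repeatedly yields the continuous per-episode loop of the algorithm.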