In the statistics-update procedure (Figure 14), the two rule statistics are adjusted
for all rules that were active in the previous time step and proposed a partial command in
accordance with the last executed action.
Both statistics are updated using a learning rate computed using the MAM function.
This rate is initially 1, and consequently the initial values of the statistics
have no influence on their future values. These initial values become
relevant when using a constant learning rate, as many existing reinforcement-learning
algorithms do.
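The effect of an initial learning rate of 1 can be illustrated with a minimal sketch. The exact form of the MAM function is not given in this passage, so the code below assumes a rate of 1/(1 + i) for a confidence index i, bounded below by a hypothetical `min_rate`; the function names, the sample rewards, and the constant rate of 0.1 are also illustrative assumptions, not the procedure of Figure 14.

```python
def mam_learning_rate(confidence_index: int, min_rate: float = 0.05) -> float:
    """Assumed MAM-style rate: 1 when the index is 0, then decaying to a floor."""
    return max(1.0 / (1 + confidence_index), min_rate)


def update_statistic(old: float, sample: float, rate: float) -> float:
    """One moving-average step: blend the old estimate with the new sample."""
    return (1.0 - rate) * old + rate * sample


# Same observations, two very different initial values.
samples = [2.0, 4.0, 6.0]
results_mam = []
results_const = []
for initial in (0.0, 100.0):
    mam_value = const_value = initial
    for i, sample in enumerate(samples):
        # Assumption for this demo: the confidence index grows by one per step.
        mam_value = update_statistic(mam_value, sample, mam_learning_rate(i))
        # Constant rate, as used by many reinforcement-learning algorithms.
        const_value = update_statistic(const_value, sample, 0.1)
    results_mam.append(mam_value)
    results_const.append(const_value)
```

Under these assumptions, the first update with rate 1 overwrites the initial value completely, so both entries of `results_mam` coincide, whereas the entries of `results_const` still differ because a constant rate only discounts the initial value gradually.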
If the observed effects of the last executed action agree with the current estimate interval
for the reward, then the confidence index is increased by one unit. Otherwise, the
confidence index is decreased, allowing a faster adaptation of the statistics to the most
recent, surprising reward values.
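The confidence-index rule can be sketched as follows. The passage does not state by how much the index is decreased, so this sketch assumes a decrement of one unit with a floor at zero; the names `update_confidence` and `learning_rate`, and the rate 1/(1 + index), are hypothetical choices made only to show why a lower index speeds up adaptation.

```python
from typing import Tuple


def update_confidence(index: int, reward: float,
                      interval: Tuple[float, float]) -> int:
    """Raise the index when the reward falls inside the estimate interval."""
    low, high = interval
    if low <= reward <= high:
        return index + 1
    # Assumption: decrease by one unit, never below zero.
    return max(index - 1, 0)


def learning_rate(index: int) -> float:
    """Assumed rate: a lower confidence index gives a larger learning rate."""
    return 1.0 / (1 + index)


index = 3
# Reward inside the interval: the index grows and the rate shrinks.
index = update_confidence(index, reward=0.5, interval=(0.0, 1.0))
rate_after_agreement = learning_rate(index)
# Surprising reward outside the interval: the index drops and the rate grows.
index = update_confidence(index, reward=5.0, interval=(0.0, 1.0))
rate_after_surprise = learning_rate(index)
```

With a rate tied to the index in this way, a surprising reward raises the rate applied at the next update, which is one plausible reading of the faster adaptation described above.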