This procedure (Figure 15) includes the generation of new partial rules and the removal of previously generated ones that have proved useless. In our implementation, we apply a heuristic that triggers the generation of new partial rules whenever the reward prediction error exceeds a given threshold. In this way, we concentrate the effort to improve the categorization on those situations with the largest errors in the reward prediction.
Every time a wrong prediction is made, at most a fixed number of new partial rules are generated by combining pairs of rules included in the set of rules that were active in the previous time step and in accordance with the executed action. These are precisely the rules related to the situation-action pair whose reward prediction we need to improve.
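As a concrete illustration, here is a minimal Python sketch of this trigger; the threshold value, the cap on new rules per step, and all names are placeholders assumed by the sketch, not the paper's notation.

```python
# Hedged sketch of the generation trigger; symbols are illustrative.
from itertools import combinations

ERROR_THRESHOLD = 0.1   # assumed value of the error threshold
MAX_NEW_RULES = 5       # assumed cap on new rules per wrong prediction

def candidate_pairs(predicted_reward, observed_reward, active_rules):
    """Return up to MAX_NEW_RULES pairs of rules to combine.

    `active_rules` stands for the set of rules that were active in the
    previous time step and in accordance with the executed action.
    """
    if abs(observed_reward - predicted_reward) <= ERROR_THRESHOLD:
        return []  # prediction was good enough; generate nothing
    pairs = list(combinations(active_rules, 2))
    return pairs[:MAX_NEW_RULES]
```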
The combination of two partial rules, $w_1$ and $w_2$, consists of a new partial rule $w'$ whose partial view includes all the features in the partial views of either $w_1$ or $w_2$, and whose partial command includes all the elementary actions in the partial commands of either $w_1$ or $w_2$. In other words, the feature set of $w'$ is the union of the feature sets of $w_1$ and $w_2$, and the elementary actions of $w'$ are the union of those of $w_1$ and those of $w_2$. Note that, since both $w_1$ and $w_2$ belong to the set described above, they have been simultaneously active and are in accordance with the same action; thus, they cannot be incompatible (i.e., they cannot include inconsistent features or elementary actions).
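The following sketch implements this combination as a plain set union; the `PartialRule` container is an assumption of the sketch.

```python
# Hedged sketch of partial-rule combination: the new rule's partial view
# and partial command are the unions of the parents'.
from dataclasses import dataclass

@dataclass(frozen=True)
class PartialRule:
    features: frozenset   # partial view: set of features
    actions: frozenset    # partial command: set of elementary actions

def combine(w1: PartialRule, w2: PartialRule) -> PartialRule:
    """Combine two compatible partial rules by set union.

    Both rules come from the set of simultaneously active rules in
    accordance with the same action, so their features and elementary
    actions cannot be inconsistent.
    """
    return PartialRule(features=w1.features | w2.features,
                       actions=w1.actions | w2.actions)

# Example: combining ({f1}, {a1}) with ({f2}, {a1}) yields ({f1, f2}, {a1}).
w = combine(PartialRule(frozenset({"f1"}), frozenset({"a1"})),
            PartialRule(frozenset({"f2"}), frozenset({"a1"})))
```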
In the partial-rule creation, we bias the system to favor the combination of those rules whose reward prediction is closest to the observed reward. Finally, the generation of rules lexicographically equivalent to already existing ones is not allowed.
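A minimal sketch of both points follows; ranking by absolute prediction error and keying duplicates on the sorted feature/action sets are assumptions of the sketch.

```python
# Hedged sketch of the selection bias and the duplicate check.
def rank_rules(active_rules, observed_reward, predictions):
    """Sort rules by |prediction - observed reward|, best first.

    `predictions` maps each rule to its current reward prediction.
    """
    return sorted(active_rules,
                  key=lambda w: abs(predictions[w] - observed_reward))

def is_duplicate(new_rule, existing_rules):
    """True if an existing rule has the same feature and action sets,
    i.e., the new rule is lexicographically equivalent to it."""
    key = (frozenset(new_rule.features), frozenset(new_rule.actions))
    return any((frozenset(w.features), frozenset(w.actions)) == key
               for w in existing_rules)
```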
According to the categorizability assumption, only low-order partial rules are required to achieve the task at hand. For this reason, and to improve efficiency, we limit the number of partial rules to a fixed maximum. However, our partial-rule generation procedure keeps generating new rules (concentrating on the situations with the largest errors). Therefore, when we need to create new rules and there is no room for them, we must eliminate the least useful partial rules.
A partial rule can be removed if its reward prediction is too similar to that of some other rule in the same situations. The similarity between two rules can be measured using the normalized degree of intersection between their reward distributions and the number of times both rules are used simultaneously.
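One plausible instantiation of such a measure, offered only as a hedged sketch consistent with the description above (all symbols are illustrative, not the paper's notation), multiplies the overlap of the two reward distributions by the normalized co-usage frequency:

$$\mathrm{similarity}(w, w') \;=\; I\!\left(r_w, r_{w'}\right)\,\frac{u_{w,w'}}{\max(u_w,\, u_{w'})},$$

where $I(r_w, r_{w'}) \in [0,1]$ denotes the normalized intersection of the reward distributions of $w$ and $w'$, $u_w$ is the number of times rule $w$ has been used, and $u_{w,w'}$ is the number of times both rules have been used simultaneously.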
Assessing the similarity of every pair of partial rules in the controller is too expensive; in general, it is sufficient to determine the similarity of each rule with respect to those from which it was generated (that is, the rules we tried to refine when the new rule was created). Thus, based on the above similarity measure, we define the redundancy of a partial rule as its similarity to the rules from which it was derived.
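Continuing the hedged sketch above, if a rule $w$ was derived from parents $w_1$ and $w_2$, its redundancy could be instantiated as the largest similarity to a parent:

$$\mathrm{redundancy}(w) \;=\; \max\bigl\{\mathrm{similarity}(w, w_1),\; \mathrm{similarity}(w, w_2)\bigr\}.$$

Note that, since the partial view and command of $w$ are the unions of its parents', $w$ can only be active when both parents are active; under the assumed similarity measure, the co-usage count $u_{w,w_i}$ then equals $u_w$, and the usage factor reduces to $u_w / u_{w_i}$.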
When we need to create new rules but the maximum number of rules has been reached, the partial rules with a redundancy above a given threshold are eliminated. Since the redundancy of a partial rule can only be estimated after observing the rule a number of times, the redundancy of partial rules with low confidence indexes is set to 0, so that they are not removed immediately after creation.
Observe that, to compute the redundancy of a rule, we use the partial rules from which it was derived. For this reason, a rule cannot be removed from a controller while any other rule derived from it remains, since it is still needed to compute that rule's redundancy. Additionally, in this way we eliminate first the useless rules with higher order.
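The pruning step could be sketched as follows; the rule limit, redundancy threshold, and confidence cutoff are illustrative placeholders for the paper's symbols.

```python
# Hedged sketch of the pruning step; all constants are assumptions.
MAX_RULES = 1000             # assumed maximum number of partial rules
REDUNDANCY_THRESHOLD = 0.9   # assumed elimination threshold
MIN_CONFIDENCE = 0.5         # below this, redundancy is treated as 0

def prune(rules, parents, redundancy, confidence):
    """Remove redundant rules when the controller is full.

    `parents` maps each rule to the rules it was derived from. A rule is
    protected while any other rule still depends on it for its redundancy
    estimate, so higher-order rules are eliminated first.
    """
    if len(rules) < MAX_RULES:
        return rules
    protected = {p for w in rules for p in parents.get(w, ())}
    kept = []
    for w in rules:
        r = redundancy[w] if confidence[w] >= MIN_CONFIDENCE else 0.0
        if w in protected or r <= REDUNDANCY_THRESHOLD:
            kept.append(w)
    return kept
```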