Description |
The Best Region Rules algorithm addresses the task of predicting the
occurence of future events on the basis of past events.
The hypotheses of the algorithm are specific rules:
Each rule contains a condition, a target and a lag interval. The
interpretation of such a rule is, that target events occur with a
delay, lying within the given time interval, after condition events.
The rule A[5, 20] => B would e.g. have the interpretation "B occurs
between 5 and 20 time units after occurences of A".
Measuring the Quality of a Rule
For specific applications, the intuitive quality measure of rules can
vary, so the algorithm offers different measures, which are weighted
by the user.
The easiest of these measures are Prediction Accuracy and Recall Accuracy.
For a specific event sequence the Prediction Accuracy of a rule
is defined as the ratio of
- condition events, followed by a target event within the
given intervall
- to all condition events.
The Recall Accuracy is defined very similar. It measures the
ratio of
- target events following a condition event with
a delay in the given interval,
- to all target events.
These definitions basically reflect the usual meaning of accuracy and
recall. Further measures reward beneficial properties of rules, like
smaller interval regions, higher predictive power, or multiple hits
(e.g. multiple target events matching a single condition event).
Finding Good Rules
The search for good rules is limited to Minimal Regions.
With respect to a given lag set Minimal Regions are minimal lag intervals,
covering the lags of a specific subset of event pairs.
Limiting the search to such regions is resonable, considering that there are no negative
examples in a event sequence, and that usually larger regions will be
assigned higher quality scores.
The algorithm will first calculate all occuring time lags between different
event types of a given event sequence.
If event A occurs at time points 5, 10 and 20 and event B occurs at
11, 15 and 21, then the resulting lag set is
{1, 5, 6, 10, 11, 16}.
The algorithm will choose all minimal regions for generating rule candidates.
In the example above the rules with condition event A, target event B, and
Minimal Regions are:
A[1, 1] => B, A[1, 5] => B, A[1, 6] => B, ..., A[1, 16] => B
A[5, 5] => B, A[5, 6] => B, ..., A[5, 16] => B
...
A[16, 16] => B
For learning models from event sequences the Best Region Rules algorithm
can calculate the best Minimal Region for each event pair, according to a
given quality measure (weights of the algorithm's measures).
All best rules, exceeding a minimal quality score, form the hypothesis.
|