Best Region Rules

Publication Zhang/99a: A Region-Based Approach to Discovering Temporal Structures in Data
Name Best Region Rules
Description The Best Region Rules algorithm addresses the task of predicting the occurence of future events on the basis of past events.

The hypotheses of the algorithm are specific rules:

Each rule contains a condition, a target and a lag interval. The interpretation of such a rule is, that target events occur with a delay, lying within the given time interval, after condition events. The rule A[5, 20] => B would e.g. have the interpretation "B occurs between 5 and 20 time units after occurences of A".

Measuring the Quality of a Rule

For specific applications, the intuitive quality measure of rules can vary, so the algorithm offers different measures, which are weighted by the user. The easiest of these measures are Prediction Accuracy and Recall Accuracy.

For a specific event sequence the Prediction Accuracy of a rule is defined as the ratio of

  • condition events, followed by a target event within the given intervall
  • to all condition events.

The Recall Accuracy is defined very similar. It measures the ratio of

  • target events following a condition event with a delay in the given interval,
  • to all target events.

These definitions basically reflect the usual meaning of accuracy and recall. Further measures reward beneficial properties of rules, like smaller interval regions, higher predictive power, or multiple hits (e.g. multiple target events matching a single condition event).

Finding Good Rules

The search for good rules is limited to Minimal Regions. With respect to a given lag set Minimal Regions are minimal lag intervals, covering the lags of a specific subset of event pairs. Limiting the search to such regions is resonable, considering that there are no negative examples in a event sequence, and that usually larger regions will be assigned higher quality scores.

The algorithm will first calculate all occuring time lags between different event types of a given event sequence.
If event A occurs at time points 5, 10 and 20 and event B occurs at 11, 15 and 21, then the resulting lag set is {1, 5, 6, 10, 11, 16}.

The algorithm will choose all minimal regions for generating rule candidates.

In the example above the rules with condition event A, target event B, and Minimal Regions are:
A[1, 1] => B, A[1, 5] => B, A[1, 6] => B, ..., A[1, 16] => B
A[5, 5] => B, A[5, 6] => B, ..., A[5, 16] => B
...
A[16, 16] => B

For learning models from event sequences the Best Region Rules algorithm can calculate the best Minimal Region for each event pair, according to a given quality measure (weights of the algorithm's measures). All best rules, exceeding a minimal quality score, form the hypothesis.

Dm Step Sequence Prediction
Method Type Algorithm