Association Rule Learning

Publication Agrawal/Srikant/94a: Fast Algorithms for Mining Association Rules in Large Data Bases
Name Association Rule Learning
Description

Association rules describe correlation of events and can be regarded as probabilistic rules. "Correlation of events" means, that events are frequently observed together. Discovering association rules in large databases can be a first step of knowledge discovery in databases (KDD).

A good example from real life are databases of sales transactions, which have become an important part of marketing infrastructure of many companies. Knowledge about sets of items frequently bought together helps these companies to develop successful marketing strategies.

Translations Assoziationsregellernen (German)
Definition The problem can be formally stated as follows:
  • Let I = {i1, i2, ..., im} be a set of literals, called items.
  • Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I.
  • Let X be a set of items in I. A transaction T is said to contain X, if X ⊆ T.

An association rule is an implication of the form X ⇒ Y, where X ⊂ I ,Y ⊂ I and X ∩ Y = ∅ .

  • The rule X ⇒ Y holds with confidence c in transaction set D, if and only if c% of transactions in D that contain X also contain Y.
  • The rule X ⇒ Y has support s in transaction set D, if and only if s% of transactions in D contain X ∪ Y.

Given a set of transactions D, the task of mining association rules can be reformulated as finding all association rules with at least minimum support (called minsup) and minimum confidence (called minconf), where minsup and minconf are user-specified values.

The task of discovering association rules can be decomposed into two subproblems:
  1. Find all sets of items (itemsets) that have transaction support above minimum support. The support for an itemset is the number of transactions that contain the itemset. Itemsets with minimum support are called large itemsets, all others are called small itemsets. This subtask is addressed by the algorithm APRIORI.


  2. Generate all assiciation rules with minimum support and confidence from the set of all large itemsets. This subtask can be addressed by a straightforward algorithm:
    • For each large itemset l, find all non-empty subsets.
    • For each such subset a of l, output the rule a ⇒ (l-a), iff the ratio of support(l) to support(a) is at least minconf .
Methods APRIORI