|
Association Rule Learning
Publication |
Agrawal/Srikant/94a: Fast Algorithms for Mining Association Rules in Large Data Bases
|
Name |
Association Rule Learning |
Description |
Association rules describe correlation of events and can be regarded as
probabilistic rules. "Correlation of events" means, that events are frequently
observed together.
Discovering association rules in large databases can be a first step of knowledge
discovery in databases (KDD).
A good example from real life are databases of sales transactions, which have become an
important part of marketing infrastructure of many companies. Knowledge about sets of
items frequently bought together helps these companies to develop successful marketing
strategies.
|
Translations |
Assoziationsregellernen (German)
|
Definition |
The problem can be formally stated as follows:
-
Let I = {i1, i2, ...,
im} be a set of literals, called
items.
-
Let D be a set of transactions, where each
transaction T is a set of items such that
T ⊆ I.
-
Let X be a set of items in I.
A transaction T is said to contain
X, if X ⊆ T.
An association rule is an implication of the form X ⇒ Y, where
X ⊂ I ,Y ⊂ I and X ∩ Y = ∅ .
-
The rule X ⇒ Y holds with confidence c in transaction set D,
if and only if c% of transactions in D that contain X also contain Y.
-
The rule X ⇒ Y has support s in transaction set D,
if and only if s% of transactions in D contain X ∪ Y.
Given a set of transactions D, the task of mining association rules can be reformulated as
finding all association rules with at least minimum support (called minsup) and minimum
confidence (called minconf), where minsup and minconf are
user-specified values.
The task of discovering association rules can be decomposed into two subproblems:
- Find all sets of items (itemsets) that have transaction support above minimum
support. The support for an itemset is the number of transactions that contain the
itemset. Itemsets with minimum support are called large itemsets, all others are
called small itemsets. This subtask is addressed by the algorithm APRIORI.
-
Generate all assiciation rules with minimum support and confidence from the set of all large
itemsets. This subtask can be addressed by a straightforward algorithm:
- For each large itemset l, find all non-empty subsets.
- For each such subset a of l, output the rule a ⇒ (l-a),
iff the ratio of support(l) to support(a) is at least minconf .
|
Methods |
APRIORI
|
|
|