Description |
APRIORI provides a solution for a subtask of Association Rule Learning.
The subtask can be stated as follows:
Find all sets of items (itemsets) that have transaction support
above minimum support. The support for an itemset is the number
of transactions that contain the itemset. Itemsets with minimum support
are called large itemsets.
We use the following notation:
- Let I = {i1, i 2,...,im}
be a set of literals, called items.
- Let D be a set of transactions, where each transaction
T is a set of items such that T ⊆ I.
- Let X be a set of items in I.
A transaction T is said to contain X, if
X ⊆ T.
Discovering large itemsets
Discovering large itemsets requires making multiple passes over the data.
- In the first pass, we count the support of individual items and
determine which of them are large, i.e. have minimum support.
- Each subsequent pass starts with a seed set of itemsets found to be
large in the previous pass. This seed set is used for generating new
potentially large itemsets, called candidate itemsets.
- The actual support for these candidate itemsetsis counted during a
new pass over the data.
- At the end of the pass, we determine which of the candidate itemsets
are actually large. These itemsets become the seed for the next pass.
- This process continues until no large itemsets are found.
If the fraction of transactions belonging to large itemsets becomes small,
it will be more efficient to store their transaction IDs and just work on
that subset, than to perform passes over the complete dataset.
Algorithm APRIORI
Notation:
- k-itemset
An itemset having k items.
- Lk
Set of large k-itemsets (those with minimum support).
- Ck
Set of candidate k-itemsets (potentially large itemsets).
- L1={large 1-itemsets};
- for (k=2; Lk-1 ≠ ∅; k++)
do begin
-
Ck=apriori-gen(Lk-1);
// New candidates
-
forall transactions t ∈ D do begin
-
Ct = subset( Ck,t);
//Candidates contained in t
-
forall candidates c ∈Ct do
-
c.count++;
-
end
-
Lk = {c ∈ Ck | c.count ≥ minsup}
- end
- Answer=Uk Lk
The apriori-gen function takes Lk-1, the set of all large
(k-1)-itemsets, as an argument, and returns a set of candidates for
being large k-itemsets. It exploits the fact, that expanding an
itemset will reduce its support.
A k-itemset can only be large, if all of its
(k-1)-subsets are.
So apriori-gen generates only candidates with this property, which can
easily be achieved given the set Lk-1.
|