APRIORI

Publication Agrawal/Srikant/94a: Fast Algorithms for Mining Association Rules in Large Data Bases
Name APRIORI
Description

APRIORI provides a solution for a subtask of Association Rule Learning.

The subtask can be stated as follows:

Find all sets of items (itemsets) that have transaction support above minimum support. The support for an itemset is the number of transactions that contain the itemset. Itemsets with minimum support are called large itemsets.

We use the following notation:

  • Let I = {i1, i 2,...,im} be a set of literals, called items.
  • Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I.
  • Let X be a set of items in I. A transaction T is said to contain X, if X ⊆ T.

Discovering large itemsets

Discovering large itemsets requires making multiple passes over the data.

  • In the first pass, we count the support of individual items and determine which of them are large, i.e. have minimum support.
  • Each subsequent pass starts with a seed set of itemsets found to be large in the previous pass. This seed set is used for generating new potentially large itemsets, called candidate itemsets.
  • The actual support for these candidate itemsetsis counted during a new pass over the data.
  • At the end of the pass, we determine which of the candidate itemsets are actually large. These itemsets become the seed for the next pass.
  • This process continues until no large itemsets are found.
If the fraction of transactions belonging to large itemsets becomes small, it will be more efficient to store their transaction IDs and just work on that subset, than to perform passes over the complete dataset.

Algorithm APRIORI

Notation:

  • k-itemset An itemset having k items.
  • Lk Set of large k-itemsets (those with minimum support).
  • Ck Set of candidate k-itemsets (potentially large itemsets).

  1. L1={large 1-itemsets};
  2. for (k=2; Lk-1 ≠ ∅; k++) do begin
  3. Ck=apriori-gen(Lk-1); // New candidates
  4. forall transactions t ∈ D do begin
  5. Ct = subset( Ck,t); //Candidates contained in t
  6. forall candidates c ∈Ct do
  7. c.count++;
  8. end
  9. Lk = {c ∈ Ck | c.count ≥ minsup}
  10. end
  11. Answer=Uk Lk

The apriori-gen function takes Lk-1, the set of all large (k-1)-itemsets, as an argument, and returns a set of candidates for being large k-itemsets. It exploits the fact, that expanding an itemset will reduce its support. A k-itemset can only be large, if all of its (k-1)-subsets are. So apriori-gen generates only candidates with this property, which can easily be achieved given the set Lk-1.

Dm Step Association Rule Learning
Method Type Algorithm