APRIORI

Publication Agrawal/Srikant/94a: Fast Algorithms for Mining Association Rules in Large Data Bases

Name APRIORI

Description
APRIORI provides a solution for a subtask of Association Rule Learning.

The subtask can be stated as follows:
Find all sets of items (itemsets) that have transaction support above minimum support. The support for an itemset is the number of transactions that contain the itemset. Itemsets with minimum support are called large itemsets.

We use the following notation:

Let I = {i₁, i ₂,...,i_m} be a set of literals, called items.

Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I.

Let X be a set of items in I. A transaction T is said to contain X, if X ⊆ T.

Discovering large itemsets

Discovering large itemsets requires making multiple passes over the data.

In the first pass, we count the support of individual items and determine which of them are large, i.e. have minimum support.

Each subsequent pass starts with a seed set of itemsets found to be large in the previous pass. This seed set is used for generating new potentially large itemsets, called candidate itemsets.

The actual support for these candidate itemsetsis counted during a new pass over the data.

At the end of the pass, we determine which of the candidate itemsets are actually large. These itemsets become the seed for the next pass.

This process continues until no large itemsets are found.

If the fraction of transactions belonging to large itemsets becomes small, it will be more efficient to store their transaction IDs and just work on that subset, than to perform passes over the complete dataset.

Algorithm APRIORI

Notation:

k-itemset An itemset having k items.

L_k Set of large k-itemsets (those with minimum support).

C_k Set of candidate k-itemsets (potentially large itemsets).

L₁={large 1-itemsets};

for (k=2; L_k-1 ≠ ∅; k++) do begin

C_k=apriori-gen(L_k-1); // New candidates

forall transactions t ∈ D do begin

C_t = subset( C_k,t); //Candidates contained in t

forall candidates c ∈C_t do

c.count++;

end

L_k = {c ∈ C_k | c.count ≥ minsup}

end

Answer=U_k L_k

The apriori-gen function takes L_k-1, the set of all large (k-1)-itemsets, as an argument, and returns a set of candidates for being large k-itemsets. It exploits the fact, that expanding an itemset will reduce its support. A k-itemset can only be large, if all of its (k-1)-subsets are. So apriori-gen generates only candidates with this property, which can easily be achieved given the set L_k-1.

Dm Step Association Rule Learning

Method Type Algorithm