In general – Generating Frequent Itemsets (FIs)

A lattice structure can be used to enumerate the list of all possible itemsets. Figure 1 shows an itemset lattice for I = {a,b,c,d}. In general, a data set that contains k items can generate up to 2k-1 frequent itemsets, excluding the null set.

An itemset lattice

Figure 1. An itemset lattice.

A brute-force approach for finding frequent itemsets is to determine the support count for every candidate itemset in the lattice structure. To do this, we need to compare each candidate itemset against every transaction (see Figure 2). If the candidate is contained in a transaction, its support count will be incremented. E.g. the support for {Bread, Milk} is incremented three times, because the itemset is contained in transactions 1, 4, and 5. Such an approach can be very expensive, because it requires O(NMw) comparisons, where N is the number of transactions, M – 2k-1 is the number of candidate itemsets, and w is the maximum transaction width.

Counting the support of candidate itemsets

Figure 2. Counting the support of candidate itemsets.

There are several ways to reduce the computational complexity of frequent itemset generation:

  1. Reduce the number of candidate itemsets (M) – the Apriori principle (see below) is an effective way to eliminate some of the candidate itemsets without counting their support values.
  2. Reduce the number of comparisons – instead of matching every candidate itemset against every transaction, we can reduce the number of comparisons by using more advanced data structures, either to store the candidate itemsets or to compress the data sets (see below).

Previous: The problem of mining Association Rules – ARs
Next: In general – Generating Association Rules (ARs) or Apriori algorithm – Overview