Let’s consider how we may use the support measure to reduce the number of candidate itemsets that must be explored to generate **frequent itemsets** (i.e. where **support ≥ minsup**).

**Theorem 1 (Apriori principle)** –

*If an itemset is frequent, then all of its subsets must also be frequent.*

Consider the itemset lattice shown in Figure 1. Suppose {*c,d,e*} is a **frequent** itemset. Clearly, any transaction that contains {*c,d,e*} must also contain its subsets: {*c,d*}, {*c,e*}, {*d,e*}, {*c*}, {*d*}, and {*e*}. As a result, if {*c,d,e*} is frequent, then all subsets of {*c,d,e*} – the shaded itemsets in Figure 1 – must also be frequent.

Figure 1. An illustration of the *Apriori* principle.
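This containment argument is mechanical enough to check in a few lines of code. Below is a minimal Python sketch (the item names and the example transaction are illustrative, not from any particular dataset): it enumerates every non-empty subset of {*c,d,e*} and confirms that a transaction containing {*c,d,e*} contains each of them.

```python
from itertools import chain, combinations

def nonempty_subsets(itemset):
    """Yield every non-empty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    return (frozenset(s)
            for s in chain.from_iterable(combinations(items, r)
                                         for r in range(1, len(items) + 1)))

transaction = {"a", "c", "d", "e"}  # an example transaction containing {c, d, e}
itemset = {"c", "d", "e"}

# A transaction that contains {c, d, e} necessarily contains all of its subsets.
assert all(s <= transaction for s in nonempty_subsets(itemset))
print([sorted(s) for s in nonempty_subsets(itemset)])
```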

Conversely, if an itemset {*a,b*} is **infrequent**, then all of its supersets must be infrequent too. In Figure 2, the entire subgraph containing the supersets of {*a,b*} can be pruned immediately once {*a,b*} is found to be infrequent.

This strategy of trimming the exponential search space based on the support measure is known as **support-based pruning**. Such a pruning strategy is made possible by a key property of the support measure, namely, that the support for an itemset never exceeds the support for its subsets. This property is also known as the anti-monotone property of the support measure.

**Definition 1. (Monotonicity property)** – Let *I* be the set of items, and *J = 2^I* be the power set of *I*. A measure *f* is **monotone** (or *upward closed*) if

[latex]\forall X, Y \in J : (X \subseteq Y) \rightarrow f(X) \leq f(Y),[/latex]

which means that if *X* is a subset of *Y*, then *f(X)* must not exceed *f(Y)*.

On the other hand, *f* is **anti-monotone** (or *downward closed*) if

[latex]\forall X, Y \in J : (X \subseteq Y) \rightarrow f(Y) \leq f(X),[/latex]

which means that if *X* is a subset of *Y*, then *f(Y)* must not exceed *f(X)*.

Any measure that possesses an **anti-monotone** property can be incorporated **directly** into the mining algorithm to prune the search space of candidate itemsets.
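For support, the anti-monotone property can be verified directly. A minimal sketch, assuming a small made-up transaction list and a hand-rolled `support` helper (not part of any library):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

transactions = [frozenset(t) for t in
                [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b", "d"}]]

x, y = {"b"}, {"a", "b"}  # X is a subset of Y
# Anti-monotonicity: the superset can never have higher support.
assert support(y, transactions) <= support(x, transactions)
print(support(x, transactions), support(y, transactions))  # 1.0 0.75
```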

## Generating FIs using the *Apriori* algorithm

**Apriori** is the first Association Rule (AR) mining algorithm to use support-based pruning to address the exponential growth of candidate itemsets.

| TID | Items |
|-----|-------|
| 1 | {Bread, Milk} |
| 2 | {Bread, Diapers, Beer, Eggs} |
| 3 | {Milk, Diapers, Beer, Cola} |
| 4 | {Bread, Milk, Diapers, Beer} |
| 5 | {Bread, Milk, Diapers, Cola} |

Table 1. An example of market basket transactions.

Figure 3 provides a high-level illustration of *Apriori*’s generation of FIs for the transactions shown in Table 1.

We assume that the support threshold is 60%, which for the five transactions in Table 1 is equivalent to a minimum support count of 0.6 × 5 = 3.

Figure 3. Illustration of frequent itemset generation using the *Apriori* algorithm.

In iteration 1, every item is considered as a candidate 1-itemset. After counting their supports, the candidate itemsets {**Cola**} and {**Eggs**} are discarded because they appear in fewer than three transactions.
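This first pass is a plain frequency count over the items. A sketch using the transactions of Table 1 and the minimum support count of 3 assumed above:

```python
from collections import Counter

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
minsup_count = 3

# Count the support of every candidate 1-itemset in a single pass.
item_counts = Counter(item for t in transactions for item in t)
f1 = {frozenset([i]) for i, c in item_counts.items() if c >= minsup_count}

print(item_counts)  # Cola (2) and Eggs (1) fall below the minimum count
print(f1)           # {Bread}, {Milk}, {Diapers}, {Beer}
```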

In iteration 2, candidate 2-itemsets are generated using only the frequent 1-itemsets, because the *Apriori* principle ensures that all supersets of the infrequent 1-itemsets must be infrequent. Since there are only four frequent 1-itemsets, the number of candidate 2-itemsets generated by the algorithm is

[latex]\binom{4}{2} = \frac{4!}{2!(4-2)!} = 6[/latex]

After counting their support values, two of these six candidate itemsets – {Beer, Bread} and {Beer, Milk} – turn out to be infrequent. The remaining four candidates are frequent, and are used to generate candidate 3-itemsets in the next iteration.
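Continuing the sketch above (reusing `transactions`, `f1`, and `minsup_count`), the six candidate 2-itemsets and their counts can be reproduced as follows:

```python
from itertools import combinations

# Pair up the four frequent items to form the six candidate 2-itemsets.
frequent_items = sorted(item for s in f1 for item in s)
c2 = [frozenset(pair) for pair in combinations(frequent_items, 2)]

counts2 = {c: sum(c <= t for t in transactions) for c in c2}
f2 = {c for c, n in counts2.items() if n >= minsup_count}

for c, n in sorted(counts2.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(c), n)  # {Beer, Bread} and {Beer, Milk} have count 2
```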

In iteration 3, candidate 3-itemsets are generated using only the four frequent 2-itemsets, because the *Apriori* principle ensures that all supersets of the infrequent 2-itemsets must be infrequent.

Without support-based pruning, there are

[latex]\binom{6}{3} = \frac{6!}{3!(6-3)!} = 20[/latex]

candidate 3-itemsets that can be formed using the six items. With the *Apriori* principle, we only need to keep candidate 3-itemsets whose subsets are frequent. The only candidate that has this property is {**Bread, Diapers, Milk**}.
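Continuing once more (reusing `f2` and `combinations` from the previous snippet), the join-and-prune step can be sketched as below; the actual apriori-gen join procedure is described in the next part:

```python
# Union pairs of frequent 2-itemsets into 3-itemsets, then prune every
# candidate that has at least one infrequent 2-item subset.
candidates3 = {a | b for a in f2 for b in f2 if len(a | b) == 3}
c3 = {c for c in candidates3
      if all(frozenset(pair) in f2 for pair in combinations(c, 2))}

print([sorted(c) for c in c3])  # only ['Bread', 'Diapers', 'Milk'] survives
```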

The effectiveness of the *Apriori* pruning strategy can be demonstrated by comparing the number of candidate itemsets generated with and without it.

A **brute-force** strategy of enumerating all itemsets (up to size 3) as candidates will always generate

[latex]\binom{6}{1} +\binom{6}{2} +\binom{6}{3} = \frac{6!}{1!(6-1)!} +\frac{6!}{2!(6-2)!}+\frac{6!}{3!(6-3)!} = 6+15+20 = 41[/latex]

candidate itemsets.

The **Apriori pruning** strategy (applied to our data) produced

[latex]\binom{6}{1} +\binom{4}{2} + 1 = \frac{6!}{1!(6-1)!} +\frac{4!}{2!(4-2)!} + 1 = 6+6+1 = 13[/latex]

candidate itemsets – a 68% reduction in our case.
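Both counts, and the reduction, can be checked with a line of arithmetic:

```python
from math import comb

brute_force = comb(6, 1) + comb(6, 2) + comb(6, 3)  # 6 + 15 + 20 = 41
apriori     = comb(6, 1) + comb(4, 2) + 1           # 6 +  6 +  1 = 13

print(brute_force, apriori, f"{1 - apriori / brute_force:.0%}")  # 41 13 68%
```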

Algorithm 1 presents the pseudocode for the FI generation part of the *Apriori* algorithm. Let *C_k* denote the set of candidate *k*-itemsets and *F_k* denote the set of frequent *k*-itemsets:

- The algorithm initially makes a single pass over the data set to determine the support of each item. Upon completion of this step, the set of all frequent 1-itemsets, *F_1*, will be known (steps 1 and 2).
- The algorithm iteratively generates new candidate *k*-itemsets using the frequent (*k*–1)-itemsets found in the previous iteration (step 5). Candidate generation is implemented using a function called apriori-gen.
- To count the support of the candidate itemsets, the algorithm needs to make an additional pass over the data set (steps 6–10). The subset function is used to determine all the candidate itemsets in *C_k* that are contained in each transaction *t*. The implementation of this function is described in YYY.
- After counting their supports, the algorithm eliminates all candidate itemsets whose support counts are less than *minsup* (step 12).
- The algorithm terminates when there are no new frequent itemsets generated, i.e. *F_k* = ∅ (step 13).

Algorithm 1. FI generation of the *Apriori* algorithm.
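As a complement to the pseudocode, here is a compact, runnable Python sketch of the same level-wise loop. The body of `apriori_gen` below is a simplified stand-in for the apriori-gen function named above (the actual join-and-prune procedure is the subject of the next part), so treat it as illustrative:

```python
from collections import defaultdict
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets: join,
    then prune candidates that have an infrequent (k-1)-subset."""
    candidates = {a | b for a in frequent_prev for b in frequent_prev
                  if len(a | b) == k}
    return {c for c in candidates
            if all(frozenset(s) in frequent_prev
                   for s in combinations(c, k - 1))}

def apriori_fi(transactions, minsup_count):
    """Return every frequent itemset with its support count."""
    # Steps 1-2: a single pass over the data set to determine F_1.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {c: n for c, n in counts.items() if n >= minsup_count}
    result, k = dict(frequent), 2
    # Iterate level-wise until no new frequent itemsets are found (step 13).
    while frequent:
        candidates = apriori_gen(set(frequent), k)   # step 5
        counts = defaultdict(int)
        for t in transactions:                       # steps 6-10
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items()  # step 12
                    if n >= minsup_count}
        result.update(frequent)
        k += 1
    return result

# Usage with the Table 1 transactions and a minimum support count of 3:
table1 = [frozenset(t) for t in [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]]
for itemset, count in sorted(apriori_fi(table1, 3).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```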

## Characteristics of *Apriori*’s generation of FIs

The FI generation part of the *Apriori* algorithm has two important characteristics:

- It is a **level-wise** algorithm – it traverses the itemset lattice one level at a time, from frequent 1-itemsets to the maximum size of FIs.
- It employs a **generate-and-test** strategy for finding FIs – at each iteration, new candidate itemsets are generated from the FIs found in the previous iteration. The support for each candidate itemset is then counted and tested against *minsup*.

The total number of iterations needed by the *Apriori* algorithm is *k_max* + 1, where *k_max* is the maximum size of the FIs.
