The Problem of Mining Association Rules

The problem of mining association rules may be stated formally as follows:

Discovery of Association Rule(s) Given a set of transactions T, find all the association rules having support ≥ minsup and confidence ≥ minconf, where minsup and minconf are the corresponding support and confidence thresholds. Definition 1

A brute-force approach for mining association rules is to compute the support and confidence for every possible rule. This approach is prohibitively expensive because there are exponentially many rules that can be extracted from a data set. Indeed, in most datasets, the vast majority of association rules are discarded after we apply reasonable thresholds for support (minsup) and confidence (minconf) – making the brute-force approach very inefficient in terms of the number of association rules yielded per computation.

To avoid performing needless computations, it would be useful to prune the rules early without having to compute their support and confidence values.

An initial step toward improving the performance of association rule mining algorithms is to decouple the support and confidence requirements. From Equation 2,

Support: Formula for Support - Ch 6 Equation 2

notice that the support of a rule X Y depends only on the support of its corresponding itemset, X Y .

For example, the following association rules have identical support because they involve items from the same itemset, {Visit to Site of Designer3, Visit to Site of Designer4, Visit to Site of Designer1}:

{Visit to Site of Designer3, Visit to Site of Designer4} → {Visit to Site of Designer1}

{Visit to Site of Designer1, Visit to Site of Designer3} → {Visit to Site of Designer4}

{Visit to Site of Designer1, Visit to Site of Designer4} → {Visit to Site of Designer3}

etc.

If the itemset is infrequent, then all candidate association rules that are derived from the itemset can be pruned immediately without our having to compute the confidence in the association rules.

Therefore, a common strategy adopted by many algorithms is to decompose the problem of mining association rule(s) into two major sub-tasks:

  1. Generating Frequent Itemset(s), whose objective is to find all the itemsets that satisfy the minsup.  These itemsets are called frequent itemsets.
  2. Generating Strong Association Rule(s), whose objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are called strong association rules.

The computational requirements for generating frequent itemset(s) are generally more expensive than those for generating strong association rule(s). See techniques for generating frequent itemsets and generating strong association rules in xxx.

Previous: Association Analysis – Basic Concepts and Algorithms

Next: Generation of Frequent Itemsets (FIs)  – Overview