[Data Mining] Itemsets

Association Rule

Association Rules: Find all the rules X -> Y with minimum support and confidence

  • Support: s, probability that a transaction contains X and Y (minsup)
  • Confidence: c, conditional probability that a transaction having X also contains Y (minconf)
  • Interest: I -> j: difference between its confidence and the proportion of baskets that contain j

Pattern Growth Mining

[Goal] counting the frequent Itemsets with limited memory.

  1. The Apriori Algorithm

Monotonicity If a set of items I appears at least s times, so does every subset J of I.

  • Pass 1: Read baskets and count in main memory the occurrences of each individual item
  • Pass 2: Read baskets again and count in main memory only those pairs where both elements are frequent.

Problem: too many unfrequent candidates.

  1. FP Growth

Idea Recursively grow frequent patterns by database partition and finding frequent sub-patterns.

  • Use depth-first search
  • Avoid explicit candidate generation

Evaluation of Itemsets

  1. Interestingness Measure: Correlation

Measure of dependent/correlated events: lift

  1. Interestingness Measure: