Association Rule
Association Rules: Find all the rules X -> Y with minimum support and confidence
- Support: s, probability that a transaction contains X and Y (
minsup
) - Confidence: c, conditional probability that a transaction having X also contains Y (
minconf
) - Interest: I -> j: difference between its confidence and the proportion of baskets that contain j
Pattern Growth Mining
[Goal] counting the frequent Itemsets with limited memory.
- The Apriori Algorithm
Monotonicity If a set of items I
appears at least s
times, so does every subset J
of I
.
- Pass 1: Read baskets and count in main memory the occurrences of each individual item
- Pass 2: Read baskets again and count in main memory only those pairs where both elements are frequent.
Problem: too many unfrequent candidates.
- FP Growth
Idea Recursively grow frequent patterns by database partition and finding frequent sub-patterns.
- Use depth-first search
- Avoid explicit candidate generation
Evaluation of Itemsets
- Interestingness Measure: Correlation
Measure of dependent/correlated events: lift
- Interestingness Measure: