#!/usr/bin/env python
# coding: utf-8

# ## Association Rules Generation from Frequent Itemsets

# Function to generate association rules from frequent itemsets

# > from mlxtend.frequent_patterns import association_rules

# ## Overview

# Rule generation is a common task in the mining of frequent patterns. _An association rule is an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint itemsets_ [1]. A more concrete example based on consumer behaviour would be $\{Diapers\} \rightarrow \{Beer\}$, suggesting that people who buy diapers are also likely to buy beer. To evaluate the "interest" of such an association rule, different metrics have been developed. The current implementation makes use of the `confidence` and `lift` metrics.

# ## References
#
# [1] Tan, Steinbach, Kumar. Introduction to Data Mining. Pearson New International Edition. Harlow: Pearson Education Ltd., 2014. (pp. 327-414).

# ## Example 1

# The `association_rules` function takes dataframes of frequent itemsets as produced by the `apriori` function in *mlxtend.frequent_patterns*. To demonstrate the usage of the `association_rules` function, we first create a pandas `DataFrame` of frequent itemsets as generated by the [`apriori`](./apriori.md) function:
#

# In[1]:


import pandas as pd
from mlxtend.preprocessing import OnehotTransactions
from mlxtend.frequent_patterns import apriori

# Each inner list is one transaction (a basket of purchased items).
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One-hot encode the transactions: one column per distinct item.
oht = OnehotTransactions()
oht_ary = oht.fit(dataset).transform(dataset)
df = pd.DataFrame(oht_ary, columns=oht.columns_)

# Keep only itemsets that occur in at least 60% of the transactions.
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
frequent_itemsets


# The `association_rules()` function allows you to specify (1) your metric of interest and (2) the corresponding threshold.
# Currently implemented measures are **confidence** and **lift**. Let's say you are interested in rules derived from the frequent itemsets only if the level of confidence is above the 90 percent threshold (`min_threshold=0.9`):

# In[2]:


from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.9)


# ## Example 2

# If you are interested in rules fulfilling a different interest metric, you can simply adjust the parameters. For example, if you are interested only in rules that have a lift score of >= 1.2, you would do the following:

# In[3]:


rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules


# Pandas `DataFrames` make it easy to filter the results further. Let's say we are only interested in rules that satisfy the following criteria:
#
# 1. at least 2 antecedents
# 2. a confidence > 0.75
# 3. a lift score > 1.2
#
# We could compute the antecedent length as follows:

# In[4]:


# Add a helper column with the number of items in each rule's antecedent.
# The column names are spelled "antecedants" / "antecedant_len" as in the original example.
rules["antecedant_len"] = rules["antecedants"].apply(len)
rules


# Then, we can use pandas' selection syntax as shown below:

# In[5]:


# Combine the three criteria with element-wise boolean AND.
rules[(rules['antecedant_len'] >= 2) &
      (rules['confidence'] > 0.75) &
      (rules['lift'] > 1.2)]


# ## API

# In[1]:


with open('../../api_modules/mlxtend.frequent_patterns/association_rules.md', 'r') as f:
    print(f.read())


# In[ ]:
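

# The confidence and lift scores used as thresholds in the examples above can also be checked by hand. The following is a minimal sketch, not mlxtend's implementation. It assumes the standard textbook definitions: support(X) is the fraction of transactions that contain X, confidence(X -> Y) = support(X and Y together) / support(X), and lift(X -> Y) = confidence(X -> Y) / support(Y). The helper names `support`, `confidence`, and `lift` are hypothetical and chosen only for illustration.

```python
# Transactions from Example 1, stored as sets so the repeated 'Onion' collapses.
dataset = [
    {'Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
    {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
    {'Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'},
    {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X); assumes support(X) > 0."""
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(X -> Y) / support(Y); assumes support(Y) > 0."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hand-check one rule from the example data: {Onion} -> {Eggs}.
onion_eggs_confidence = confidence({'Onion'}, {'Eggs'}, dataset)
onion_eggs_lift = lift({'Onion'}, {'Eggs'}, dataset)
print(onion_eggs_confidence, onion_eggs_lift)
```

# Under these definitions, 'Onion' occurs in 3 of the 5 transactions and always together with 'Eggs', which occurs in 4 of 5. The rule {Onion} -> {Eggs} therefore has a confidence of 1.0 and a lift of 1.25, so it would pass both the confidence threshold of 0.9 and the lift threshold of 1.2 used above.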