How to Choose Minimum Support/Confidence Values | Mining Frequent Itemsets
How to Choose Minimum Support/Confidence Values

Choosing appropriate minimum support and confidence values is crucial when mining association rules from transactional datasets. The minimum support threshold discards itemsets that occur too rarely to matter, while the minimum confidence threshold discards rules whose consequent does not follow reliably from the antecedent. Selecting these values is a balancing act: thresholds set too high miss meaningful associations, and thresholds set too low produce an overwhelming number of trivial rules.
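To make the two measures concrete, here is a minimal pure-Python sketch using the standard definitions: support is the fraction of transactions containing an itemset, and confidence is the support of antecedent and consequent together divided by the support of the antecedent. The helper names `support` and `confidence` and the toy data are illustrative assumptions, not part of the course code.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0  # the rule never applies, so treat its confidence as 0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / antecedent_support


# Hypothetical toy dataset of four transactions
toy = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
]

print(support({"milk", "bread"}, toy))       # 2 of 4 transactions
print(confidence({"milk"}, {"bread"}, toy))  # 2 of the 3 milk transactions
```

A rule such as milk => bread passes only if its itemset meets the minimum support and the rule itself meets the minimum confidence.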

Factors influencing minimum threshold selection

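  • Dataset Size: Support is measured as a fraction of all transactions, so large datasets may require lower support thresholds. An itemset that appears in only a small share of a large dataset can still occur often enough, in absolute terms, to represent a meaningful association;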

  • Data Sparsity: Sparse datasets, where items have low occurrence frequencies, may necessitate lower support thresholds to uncover significant associations;

  • Domain Knowledge: Understanding the domain and the context of the dataset can guide the selection of appropriate thresholds. Prior knowledge about item interactions can inform the choice of support and confidence values;

  • Objective of Analysis: The purpose of association rule mining influences the choice of thresholds. For exploratory analysis, higher support thresholds may be suitable to identify prominent associations, while lower thresholds may be preferred for comprehensive pattern discovery.
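One way to reason about the dataset-size factor is to translate a relative support threshold into the absolute number of transactions it demands. The sketch below does that; the function name `min_transaction_count` and the example sizes are assumptions chosen for illustration.

```python
import math


def min_transaction_count(min_support, n_transactions):
    """Smallest number of transactions an itemset needs to reach `min_support`."""
    # round() guards against floating-point noise such as 0.07 * 100 == 7.000000000000001
    return math.ceil(round(min_support * n_transactions, 9))


# The same relative threshold means very different absolute counts
for n in (9, 100, 1_000_000):
    print(n, min_transaction_count(0.2, n), min_transaction_count(0.01, n))
```

Checking the absolute count is a quick sanity test: if a threshold corresponds to only one or two transactions, the resulting itemsets may reflect coincidence rather than a pattern.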

Example

Now you can conduct a simple experiment: change the min_support and min_confidence values in the code sample below and observe how your changes influence the results.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Create a sample transaction dataset
transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey']
]

# Define minimum support and confidence thresholds
min_support = 0.2
min_confidence = 0.7

# Initialize and fit `TransactionEncoder`
encoder = TransactionEncoder()
encoder.fit(transactions)

# Transform transactions using the encoder
one_hot_encoded = encoder.transform(transactions)

# Convert one-hot encoded array to DataFrame
df = pd.DataFrame(one_hot_encoded, columns=encoder.columns_)

# Find frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Print frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Print association rules with antecedent -> consequent format and confidence
print("\nAssociation Rules:")
for index, row in rules.iterrows():
    print("Rule: {} => {} with Confidence: {:.2f}".format(
        list(row['antecedents']), list(row['consequents']), row['confidence']))
```
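To run the experiment systematically rather than one value at a time, the thresholds can be swept over a small grid while counting how many itemsets and rules survive. The sketch below is a brute-force, standard-library-only stand-in for the mlxtend calls, not the Apriori implementation itself; the function names `find_frequent_itemsets` and `count_rules` and the grid values are assumptions made for illustration. It reuses the same nine transactions.

```python
from itertools import combinations

transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey'],
]


def find_frequent_itemsets(transactions, min_support):
    """Map each itemset with support >= min_support to its support (brute force)."""
    rows = [frozenset(t) for t in transactions]
    items = sorted(set().union(*rows))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            sup = sum(1 for row in rows if itemset <= row) / len(rows)
            if sup >= min_support:
                result[itemset] = sup
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so none can be larger
    return result


def count_rules(itemsets, min_confidence):
    """Count rules antecedent => rest-of-itemset with confidence >= min_confidence."""
    count = 0
    for itemset, sup in itemsets.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                if sup / itemsets[frozenset(antecedent)] >= min_confidence:
                    count += 1
    return count


# Sweep a small, arbitrary grid of thresholds
for s in (0.2, 0.4, 0.6):
    itemsets = find_frequent_itemsets(transactions, s)
    for c in (0.5, 0.7, 0.9):
        print(f"min_support={s}, min_confidence={c}: "
              f"{len(itemsets)} itemsets, {count_rules(itemsets, c)} rules")
```

Raising either threshold can only shrink the output, so a table like this helps locate the point where the rule count becomes manageable without discarding everything.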