 How to Choose Minimum Support/Confidence Values
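Choosing appropriate minimum support and confidence values is crucial when mining association rules from transactional datasets. The support threshold sets how often an itemset must occur to be considered frequent, and the confidence threshold sets how often the consequent must appear in transactions that contain the antecedent. Together they determine the strength and relevance of the discovered rules. Selecting these values requires a balance: thresholds that are too high miss meaningful associations, while thresholds that are too low produce an overwhelming number of trivial rules.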
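To make the two measures concrete, here is a minimal sketch that computes them by hand. The helper names `support` and `confidence` and the five toy baskets are illustrative assumptions; they are not part of mlxtend or of this lesson's dataset.

```python
# Toy baskets, invented purely for illustration.
baskets = [
    {'milk', 'bread'},
    {'bread', 'butter'},
    {'milk', 'bread', 'butter'},
    {'milk'},
    {'bread', 'butter', 'jam'},
]


def count(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of baskets that contain the whole itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Fraction of baskets with the antecedent that also hold the consequent."""
    antecedent_count = count(antecedent, baskets)
    if antecedent_count == 0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    both_count = count(set(antecedent) | set(consequent), baskets)
    return both_count / antecedent_count


print("support({bread, butter}) =", support({'bread', 'butter'}, baskets))
print("confidence(bread => butter) =", confidence({'bread'}, {'butter'}, baskets))
```

A rule is kept only if its itemset clears the support threshold and the rule itself clears the confidence threshold, so raising either value can only shrink the result set.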
Factors influencing minimum threshold selection
- Dataset Size: Large datasets may require lower support thresholds to capture meaningful associations due to the increased variability in item occurrences;
- Data Sparsity: Sparse datasets, where items have low occurrence frequencies, may necessitate lower support thresholds to uncover significant associations;
- Domain Knowledge: Understanding the domain and the context of the dataset can guide the selection of appropriate thresholds. Prior knowledge about item interactions can inform the choice of support and confidence values;
- Objective of Analysis: The purpose of association rule mining influences the choice of thresholds. For exploratory analysis, higher support thresholds may be suitable to identify prominent associations, while lower thresholds may be preferred for comprehensive pattern discovery.
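One way to gauge size and sparsity before picking a threshold is to look at how often each single item occurs. The sketch below does this for the same toy transactions used in this lesson's example; the names `item_support`, `density`, and `surviving_items` are hypothetical helpers introduced here, and the candidate thresholds are arbitrary.

```python
from collections import Counter

# Same toy transactions as in the lesson's example.
transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey'],
]

n = len(transactions)

# Count each item at most once per transaction.
item_counts = Counter(item for t in transactions for item in set(t))
item_support = {item: c / n for item, c in item_counts.items()}

# Share of filled cells in the one-hot matrix: a rough sparsity indicator.
density = sum(item_counts.values()) / (n * len(item_counts))
print(f"Transactions: {n}, distinct items: {len(item_counts)}, density: {density:.2f}")

# Items ordered from most to least frequent.
for item, s in sorted(item_support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{item:<8}{s:.2f}")


def surviving_items(min_support):
    """Single items whose support reaches the candidate threshold."""
    return sorted(i for i, s in item_support.items() if s >= min_support)


# Candidate thresholds chosen arbitrarily for the comparison.
for threshold in (0.2, 0.3, 0.5):
    print(threshold, surviving_items(threshold))
```

An item that fails the support threshold cannot appear in any frequent itemset, so this listing shows directly which items a given `min_support` would exclude from every rule.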
Example
Now you can conduct a simple experiment: change the min_support and min_confidence values in the code sample below and observe how your changes influence the results.
```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Create a sample transaction dataset
transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey']
]

# Define minimum support and confidence thresholds
min_support = 0.2
min_confidence = 0.7

# Initialize and fit `TransactionEncoder`
encoder = TransactionEncoder()
encoder.fit(transactions)

# Transform transactions using the encoder
one_hot_encoded = encoder.transform(transactions)

# Convert one-hot encoded array to DataFrame
df = pd.DataFrame(one_hot_encoded, columns=encoder.columns_)

# Find frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Print frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Print association rules with antecedent -> consequent format and confidence
print("\nAssociation Rules:")
for index, row in rules.iterrows():
    print("Rule: {} => {} with Confidence: {:.2f}".format(list(row['antecedents']), list(row['consequents']), row['confidence']))
```
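To run the experiment over several settings at once, the sketch below sweeps a small grid of thresholds and reports how many frequent itemsets and rules survive each combination. It is a brute-force, standard-library stand-in written for this toy dataset, not mlxtend's implementation; the helpers `frequent_itemset_counts` and `count_rules` and the grid values are assumptions made for illustration.

```python
from itertools import combinations

# Same toy transactions as above, stored as sets for fast subset checks.
transactions = [set(t) for t in [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey'],
]]


def frequent_itemset_counts(min_support):
    """Map each frequent itemset to its transaction count (brute force)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, size):
            c = sum(1 for t in transactions if set(combo) <= t)
            if c / n >= min_support:
                level[frozenset(combo)] = c
        if not level:
            break  # no frequent itemset of this size, so none can be larger
        frequent.update(level)
    return frequent


def count_rules(min_support, min_confidence):
    """Return (number of frequent itemsets, number of rules kept)."""
    frequent = frequent_itemset_counts(min_support)
    n_rules = 0
    for itemset, c in frequent.items():
        # Every non-empty proper subset is a candidate antecedent.
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                # Subsets of a frequent itemset are frequent, so the lookup is safe.
                if c / frequent[frozenset(antecedent)] >= min_confidence:
                    n_rules += 1
    return len(frequent), n_rules


# Grid values picked arbitrarily for the comparison.
for min_support in (0.2, 0.4, 0.6):
    for min_confidence in (0.5, 0.7, 0.9):
        n_itemsets, n_rules = count_rules(min_support, min_confidence)
        print(f"min_support={min_support}, min_confidence={min_confidence}: "
              f"{n_itemsets} itemsets, {n_rules} rules")
```

Enumerating every item combination is only practical for a handful of items; on real data a pruning algorithm such as Apriori is the reasonable choice. The point of the sweep is the trend: counts can only fall as either threshold rises.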