Learn Undersampling Techniques | Sampling Techniques for Large Data
Large Data Handling

Undersampling Techniques


Large datasets are often imbalanced: one class (the majority class) greatly outnumbers another (the minority class). This imbalance makes it difficult for models to learn meaningful patterns about the minority class, so predictions for that class tend to be poor even when overall accuracy looks high. Undersampling addresses this issue by reducing the number of samples in the majority class so that the dataset becomes more balanced.
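Before undersampling, it helps to measure how imbalanced the data actually is. The following is a minimal sketch using pandas; the toy labels and the variable name `imbalance_ratio` are assumptions made for illustration, not part of the lesson.

```python
import pandas as pd

# Hypothetical toy labels: 16 majority rows and 4 minority rows.
labels = pd.Series(["majority"] * 16 + ["minority"] * 4, name="class")

# Absolute counts and relative shares per class.
counts = labels.value_counts()
shares = labels.value_counts(normalize=True)

# Imbalance ratio: size of the largest class divided by the smallest.
imbalance_ratio = counts.max() / counts.min()

print(counts)
print(shares)
print(f"Imbalance ratio: {imbalance_ratio:.1f}")
```

A ratio close to 1 means the classes are roughly balanced; the larger the ratio, the more the majority class dominates.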

Consider undersampling when your dataset is too large for practical processing or when the majority class dominates to such an extent that the model ignores the minority class. It is most appropriate when the dataset is very large and you can afford to discard some majority class samples without losing important information. It is less suitable when the dataset is already small or when the majority class contains rare but important examples.

```python
import pandas as pd

# Create a sample imbalanced dataset
data = {
    "feature": range(20),
    "class": ["majority"] * 16 + ["minority"] * 4
}
df = pd.DataFrame(data)

# Count the number of samples in each class
class_counts = df["class"].value_counts()
minority_count = class_counts["minority"]

# Randomly sample from the majority class to match the minority class count
majority_sample = df[df["class"] == "majority"].sample(n=minority_count, random_state=42)
minority_sample = df[df["class"] == "minority"]

# Combine samples to get a balanced dataset
balanced_df = pd.concat([majority_sample, minority_sample])

print("Original class distribution:")
print(df["class"].value_counts())
print("\nBalanced class distribution after undersampling:")
print(balanced_df["class"].value_counts())
```
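The example above balances the classes exactly, which discards the largest possible share of the majority class. If you cannot afford to lose that much data, one option is to undersample only partially. The sketch below is an assumption-laden illustration, not the lesson's own method: the function name `undersample` and its `ratio` parameter are hypothetical, and `ratio` caps every class at `ratio` times the size of the smallest class.

```python
import pandas as pd


def undersample(df, label_col, ratio=1.0, random_state=None):
    """Randomly cap every class at ratio * (size of the smallest class) rows."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1 so the smallest class is kept whole")

    counts = df[label_col].value_counts()
    cap = int(counts.min() * ratio)

    # Sample each class separately; classes already below the cap are kept whole.
    parts = [
        group.sample(n=min(len(group), cap), random_state=random_state)
        for _, group in df.groupby(label_col)
    ]

    # Shuffle so the classes are not stored in contiguous blocks.
    combined = pd.concat(parts)
    return combined.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Hypothetical toy data with the same 16:4 class split as the example above.
toy_df = pd.DataFrame({
    "feature": range(20),
    "class": ["majority"] * 16 + ["minority"] * 4,
})

# Keep at most twice as many majority rows as minority rows.
partly_balanced = undersample(toy_df, "class", ratio=2.0, random_state=42)
print(partly_balanced["class"].value_counts())
```

With `ratio=1.0` the function reproduces full balancing; larger values keep more majority rows at the cost of a less balanced result.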

What is a potential risk of undersampling?


Section 2. Chapter 4
