Ensemble Learning

Ensemble learning is a machine learning technique that combines the predictions of multiple individual models into a single model that is more accurate and more robust. The basic idea is to leverage the wisdom of the crowd: when the individual models are diverse and make different mistakes, aggregating their predictions (for example by voting or averaging) often outperforms any one of them.
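
To make the aggregation step concrete, here is a minimal sketch of majority voting written with plain NumPy. The `majority_vote` helper and the three prediction lists are hypothetical and made up for illustration; they do not come from a trained model.

```python
import numpy as np

def majority_vote(predictions):
    """Return the most common label per sample (rows = models, columns = samples)."""
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    # Count the votes each class receives for every sample (one column at a time).
    counts = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    # The class with the most votes wins for each sample.
    return counts.argmax(axis=0)

# Hypothetical predictions from three models on five samples (made-up labels).
model_predictions = [
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
]
ensemble_prediction = majority_vote(model_predictions)
print(ensemble_prediction)
```

Each model disagrees with the other two on at most one or two samples, and the vote follows whatever label at least two of the three models agree on.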

Why Do We Need Ensemble Learning?

Increased Accuracy: Combining predictions from multiple models often leads to higher accuracy than any single model.
Robustness: Ensembles are typically less prone to overfitting because averaging many models cancels out part of their individual errors. This mainly reduces variance; some methods, such as boosting, also reduce bias.
Handling Complexity: Ensembles can capture complex relationships in data that might be difficult for individual models to grasp.
Versatility: Ensemble methods can be applied to various types of machine learning algorithms, making them versatile and widely applicable.
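
As a rough illustration of the versatility point, the sketch below combines three different algorithm types with scikit-learn's `VotingClassifier`. The choice of base models and their hyperparameters (`max_iter=1000`, `n_neighbors=5`, `max_depth=4`) is an assumption made for this example, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Three different model families; the scaled pipelines help the
# distance-based and linear models, the tree does not need scaling.
base_models = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=42)),
]

# "hard" voting: each model casts one vote and the majority label wins.
voting = VotingClassifier(estimators=base_models, voting="hard")
voting.fit(X_train, y_train)

# Compare each fitted base model with the combined vote on the test set.
for name, model in voting.named_estimators_.items():
    print(f"{name}: {model.score(X_test, y_test):.3f}")
voting_accuracy = voting.score(X_test, y_test)
print(f"voting ensemble: {voting_accuracy:.3f}")
```

The ensemble is not guaranteed to beat every base model on a given split; the benefit tends to show up when the base models make different kinds of mistakes.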

Example

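Let's use the Breast Cancer dataset that ships with scikit-learn for this ensemble classification example. Random Forest is an ensemble of decision trees whose predictions are combined into one. The code below trains a Random Forest on the first two features of the dataset (so the result can be drawn in 2D), reports its test accuracy, and visualizes the decision boundaries: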


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
# Keep only the first two features (mean radius, mean texture) so the boundary can be plotted in 2D
X, y = data.data[:, :2], data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier
rf_classifier.fit(X_train, y_train)

# Make predictions
predictions = rf_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Visualization: predict the class on a dense grid covering both features
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = rf_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Shade the predicted regions and overlay all data points, using one colormap for both
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.Paired)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=70, linewidth=1, cmap=plt.cm.Paired)
plt.xlabel(data.feature_names[0])  # 'mean radius'
plt.ylabel(data.feature_names[1])  # 'mean texture'
plt.title('Random Forest Classifier Decision Boundaries (Breast Cancer Dataset)')
plt.show()
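
To see what the ensemble adds over one of its members, a single decision tree can be compared with the Random Forest on the same two features. The sketch below uses 5-fold cross-validation; the choice of `cv=5` and the default tree settings are assumptions for illustration, and the scores will vary with the data and settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same two features as in the example above
X, y = load_breast_cancer(return_X_y=True)
X = X[:, :2]

models = {
    "single decision tree": DecisionTreeClassifier(random_state=42),
    "random forest (100 trees)": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Evaluate each model with 5-fold cross-validation and report mean and spread
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Cross-validation is used here instead of a single train/test split so that the comparison depends less on one particular split.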
