Course Content
Foundations of Machine Learning Track Overview
Ensemble Learning
Ensemble learning is a machine learning technique that combines the predictions of multiple individual models to produce a stronger, more accurate, and more robust model. The basic idea behind ensemble methods is to leverage the wisdom of the crowd: by aggregating predictions from diverse models, the ensemble can often outperform any single model.
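To make this concrete, here is a minimal sketch of that idea using scikit-learn's VotingClassifier to aggregate three diverse base models by majority vote. This snippet is an illustrative addition rather than a prescribed recipe; the dataset and base-model choices are assumptions made just for demonstration.

# Sketch: "wisdom of the crowd" via a simple voting ensemble.
# Dataset and base models are illustrative choices, not requirements.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Three base models with different strengths and weaknesses
ensemble = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=5000)),
        ('dt', DecisionTreeClassifier(random_state=42)),
        ('knn', KNeighborsClassifier()),
    ],
    voting='hard',  # 'hard' = majority vote over the predicted class labels
)

ensemble.fit(X_train, y_train)
print("Voting ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))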
Why Do We Need Ensemble Learning?
- Increased Accuracy: Combining predictions from multiple models often leads to higher accuracy than any single model.
- Robustness: Ensembles are more resistant to overfitting because aggregating many models reduces variance and cancels out the uncorrelated errors of individual models.
- Handling Complexity: Ensembles can capture complex relationships in data that might be difficult for individual models to grasp.
- Versatility: Ensemble methods can be applied to various types of machine learning algorithms, making them versatile and widely applicable.
Example
Let's use the well-known Breast Cancer dataset from scikit-learn for this ensemble classification example. Here's how you can perform ensemble learning with a Random Forest on the Breast Cancer dataset and visualize the decision boundaries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data[:, :2], data.target  # use the first two features for visualization

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the classifier
rf_classifier.fit(X_train, y_train)

# Make predictions
predictions = rf_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Visualization: evaluate the classifier on a grid over the two features
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = rf_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=70, linewidth=1, cmap=plt.cm.Paired)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Random Forest Classifier Decision Boundaries (Breast Cancer Dataset)')
plt.show()
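The example above restricts itself to the first two features so the decision boundary can be plotted, which limits accuracy. As an optional follow-up sketch (an added illustration, not part of the original example), comparing a single decision tree with a Random Forest on all 30 features shows the accuracy and robustness benefits described earlier; the exact scores depend on the random seed and train/test split.

# Sketch: single decision tree vs. Random Forest on all features.
# Reported accuracies are illustrative and vary with the split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)  # all 30 features this time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Single decision tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("Random Forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))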