Conteúdo do Curso
Ensemble Learning
Ensemble Learning
Bagging Regressor
Bagging Regressor creates an ensemble of multiple base regression models and combines their predictions to produce a final prediction. In Bagging Regressor, the base model is typically a regression algorithm, such as Decision Tree Regressor. The main idea behind Bagging Regressor is to reduce overfitting and improve the stability and accuracy of the predictions by averaging the predictions of multiple base models.
How does Bagging Regressor work?
- Bootstrap Sampling: The Bagging Regressor generates multiple subsets of the training data by randomly selecting samples with replacements. Each subset is called a bootstrap sample;
- Base Model Training: A separate base regression model (e.g., Decision Tree Regressor) is trained on each bootstrap sample. This creates multiple base models, each with its own variation due to the different subsets of data they were trained on;
- Aggregation of Predictions: The Bagging Regressor aggregates the predictions from all base models to make predictions. In the case of regression tasks, the predictions are typically averaged across the base models to form the final prediction. This ensemble approach helps to reduce overfitting and improve the overall model performance.
Note
Selecting samples with replacement is a concept often used in statistics and probability. It refers to a method of sampling data points or elements from a dataset or population where, after each selection, the selected item is put back into the dataset before the next selection. In other words, the same item can be chosen more than once in the sampling process.
Example of usage
The principle of using a Bagging Regressor in Python is the same as using a Bagging Classifier. The only difference is that Bagging Regressor has no implementation for the .predict_proba()
method - the .predict()
method is used to create predictions instead.
Note
If we don't specify the base model of
AdaBoostRegressor
, theDecisionTreeRegressor
will be used by default.
import numpy as np import pandas as pd from sklearn.datasets import fetch_california_housing from sklearn.model_selection import train_test_split from sklearn.ensemble import BaggingRegressor from sklearn.tree import DecisionTreeRegressor from sklearn.metrics import mean_squared_error # Load the California Housing dataset data = fetch_california_housing() X, y = data.data, data.target # Split the data into training and testing sets X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Create a base model (Decision Tree Regressor) base_model = DecisionTreeRegressor(random_state=42) # Create the Bagging Regressor bagging_model = BaggingRegressor(base_model, n_estimators=10) # Train the Bagging Regressor bagging_model.fit(X_train, y_train) # Make predictions on the test data predictions = bagging_model.predict(X_test) # Calculate mean squared error (MSE) mse = mean_squared_error(y_test, predictions) print(f'Mean Squared Error: {mse:.4f}')
Obrigado pelo seu feedback!