Feature Selection and Regularization Techniques

Wrapper and Embedded Methods: RFE and SelectFromModel

Understanding feature selection is crucial to building robust and interpretable machine learning models. Two important categories of feature selection techniques are wrapper methods and embedded methods. Wrapper methods, such as Recursive Feature Elimination (RFE), use a predictive model to evaluate combinations of features and select the best subset based on model performance. In contrast, embedded methods incorporate feature selection as part of the model training process itself — SelectFromModel with Lasso regression is a common example. The main difference is that wrapper methods repeatedly train models on different subsets of features, while embedded methods select features based on the internal model attributes, such as coefficients or feature importances, as they are learned.

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.preprocessing import StandardScaler
import numpy as np

# --- Load dataset (NumPy for speed) ---
data = fetch_california_housing()
X = data.data  # shape (20640, 8)
y = data.target
feature_names = np.array(data.feature_names)

# --- RFE (fewer refits via step=2) ---
lr = LinearRegression()
rfe = RFE(estimator=lr, n_features_to_select=5, step=2)
rfe.fit(X, y)
rfe_features = feature_names[rfe.support_]

# --- Lasso-based selection (scale first for faster convergence) ---
scaler = StandardScaler()
Xs = scaler.fit_transform(X)
lasso = Lasso(alpha=0.1, random_state=42, max_iter=1000)  # scaled => converges fast
lasso.fit(Xs, y)
sfm = SelectFromModel(lasso, prefit=True)
lasso_features = feature_names[sfm.get_support()]

# --- Compare ---
overlap = set(rfe_features) & set(lasso_features)
print("RFE selected features:", list(rfe_features))
print("SelectFromModel (Lasso) selected features:", list(lasso_features))
print("Overlap between RFE and Lasso-selected features:", list(overlap))
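
The example above drives selection with Lasso coefficients, but as noted in the introduction, embedded methods can also key off feature importances. The sketch below is an illustrative variant, not part of the lesson's original code: the choice of RandomForestRegressor and the "median" threshold are assumptions made here to show SelectFromModel working from a tree ensemble's feature_importances_.

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
import numpy as np

data = fetch_california_housing()
X, y = data.data, data.target
feature_names = np.array(data.feature_names)

# Embedded selection driven by feature_importances_ instead of coefficients
# (RandomForestRegressor and the "median" threshold are illustrative choices).
rf = RandomForestRegressor(n_estimators=50, random_state=42, n_jobs=-1)
sfm_rf = SelectFromModel(rf, threshold="median")  # keep features above median importance
sfm_rf.fit(X, y)
print("Importance-selected features:", list(feature_names[sfm_rf.get_support()]))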

Both wrapper and embedded methods have distinct advantages and limitations. Wrapper methods like RFE are often more flexible because they can work with any model and can optimize for the specific predictive task. However, they are computationally expensive, especially with large datasets or many features, since they require fitting the model multiple times. Embedded methods such as SelectFromModel with Lasso are typically faster and scale better because feature selection happens during model training. However, their effectiveness depends on the model's assumptions; for instance, Lasso may arbitrarily select one feature among several highly correlated ones, potentially missing important predictors. As you saw in the code, the features selected by RFE and SelectFromModel with Lasso can overlap, but may also differ due to these underlying mechanisms.
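
To make the cost difference concrete, here is a rough timing sketch on the same dataset. The absolute numbers depend on your machine, and with only 8 features the gap is modest; it grows with feature count, since RFE refits its estimator once per elimination round while the Lasso-based selector needs a single fit.

import time
from sklearn.datasets import fetch_california_housing
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.preprocessing import StandardScaler

data = fetch_california_housing()
X, y = data.data, data.target
Xs = StandardScaler().fit_transform(X)

# Wrapper: the estimator is refit once per elimination round.
t0 = time.perf_counter()
RFE(estimator=LinearRegression(), n_features_to_select=5, step=1).fit(X, y)
rfe_time = time.perf_counter() - t0

# Embedded: a single Lasso fit doubles as the selection step.
t0 = time.perf_counter()
SelectFromModel(Lasso(alpha=0.1, max_iter=1000)).fit(Xs, y)
sfm_time = time.perf_counter() - t0

print(f"RFE fit time: {rfe_time:.4f}s, SelectFromModel fit time: {sfm_time:.4f}s")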

Note

Multicollinearity — when two or more features are highly correlated — can impact feature selection. In such cases, methods like Lasso may select one correlated feature and ignore others, which can make interpretation tricky and sometimes lead to instability in the selected feature set.
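
A minimal synthetic sketch of this effect, with an illustrative data-generating setup assumed here: two nearly identical features carry the same signal, yet Lasso typically keeps only one of them.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + rng.normal(scale=0.1, size=n)

Xs = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(Xs, y)
print("Lasso coefficients:", lasso.coef_)
# Expect one of the two correlated columns to get a (near-)zero coefficient,
# even though x1 and x2 carry essentially the same information.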

Question

Which statements accurately describe the differences between wrapper and embedded feature selection methods, including RFE and SelectFromModel with Lasso?


