Feature Selection and Regularization Techniques

Wrapper and Embedded Methods: RFE and SelectFromModel

Understanding feature selection is crucial to building robust and interpretable machine learning models. Two important categories of feature selection techniques are wrapper methods and embedded methods. Wrapper methods, such as Recursive Feature Elimination (RFE), use a predictive model to evaluate combinations of features and select the best subset based on model performance. In contrast, embedded methods incorporate feature selection as part of the model training process itself — SelectFromModel with Lasso regression is a common example. The main difference is that wrapper methods repeatedly train models on different subsets of features, while embedded methods select features based on the internal model attributes, such as coefficients or feature importances, as they are learned.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.preprocessing import StandardScaler
import numpy as np

# --- Load dataset (NumPy for speed) ---
data = fetch_california_housing()
X = data.data  # shape (20640, 8)
y = data.target
feature_names = np.array(data.feature_names)

# --- RFE (fewer refits via step=2) ---
lr = LinearRegression()
rfe = RFE(estimator=lr, n_features_to_select=5, step=2)
rfe.fit(X, y)
rfe_features = feature_names[rfe.support_]

# --- Lasso-based selection (scale first for faster convergence) ---
scaler = StandardScaler()
Xs = scaler.fit_transform(X)
lasso = Lasso(alpha=0.1, random_state=42, max_iter=1000)  # scaled => converges fast
lasso.fit(Xs, y)
sfm = SelectFromModel(lasso, prefit=True)
lasso_features = feature_names[sfm.get_support()]

# --- Compare ---
overlap = set(rfe_features) & set(lasso_features)
print("RFE selected features:", list(rfe_features))
print("SelectFromModel (Lasso) selected features:", list(lasso_features))
print("Overlap between RFE and Lasso-selected features:", list(overlap))
```

Both wrapper and embedded methods have distinct advantages and limitations. Wrapper methods like RFE are often more flexible because they can work with any model and can optimize for the specific predictive task. However, they are computationally expensive, especially with large datasets or many features, since they require fitting the model multiple times. Embedded methods such as SelectFromModel with Lasso are typically faster and scale better because feature selection happens during model training. However, their effectiveness depends on the model's assumptions; for instance, Lasso may arbitrarily select one feature among several highly correlated ones, potentially missing important predictors. As you saw in the code, the features selected by RFE and SelectFromModel with Lasso can overlap, but may also differ due to these underlying mechanisms.
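One practical lever with SelectFromModel is that you can control how many features survive, rather than relying on the default coefficient threshold. As a small sketch on synthetic data (the `make_regression` setup, `alpha=0.5`, and `max_features=3` are illustrative choices, not values from the lesson), `max_features` combined with `threshold=-np.inf` keeps exactly the top-ranked features by absolute coefficient:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 4 of which are informative
X, y = make_regression(n_samples=500, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
Xs = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=0.5).fit(Xs, y)

# threshold=-np.inf disables the importance cutoff, so max_features
# alone decides: keep the 3 features with the largest |coefficients|.
sfm = SelectFromModel(lasso, prefit=True, max_features=3, threshold=-np.inf)
mask = sfm.get_support()
print("Kept", mask.sum(), "of", len(mask), "features")
```

This is useful when downstream constraints (interpretability, deployment cost) dictate a fixed feature budget, since Lasso's own sparsity level depends on `alpha` and is harder to pin down exactly.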

Note

Multicollinearity — when two or more features are highly correlated — can impact feature selection. In such cases, methods like Lasso may select one correlated feature and ignore others, which can make interpretation tricky and sometimes lead to instability in the selected feature set.
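You can see this instability directly with a small synthetic sketch (the data-generating setup below is illustrative): two nearly identical columns share the same signal, and Lasso typically routes almost all of the weight onto just one of them.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # near-duplicate of x1
x3 = rng.normal(size=n)                   # independent feature
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 2 * x3 + rng.normal(scale=0.1, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
print("Coefficients:", lasso.coef_)
# Typically one of the first two coefficients ends up (near) zero:
# the shared signal is assigned to a single member of the correlated pair.
```

Which member of the pair survives can flip with small perturbations of the data, which is exactly the interpretation hazard the note describes. Ridge regression, by contrast, would tend to split the weight between the correlated columns.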


Which statements accurately describe the differences between wrapper and embedded feature selection methods, including RFE and SelectFromModel with Lasso?

