Challenge: Implementing a Random Forest
In sklearn, the classification version of Random Forest is implemented by the RandomForestClassifier class from the sklearn.ensemble module.
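The Titanic DataFrame is not reproduced here, so this is only a minimal sketch of creating and training the classifier. It uses a synthetic dataset from make_classification as a stand-in, and the names X, y, and model are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption, not the Titanic data): 300 samples, 5 numeric features, 2 classes
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# A forest of 100 trees; random_state makes the bootstrapping and feature sampling reproducible
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# sklearn averages the trees' predicted class probabilities and returns the most probable class
predictions = model.predict(X[:5])
print(predictions)
print("Number of trees:", len(model.estimators_))
```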
You will also calculate the cross-validation accuracy using the cross_val_score() function from sklearn.model_selection.
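A possible way to call cross_val_score() with 10 folds is sketched below. It again assumes synthetic stand-in data. For a classifier and an integer cv, scikit-learn uses stratified folds, and the default score is accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

model = RandomForestClassifier(random_state=42)

# cv=10: on each split a fresh clone of the model is trained on 9 folds
# and scored (accuracy by default) on the held-out fold
cv_scores = cross_val_score(model, X, y, cv=10)

print(cv_scores)  # one accuracy value per fold
print(f"Mean accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```

Reporting the mean together with the standard deviation shows how much the accuracy varies between folds, which a single train/test split cannot reveal.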
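In the end, you'll print the importance of each feature. The feature_importances_ attribute returns an array of importance scores, one per feature. Each score measures how much that feature reduced Gini impurity (the default splitting criterion) across all the decision nodes where it was used, with each split weighted by the number of training samples reaching that node. The scores are then averaged over all trees in the forest and normalized so they sum to 1. In other words, the more a feature helps split the data in a useful way, the higher its importance.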
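To make the impurity-based definition concrete, the sketch below recomputes the importances from each fitted tree's internal arrays (tree_.impurity, tree_.weighted_n_node_samples, tree_.children_left, tree_.children_right, tree_.feature) and compares the result with feature_importances_. It assumes that this mirrors scikit-learn's mean-decrease-in-impurity computation, as it does in recent versions. The helper tree_impurity_decrease is hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)


def tree_impurity_decrease(fitted_tree, n_features):
    """Hypothetical helper: sum each split's weighted Gini decrease per feature."""
    t = fitted_tree.tree_
    importances = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, so no contribution
            continue
        decrease = (t.weighted_n_node_samples[node] * t.impurity[node]
                    - t.weighted_n_node_samples[left] * t.impurity[left]
                    - t.weighted_n_node_samples[right] * t.impurity[right])
        importances[t.feature[node]] += decrease
    return importances / importances.sum()  # normalize each tree to sum to 1


# Average the per-tree scores over the forest (skipping single-node trees), then renormalize
per_tree = [tree_impurity_decrease(est, X.shape[1])
            for est in model.estimators_ if est.tree_.node_count > 1]
manual = np.mean(per_tree, axis=0)
manual /= manual.sum()

print(np.round(manual, 3))
print(np.round(model.feature_importances_, 3))
print("Match:", np.allclose(manual, model.feature_importances_))
```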
However, the attribute returns only the scores, without the feature names. To display both, you can pair them using Python’s zip() function:
# X is the feature DataFrame and model is the fitted RandomForestClassifier
for feature, importance in zip(X.columns, model.feature_importances_):
    print(feature, importance)
This prints each feature name along with its importance score, making it easier to understand which features the model relied on most.
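When there are many features, it can help to sort the zipped pairs so the most important ones come first. The sketch below does this with sorted(). The column names feat_a to feat_d are hypothetical, and the data is a synthetic stand-in for the Titanic DataFrame.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with hypothetical column names (not the Titanic columns)
X_arr, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=["feat_a", "feat_b", "feat_c", "feat_d"])
model = RandomForestClassifier(random_state=42).fit(X, y)

# Pair names with scores via zip(), then sort from most to least important
ranked = sorted(zip(X.columns, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for feature, importance in ranked:
    print(f"{feature:<8} {importance:.3f}")
```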
You are given a Titanic dataset stored as a DataFrame in the df variable.
- Initialize the Random Forest model, set random_state=42, train it, and store the fitted model in the random_forest variable.
- Calculate the cross-validation scores for the trained model using 10 folds, and store the resulting scores in the cv_scores variable.