Classification with Python
Challenge: Implementing a Random Forest
In sklearn, the classification version of Random Forest is implemented by the RandomForestClassifier class.
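A minimal sketch of creating and training the classifier could look like the following. The synthetic data from make_classification is an assumption that keeps the example self-contained; in the lesson, the features and target would come from the Titanic DataFrame instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; the lesson uses the Titanic dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# random_state fixes the randomness so that repeated runs build the same forest
random_forest = RandomForestClassifier(random_state=42)

# fit() trains every tree in the forest and returns the fitted estimator
random_forest.fit(X, y)

# The fitted model can now predict a class label for each row
predictions = random_forest.predict(X)
print(predictions[:10])
```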
You will also calculate the cross-validation accuracy using the cross_val_score() function.
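As a rough sketch, cross-validation with 10 folds might be run as shown below. The data is again synthetic (an assumption), and accuracy is used because it is the default score for classifiers when no scoring argument is passed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; the lesson uses the Titanic dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

random_forest = RandomForestClassifier(random_state=42)

# cv=10 splits the data into 10 folds; each fold is held out once for scoring
cv_scores = cross_val_score(random_forest, X, y, cv=10)

# One accuracy value per fold; the mean summarizes overall performance
print(cv_scores)
print(cv_scores.mean())
```

Note that cross_val_score() clones the estimator and refits the clone on each training split, so the result does not depend on whether the model passed in was already fitted.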
Finally, you'll print the importance of each feature. The feature_importances_ attribute returns an array of importance scores. Each score represents how much a feature contributed to reducing Gini impurity across all the decision nodes where that feature was used, and the scores are normalized so that they sum to 1. In other words, the more a feature helps split the data in a useful way, the higher its importance.
However, the attribute only gives the scores without feature names. To display both, you can pair them using Python’s zip() function.
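The pairing could be sketched as follows. The column names feature_a to feature_d and the synthetic data are hypothetical placeholders, not the actual Titanic columns.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical column names (an assumption)
features, target = make_classification(n_samples=200, n_features=4, random_state=42)
X = pd.DataFrame(features, columns=["feature_a", "feature_b", "feature_c", "feature_d"])

random_forest = RandomForestClassifier(random_state=42).fit(X, target)

# zip() pairs each column name with the score at the same position
for name, importance in zip(X.columns, random_forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```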
This prints each feature name along with its importance score, making it easier to understand which features the model relied on most.
You are given a Titanic dataset stored as a DataFrame in the df variable.
- Initialize the Random Forest model, set random_state=42, train it, and store the fitted model in the random_forest variable.
- Calculate the cross-validation scores for the trained model using 10 folds, and store the resulting scores in the cv_scores variable.