Challenge: Comparing Models
Now you'll compare the models we've covered using a single dataset: the breast cancer dataset. The target variable is the 'diagnosis' column, where 1 represents malignant and 0 represents benign cases.
You will apply GridSearchCV to each model to find the best parameters. In this task, you'll use recall as the scoring metric because minimizing false negatives (malignant cases predicted as benign) is crucial. To have GridSearchCV select the best parameters based on recall, set scoring='recall'.
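To make the choice of metric concrete, here is a small sketch on made-up labels (the arrays are illustrative, not taken from the breast cancer dataset). It assumes scikit-learn's recall_score, accuracy_score, and confusion_matrix, and shows how a model can look good on accuracy while still missing half of the malignant cases.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical labels for illustration: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # two malignant cases are missed

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall = TP / (TP + FN): the share of malignant cases the model catches.
recall = recall_score(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)

print(f"false negatives: {fn}")    # 2
print(f"accuracy: {accuracy:.2f}")  # 0.80
print(f"recall: {recall:.2f}")      # 0.50
```

Every false negative lowers recall directly, which is why it is a reasonable metric to optimize when a missed malignant case is the costliest error.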
The task is to build all the models we learned and to print the best parameters along with the best recall score of each model. You will need to fill in the parameter names in the param_grid dictionaries.
- For the k-NN model, find the best n_neighbors value out of [3, 5, 7, 12].
- For Logistic Regression, run through the values [0.1, 1, 10] of C.
- For a Decision Tree, configure two parameters, max_depth and min_samples_leaf. Run through the values [2, 4, 6, 10] for max_depth and [1, 2, 4, 7] for min_samples_leaf.
- For a Random Forest, find the best max_depth (maximum depth of each tree) value out of [2, 4, 6] and the best number of trees (n_estimators). Try the values [20, 50, 100] for the number of trees.
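The grids described above could be wired together roughly as follows. This is a sketch under stated assumptions, not the course's reference solution: it uses scikit-learn's bundled load_breast_cancer data instead of the course file (and flips its labels so that 1 means malignant, matching the 'diagnosis' column), leaves features unscaled for brevity, and picks max_iter=10000 and random_state=42 arbitrarily. The names models and results are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for the course file.
# It encodes malignant as 0, so flip the labels to get 1 = malignant.
data = load_breast_cancer()
X, y = data.data, 1 - data.target

# Each entry pairs an estimator with the grid from the task description.
models = {
    "k-NN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 12]}),
    "Logistic Regression": (
        LogisticRegression(max_iter=10000),  # raised so the solver can converge
        {"C": [0.1, 1, 10]},
    ),
    "Decision Tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [2, 4, 6, 10], "min_samples_leaf": [1, 2, 4, 7]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"max_depth": [2, 4, 6], "n_estimators": [20, 50, 100]},
    ),
}

results = {}
for name, (estimator, param_grid) in models.items():
    # scoring="recall" ranks candidates by cross-validated recall on label 1.
    search = GridSearchCV(estimator, param_grid, scoring="recall", cv=5)
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(f"{name}: {search.best_params_}, recall = {search.best_score_:.3f}")
```

Here best_score_ is the mean cross-validated recall of the best parameter combination, so the printed values depend on the data and the fold split rather than on a held-out test set.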
Note
The code takes some time to run (less than a minute).