Classification with Python
Challenge: Comparing Models
Now we will compare the models we have learned on a single dataset, a breast cancer dataset. The target is the 'diagnosis' column (1 – malignant, 0 – benign).
We will apply GridSearchCV to each model to find the best parameters. In this task, we will use the recall metric for scoring, since we want to minimize False Negatives (malignant cases predicted as benign). GridSearchCV can choose the parameters based on the recall metric if you set scoring='recall'.
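To make the choice of metric concrete: recall is TP / (TP + FN), so every False Negative lowers it. The sketch below checks this on a small hand-made set of labels (the `y_true` and `y_pred` values are invented for illustration and are not taken from the course dataset), using scikit-learn's `recall_score` and `confusion_matrix`.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels for illustration only: 1 = malignant, 0 = benign
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall computed by hand: share of actual positives that were detected
manual_recall = tp / (tp + fn)

# The same value from scikit-learn's helper
recall = recall_score(y_true, y_pred)

print(manual_recall, recall)  # both are 0.75: 3 of the 4 malignant cases were found
```

Here one malignant case is predicted as benign, so recall drops from 1.0 to 0.75, while the single False Positive does not affect it at all.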
The task is to build all the models we have learned and to print the best parameters along with the best recall score of each model. You will need to fill in the parameter names in the param_grid dictionaries.
- For the k-NN model, find the best n_neighbors value out of [3, 5, 7, 12].
- For Logistic Regression, run through the values [0.1, 1, 10] of C.
- For a Decision Tree, we want to configure two parameters, max_depth and min_samples_leaf. Run through the values [2, 4, 6, 10] for max_depth and [1, 2, 4, 7] for min_samples_leaf.
- For a Random Forest, find the best max_depth (maximum depth of each tree) value out of [2, 4, 6] and the best number of trees (n_estimators). Try the values [20, 50, 100] for the number of trees.
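One possible way to organize the four searches is sketched below. The course's dataset and starter code are not shown here, so this sketch makes several assumptions: a synthetic dataset from `make_classification` stands in for the breast cancer data, `cv=5` and `max_iter=1000` are illustrative settings, and the `searches` and `results` names are hypothetical. Only the parameter grids come from the task description.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the course dataset (label 1 plays the role of "malignant")
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each entry pairs a model with the grid of values listed in the task
searches = {
    'k-NN': (KNeighborsClassifier(),
             {'n_neighbors': [3, 5, 7, 12]}),
    'Logistic Regression': (LogisticRegression(max_iter=1000),
                            {'C': [0.1, 1, 10]}),
    'Decision Tree': (DecisionTreeClassifier(random_state=0),
                      {'max_depth': [2, 4, 6, 10],
                       'min_samples_leaf': [1, 2, 4, 7]}),
    'Random Forest': (RandomForestClassifier(random_state=0),
                      {'max_depth': [2, 4, 6],
                       'n_estimators': [20, 50, 100]}),
}

results = {}
for name, (model, param_grid) in searches.items():
    # scoring='recall' makes the search pick the combination with the best cross-validated recall
    grid = GridSearchCV(model, param_grid, scoring='recall', cv=5)
    grid.fit(X, y)
    results[name] = (grid.best_params_, grid.best_score_)
    print(name, grid.best_params_, round(grid.best_score_, 3))
```

The scores printed for the synthetic data say nothing about the real dataset; the point is only the structure: one param_grid dictionary per model, keyed by the estimator's parameter names.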
Note
The code takes some time to run (less than a minute).