Classification with Python
Challenge: Choosing the Best K Value
As shown in the previous chapters, the model makes different predictions for different values of k (the number of neighbors).
When we build a model, we want to choose the k that leads to the best performance, and in the previous chapter we learned how to measure performance using cross-validation.
Running a loop that calculates the cross-validation score for a range of k values and picking the highest sounds like a no-brainer, and it is indeed the most frequently used approach. sklearn has a neat class for exactly this task: GridSearchCV.
The param_grid parameter takes a dictionary with parameter names as keys and lists of values to try as values. For example, to try values 1-99 for n_neighbors, you would use:
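A minimal sketch of such a grid; using KNeighborsClassifier as the estimator is an assumption here, since this chapter is about k-nearest neighbors:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Keys are parameter names of the estimator; values are lists of candidates to try.
param_grid = {'n_neighbors': list(range(1, 100))}  # values 1-99

grid = GridSearchCV(KNeighborsClassifier(), param_grid)
```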
Calling the .fit(X, y) method makes the GridSearchCV object search for the best parameters from param_grid and then re-train the model with those best parameters on the whole dataset. You can then get the highest cross-validation score via the .best_score_ attribute and predict new values using the .predict() method.
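The whole workflow can be sketched end to end; the iris dataset is used here only as a stand-in, since the text does not specify the data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {'n_neighbors': list(range(1, 100))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

# Cross-validates every candidate k, then refits the best one on the whole set.
grid.fit(X, y)

print(grid.best_params_)   # the winning n_neighbors value
print(grid.best_score_)    # its mean cross-validation score
preds = grid.predict(X[:5])  # the refit best model makes the predictions
```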
Task
- Import the GridSearchCV class.
- Scale X using StandardScaler.
- Look for the best value of n_neighbors among [3, 9, 18, 27].
- Initialize and train a GridSearchCV object with 4 folds of cross-validation.
- Print the score of the best model.