Зміст курсу
Classification with Python
Classification with Python
Challenge: Choosing the Best K Value
As demonstrated in the previous chapters, the model makes different predictions based on the k (number of neighbors) values. When building a model, it's crucial to select the k value that yields the best performance.
You can use cross-validation to measure the model's performance. Running a loop and calculating cross-validation scores for a range of k values to select the highest is a straightforward approach. This is the most commonly used method. sklearn
provides a convenient GridSearchCV
class for this task:
The param_grid
parameter takes a dictionary with parameter names as keys and a list of items to go through as a list. For example, to try values 1-99 for n_neighbors
, you would use:
python
The .fit(X, y)
method leads the GridSearchCV
object to find the best parameters from param_grid
and re-train the model with the best parameters using the whole set.
You can then get the highest score using the .best_score_
attribute and predict new values using the .predict()
method.
Swipe to start coding
You are given the Star Wars ratings dataset stored as a DataFrame
in the df
variable.
- Initialize
param_grid
as a dictionary containing then_neighbors
parameter with the values[3, 9, 18, 27]
. - Create a
GridSearchCV
object usingparam_grid
with 4-fold cross-validation, train it, and store it in thegrid_search
variable. - Retrieve the best model from
grid_search
and store it in thebest_model
variable. - Retrieve the score of the best model and store it in the
best_score
variable.
Рішення
Дякуємо за ваш відгук!
Challenge: Choosing the Best K Value
As demonstrated in the previous chapters, the model makes different predictions based on the k (number of neighbors) values. When building a model, it's crucial to select the k value that yields the best performance.
You can use cross-validation to measure the model's performance. Running a loop and calculating cross-validation scores for a range of k values to select the highest is a straightforward approach. This is the most commonly used method. sklearn
provides a convenient GridSearchCV
class for this task:
The param_grid
parameter takes a dictionary with parameter names as keys and a list of items to go through as a list. For example, to try values 1-99 for n_neighbors
, you would use:
python
The .fit(X, y)
method leads the GridSearchCV
object to find the best parameters from param_grid
and re-train the model with the best parameters using the whole set.
You can then get the highest score using the .best_score_
attribute and predict new values using the .predict()
method.
Swipe to start coding
You are given the Star Wars ratings dataset stored as a DataFrame
in the df
variable.
- Initialize
param_grid
as a dictionary containing then_neighbors
parameter with the values[3, 9, 18, 27]
. - Create a
GridSearchCV
object usingparam_grid
with 4-fold cross-validation, train it, and store it in thegrid_search
variable. - Retrieve the best model from
grid_search
and store it in thebest_model
variable. - Retrieve the score of the best model and store it in the
best_score
variable.
Рішення
Дякуємо за ваш відгук!