Learn Implementing k-NN | k-NN Classifier
Classification with Python

Implementing k-NN

KNeighborsClassifier

Implementing k-Nearest Neighbors with scikit-learn is straightforward: we only need to import and use the KNeighborsClassifier class.

Once you have imported the class and created an instance of it like this:

# Importing the class
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)

You need to feed it the training data using the .fit() method:

knn.fit(X_scaled, y)

And that's it! You can predict new values now.

y_pred = knn.predict(X_new_scaled)
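
The three steps above (create, fit, predict) can be sketched end to end on a tiny dataset. The values below are hypothetical toy data, not from the lesson, and are assumed to be scaled already; only the names knn, X_scaled, y, X_new_scaled, and y_pred follow the snippets above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical, already-scaled toy data: two well-separated groups of points
X_scaled = np.array([[-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1],
                     [1.0, 1.0], [1.1, 0.9], [0.8, 1.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# Create the classifier and feed it the training data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_scaled, y)

# Predict the class of two new (also already-scaled) instances
X_new_scaled = np.array([[-1.0, -0.9], [0.9, 1.0]])
y_pred = knn.predict(X_new_scaled)
print(y_pred)  # [0 1]
```

Each new point is assigned the majority class among its 3 nearest training points, so the first lands in class 0 and the second in class 1.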

Scaling the data

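However, remember that k-NN relies on distances between points, so the features should be scaled before training; otherwise, a feature with a larger range dominates the distance. StandardScaler is commonly used for this purpose: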

You should calculate x̄ (mean) and s (standard deviation) on the training set using either the .fit() or the .fit_transform() method. This ensures that the scaling parameters are derived from the training data only.

When you have a test set to predict, you must use the same x̄ and s to preprocess it, using .transform(). This consistency is crucial: it ensures that the test data is scaled in exactly the same way as the training data, so the distances the model computes remain meaningful.

# Importing the class
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Calculating x̄ and s and scaling `X_train`
X_train_scaled = scaler.fit_transform(X_train)
# Scaling `X_test` with x̄ and s calculated in the previous line
X_test_scaled = scaler.transform(X_test)
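
To see what .transform() reuses, the fitted scaler can be inspected. The sketch below uses hypothetical toy arrays; mean_ and scale_ are the attributes in which a fitted StandardScaler stores x̄ and s for each feature. Note that StandardScaler computes the standard deviation with the population formula (dividing by n).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from X_train
X_test_scaled = scaler.transform(X_test)        # reuses them, learns nothing new

print(scaler.mean_)   # per-feature mean of X_train: [ 2. 20.]
print(scaler.scale_)  # per-feature standard deviation of X_train

# .transform() is equivalent to applying the training statistics by hand
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual, X_test_scaled))  # True
```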

If you use different x̄ and s for the training set and the test set, the two sets end up on different scales, and your predictions will likely be worse.
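
A small sketch of what goes wrong, using hypothetical toy arrays: the first test point is identical to the largest training point, so after scaling it should land exactly where that training point does. Fitting a second scaler on the test set breaks this.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the first test row equals the last training row
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[3.0, 30.0], [5.0, 50.0]])

train_scaled = StandardScaler().fit_transform(X_train)

# Correct: reuse the statistics learned from the training set
right = StandardScaler().fit(X_train).transform(X_test)
# Incorrect: compute a separate mean and std from the test set
wrong = StandardScaler().fit_transform(X_test)

print(train_scaled[2])  # scaled position of the training point [3, 30]
print(right[0])         # same position, as expected
print(wrong[0])         # [-1. -1.], now sitting near the smallest training point
```

With the separately fitted scaler, the point that should coincide with the largest training point ends up among the smallest ones, so its nearest neighbors are the wrong training points.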

Example

Let's explore a straightforward example where we aim to predict whether a person will enjoy Star Wars VI based on their ratings for Star Wars IV and V. The data is taken from The Movies Dataset with extra preprocessing. A person is considered to like Star Wars VI if they rate it higher than 4 (out of 5).

After training our model, we'll make predictions for two individuals from the test set. The first individual rates Star Wars IV and V as 5 and 5, respectively, while the second individual rates them as 4.5 and 4.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/starwars_binary.csv')
# Dropping the target column and leaving only features as `X_train`
X_train = df.drop('StarWars6', axis=1)
# Storing target column as `y_train`, which contains 1 (liked SW 6) or 0 (didn't like SW 6)
y_train = df['StarWars6']
# Test set of two people
X_test = np.array([[5, 5], [4.5, 4]])
# Scaling the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Building a model and predicting new instances
knn = KNeighborsClassifier(n_neighbors=13).fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(y_pred)
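
To look behind a prediction, .predict_proba() can be used: with the default uniform weights, it returns the fraction of the k nearest neighbors that belong to each class. The sketch below does not use the lesson's dataset. The ratings and labels are hypothetical and made up for illustration, and k is set to 3 rather than 13 because the toy set has only eight rows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical ratings for two films (features) and a made-up 0/1 target
X_train = np.array([[5.0, 5.0], [4.5, 5.0], [5.0, 4.5], [4.0, 4.0],
                    [2.0, 1.5], [1.0, 2.0], [2.5, 2.0], [3.0, 3.5]])
y_train = np.array([1, 1, 1, 0, 0, 0, 0, 1])
X_test = np.array([[5.0, 5.0], [4.0, 4.5]])

# Scale with statistics from the training set only
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train_s, y_train)
y_pred = knn.predict(X_test_s)
proba = knn.predict_proba(X_test_s)

print(y_pred)  # [1 1]
print(proba)   # rows: [0, 1] and [1/3, 2/3]
```

The first person's three nearest neighbors all belong to class 1, while the second person's neighbors are split 2 to 1, so both are predicted as class 1 but with different levels of agreement.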

Which of the following class names from scikit-learn are used to implement the k-Nearest Neighbors classifier and to scale features when preparing data for k-NN?


Section 1. Chapter 4
