What is a K-Means Algorithm?

That was an informative scatter plot you built in the last chapter, wasn't it? You probably saw three clear groups, which we will call clusters. How can we use ML algorithms to find such groups automatically?

The first method we will consider is K-Means. How does it work? First, you set the number of clusters you would like to find. Let this number be N. The algorithm then chooses N random points (not necessarily data points) as initial cluster centers and assigns each data point to the cluster whose center is nearest. Next, the mean of the points in each cluster is computed and becomes the new cluster center. The assignment and update steps repeat until the points stop changing clusters between iterations, at which point the variance of the points within each cluster has reached a (local) minimum.
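The loop described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not scikit-learn's implementation: the function name kmeans_sketch, its parameters, and the synthetic two-blob data are hypothetical, and the initial centers are drawn from the data points for simplicity.

```python
import numpy as np


def kmeans_sketch(points, n_clusters, n_iter=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Assumption: use randomly chosen data points as the initial centers.
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centers = points[start]

    for _ in range(n_iter):
        # Assignment step: distance from every point to every center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Update step: each center becomes the mean of its assigned points.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])

        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment so that labels match the returned centers.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    return labels, centers


# Hypothetical synthetic data: two 2-D blobs of 50 points each.
data_rng = np.random.default_rng(42)
points = np.vstack([
    data_rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    data_rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2)),
])

labels, centers = kmeans_sketch(points, n_clusters=2)
```

Here labels holds one cluster index per point and centers holds one row per cluster. Because the starting centers are random, different seeds may lead to different final clusters.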

We will use the KMeans class from the sklearn.cluster module. To apply the algorithm, follow these steps:

Create a KMeans model assigned to a certain variable.
Compute K-Means clustering using the .fit() method of the KMeans object with the data set as a parameter.
Predict the labels using the fitted model by calling the .predict() method of the KMeans object with the data set as a parameter.
(Optional) Visualize the result of clustering.

For example, imagine that we have the 2-D data, with the respective scatter plot below.


# Import the libraries
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/train_data1.csv')

# Create model
model = KMeans(n_clusters = 2)

# Fit the data to model
model.fit(data)

# Predict the labels for data using model
prediction = model.predict(data)

# Add new column to DataFrame
data['prediction'] = prediction

# Visualize the result
sns.scatterplot(x = 'x', y = 'y', hue = 'prediction', data = data)
plt.show()

The script above produces a scatter plot in which each point is colored by its predicted cluster.
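Besides the predicted labels, a fitted KMeans object also exposes the cluster centers and the total within-cluster squared distance. The short sketch below shows how they might be inspected. The synthetic two-blob data is a hypothetical stand-in for the course data set, and the n_init and random_state values are arbitrary choices made only to keep the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical synthetic data standing in for the course data set:
# two 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2)),
])

# Same steps as in the script above: create, fit, predict.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(points)
labels = model.predict(points)

# Attributes available after fitting.
centers = model.cluster_centers_  # one row per cluster center
inertia = model.inertia_          # sum of squared distances to the nearest center

print(centers)
print(inertia)
```

A lower inertia_ value means the points lie closer to their cluster centers, which is the quantity K-Means tries to minimize.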

Note how we add a new column to the data DataFrame to make plotting with seaborn easier. Now it's your turn! Complete the following task using the same steps.

Task

Import the KMeans class from sklearn.cluster.
Create a KMeans model with the n_clusters parameter set to 3. Assign it to the model variable.
Compute K-Means clustering for data using the .fit() method of model.
Predict the labels for data using the .predict() method of model. Assign the result to the prediction variable.
Add a new column 'prediction' to data with the values of the prediction variable (created in the previous step).
Visualize the results. Build a scatter plot using the seaborn library, passing the 'x' column as the x parameter, the 'y' column as the y parameter, and the 'prediction' column as the hue parameter. Do not forget to call plt.show().
