What is a K-Means Algorithm? | K-Means Algorithm
Cluster Analysis in Python

What is a K-Means Algorithm?

That was an informative scatter plot you built in the last chapter, wasn't it? You probably saw three clear groups; we'll call them clusters. How can we use ML algorithms to find such groups automatically?

The first method we will consider is K-Means. How does it work? First, you set the number of clusters you would like to find. Let this number be N. The algorithm then chooses N initial points, called centroids (not necessarily data points), and assigns each data point to the cluster of its nearest centroid. Next, each centroid is recomputed as the mean of the points in its cluster. These two steps repeat until the points stay in the same clusters from one iteration to the next. Each step can only keep or reduce the variance of the points within each cluster, so the algorithm settles on a local minimum of that variance.
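The loop described above can be sketched in plain NumPy. This is only an illustration of the assign-and-update idea, not how scikit-learn implements it. The function name kmeans_sketch, the toy points, and the choice to start from randomly picked data points are all assumptions made for this example.

```python
import numpy as np


def kmeans_sketch(points, n_clusters, n_iterations=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not a library function)."""
    rng = np.random.default_rng(seed)
    # Assumption: use randomly chosen data points as the initial centroids
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[start].astype(float)
    labels = np.full(len(points), -1)

    for _ in range(n_iterations):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)

        # Stop once no point changes its cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Update step: move each centroid to the mean of its members
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    return labels, centroids


# Toy data: two well-separated groups of 2-D points
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
labels, centroids = kmeans_sketch(points, n_clusters=2)
print(labels)
print(centroids)
```

Which group gets label 0 and which gets label 1 depends on the random start, so only the grouping itself is meaningful, not the label values.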

We will use the KMeans class from the sklearn.cluster module. To implement the algorithm, follow these steps:

  1. Create a KMeans model and assign it to a variable.
  2. Compute the K-Means clustering by calling the model's .fit() method with the data set as the argument.
  3. Predict the labels by calling the fitted model's .predict() method with the data set as the argument.
  4. (Optional) Visualize the result of the clustering.
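Steps 1 to 3 can be tried on a tiny in-memory array before working with a real data set. This is a minimal sketch: the array X and its values are made up for illustration, and n_init and random_state are set only so that repeated runs behave the same way.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

# Step 1: create the model (n_init and random_state fixed for reproducibility)
model = KMeans(n_clusters=2, n_init=10, random_state=0)

# Step 2: compute the clustering
model.fit(X)

# Step 3: predict a cluster label for every point
prediction = model.predict(X)

print(prediction)               # one label per row of X
print(model.cluster_centers_)   # coordinates of the two centroids
```

The fitted model also stores the labels of the training data in model.labels_, so calling .predict() on the same data returns the same assignment.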

For example, imagine that we have 2-D data, shown in the scatter plot below.

# Import the libraries
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/train_data1.csv')

# Create model
model = KMeans(n_clusters=2)

# Fit the data to model
model.fit(data)

# Predict the labels for data using model
prediction = model.predict(data)

# Add new column to DataFrame
data['prediction'] = prediction

# Visualize the result
sns.scatterplot(x='x', y='y', hue='prediction', data=data)
plt.show()

The result of the script above is shown below.

Note how we add a new column to the data DataFrame so that seaborn can color the points by cluster. Now it's your turn! Try to complete the following task by following the same steps.

Task

  1. Import the KMeans class from sklearn.cluster.
  2. Create a KMeans model with the n_clusters parameter set to 3. Assign it to model.
  3. Compute the K-Means clustering for data using the .fit() method of model.
  4. Predict the labels for data using the .predict() method of model. Assign the result to the prediction variable.
  5. Add a new column 'prediction' to data, containing the values of the prediction variable (created in the previous step).
  6. Visualize the results. Build a scatter plot with the seaborn library, passing the 'x' column as the x parameter, the 'y' column as the y parameter, and the 'prediction' column as the hue parameter. Do not forget to call plt.show().

Section 1. Chapter 2