How many Clusters? | Clustering Demystified
How many Clusters?

You may be wondering how to choose the number of clusters. There is no exact answer, but a common heuristic is the so-called "elbow method".

The elbow method is a heuristic for choosing the number of clusters in k-means clustering. It consists of plotting a measure of cluster compactness, here the inertia (the within-cluster sum of squared distances), as a function of the number of clusters and picking the elbow of the curve as the number of clusters to use. Inertia generally falls as clusters are added; the "elbow" is the bend in the curve where it begins to decrease at a much slower rate. This point is taken as a reasonable number of clusters because adding more clusters beyond it no longer reduces the inertia significantly.
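The bend is usually picked by eye, but it can also be approximated numerically. The sketch below is one rough heuristic, not part of the course material: it looks for the largest second difference in a curve of inertia (within-cluster sum of squares) values. The function name `elbow_by_second_difference` and the sample numbers are made up purely for illustration.

```python
def elbow_by_second_difference(k_values, inertias):
    """Return the k where the inertia curve bends most sharply.

    Hypothetical helper: the bend at position i is measured by the
    discrete second difference inertias[i-1] - 2*inertias[i] + inertias[i+1].
    The first and last k cannot be scored because they lack a neighbour.
    """
    ks = list(k_values)
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Made-up inertia values for k = 1..6 (illustrative only, not real results)
ks = range(1, 7)
inertias = [1000.0, 600.0, 200.0, 180.0, 165.0, 155.0]

elbow_k = elbow_by_second_difference(ks, inertias)
print(elbow_k)
```

On real data the curve is often smoother than in this toy input, so the result is best treated as a hint to check against the plot rather than a definitive answer.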

Methods description

  • range(start, end): This generates a sequence of numbers from start (inclusive) to end (exclusive), representing the range of possible cluster numbers to be tested;
  • kmeans.inertia_: This attribute of a fitted KMeans object holds the inertia of the current clustering configuration; it is available only after fit() has been called;
  • cs: This list starts out empty and collects the inertia value calculated for each number of clusters. Inertia is the sum of squared distances of samples to their closest cluster center;
  • plt.plot(): This function from the matplotlib library (matplotlib.pyplot) is used to create a line plot. It plots the number of clusters on the x-axis against the corresponding inertia values (cs) on the y-axis;
  • plt.title(), plt.xlabel(), plt.ylabel(): These functions set the title, x-axis label, and y-axis label of the plot, respectively;
  • plt.show(): This function displays the plot.
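The pieces described above can be combined roughly as follows. This is a minimal sketch under stated assumptions: the data comes from scikit-learn's `make_blobs` (the lesson's own dataset is not shown), and the helper names `compute_inertias` and `plot_elbow`, the plot labels, and the `n_init` and `random_state` values are illustrative choices rather than the course's reference solution.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic data with 4 blobs stands in for the lesson's dataset.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)


def compute_inertias(X, k_values):
    """Fit KMeans once per k and collect the inertia of each fit."""
    cs = []  # starts empty, gains one inertia value per k
    for k in k_values:
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
        kmeans.fit(X)
        cs.append(kmeans.inertia_)  # available only after fit()
    return cs


def plot_elbow(k_values, cs):
    """Draw inertia against the number of clusters."""
    # Imported here so the inertia computation above runs without matplotlib.
    import matplotlib.pyplot as plt

    plt.plot(list(k_values), cs, marker="o")
    plt.title("The Elbow Method")
    plt.xlabel("Number of clusters")
    plt.ylabel("Inertia")
    plt.show()


k_values = range(1, 11)  # 1 inclusive to 11 exclusive, i.e. k = 1..10
cs = compute_inertias(X, k_values)
```

Calling `plot_elbow(k_values, cs)` then displays the curve; with data like this, the bend would be expected near the number of blobs used to generate it.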

Task

  1. Fit KMeans for each number of clusters from 1 to 10 and store the inertia of each fit.
  2. Plot the inertia values against the number of clusters.

Section 1. Chapter 10