How many Clusters? | Clustering Demystified

You may be wondering: how do we choose the number of clusters in the first place? There is no exact answer, but a common heuristic is the so-called "elbow method".

The elbow method is a heuristic for choosing the number of clusters k for the k-means algorithm. You fit k-means for a range of k values, plot the inertia (the within-cluster sum of squared distances) as a function of k, and pick the "elbow" of the curve as the number of clusters to use. Inertia generally keeps decreasing as k grows (it reaches zero when every sample is its own cluster), so the goal is not to minimize it. Instead, the elbow is the point where the curve bends: inertia stops dropping sharply and starts decreasing much more slowly. Clusters added beyond this point improve the fit only marginally, which makes the elbow a reasonable choice for k.
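Picking the elbow is usually done by eye, but the idea can be made concrete. The sketch below uses one simple heuristic that is not part of this course: after scaling both axes to [0, 1], it picks the k whose point lies farthest from the straight line joining the first and last points of the curve. The function name find_elbow and the inertia values are hypothetical, made up purely for illustration.

```python
import math


def find_elbow(ks, inertias):
    """Hypothetical helper: return the k whose point lies farthest from the
    straight line joining the first and last points of the normalized curve.
    Assumes at least two distinct k values and distinct min/max inertias."""
    k_min, k_max = min(ks), max(ks)
    i_min, i_max = min(inertias), max(inertias)
    # Scale both axes to [0, 1] so neither axis dominates the distance
    xs = [(k - k_min) / (k_max - k_min) for k in ks]
    ys = [(v - i_min) / (i_max - i_min) for v in inertias]
    # Line through the first point (x0, y0) and the last point (x1, y1)
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    best_k, best_dist = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # Perpendicular distance from (x, y) to that line
        dist = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / length
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k


ks = list(range(1, 11))
# Made-up inertia values for illustration only (not real results)
inertias = [1000, 600, 300, 120, 105, 95, 88, 82, 77, 73]
print(find_elbow(ks, inertias))  # 4
```

Here the inertia falls steeply up to k = 4 and then flattens, so the heuristic agrees with what you would pick by looking at the plot. On noisy real curves it can disagree with visual inspection, so treat its answer as a suggestion.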

Methods description

  • range(start, end): This generates a sequence of numbers from start (inclusive) to end (exclusive), representing the range of possible cluster numbers to be tested;
  • kmeans.inertia_: This attribute of a fitted KMeans object holds the inertia of the resulting clustering;
  • cs: An initially empty list that stores the inertia value computed for each number of clusters. Inertia is the sum of squared distances of samples to their closest cluster center;
  • plt.plot(): This function from matplotlib.pyplot creates a line plot. Here it plots the number of clusters on the x-axis against the corresponding inertia values (cs) on the y-axis;
  • plt.title(), plt.xlabel(), plt.ylabel(): These functions set the title, x-axis label, and y-axis label of the plot, respectively;
  • plt.show(): This function displays the plot.
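To check the definition of inertia given in the list above, the following sketch recomputes it by hand with NumPy and compares it with kmeans.inertia_. It assumes scikit-learn and NumPy are installed and uses a hypothetical toy dataset generated with make_blobs, not the course data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical toy data: 200 points around 3 centers (not the course dataset)
X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

# Squared Euclidean distance from every sample to every cluster center
centers = kmeans.cluster_centers_
sq_dists = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)

# Inertia: keep each sample's distance to its closest center, then sum
manual_inertia = sq_dists.min(axis=1).sum()

print(kmeans.inertia_, manual_inertia)
print(np.isclose(kmeans.inertia_, manual_inertia))
```

The two numbers should agree up to floating-point rounding.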

Task

  1. Fit k-means for each number of clusters from 1 to 10 and store the inertia of each model.
  2. Plot the inertia against the number of clusters.
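A minimal sketch of the two task steps, combining the methods described above. It assumes scikit-learn and matplotlib are installed; the dataset X is a hypothetical stand-in generated with make_blobs, so replace it with the data used in the course. The settings n_init=10 and random_state=0 are illustrative choices that make the results reproducible.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the course data: 300 points around 4 centers
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

cs = []  # inertia for each tested number of clusters
for k in range(1, 11):  # k = 1, 2, ..., 10 (the end value 11 is exclusive)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(X)
    cs.append(kmeans.inertia_)

# Number of clusters on the x-axis, inertia on the y-axis
plt.plot(range(1, 11), cs, marker="o")
plt.title("The Elbow Method")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")
plt.show()
```

On synthetic data like this, the bend is expected near the number of generated centers; on real data the elbow is often less pronounced.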

Section 1. Chapter 10