How many Clusters? | Clustering Demystified

You may be wondering how many clusters to use. There is no exact answer, but a common heuristic is the so-called "elbow method".

The elbow method is a heuristic for choosing the number of clusters in k-means clustering. The method consists of plotting a measure of fit as a function of the number of clusters and picking the elbow of the curve as the number of clusters to use. When the measure is inertia (the within-cluster sum of squared distances), the curve falls as clusters are added; equivalently, the explained variation rises. The "elbow" is the point where the curve bends: before it, each extra cluster improves the fit substantially, and after it the improvement is small. This point is taken as a reasonable number of clusters because adding more clusters no longer improves the fit significantly.
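To make the "diminishing improvement" idea concrete, the sketch below prints how much the inertia falls each time one more cluster is added. The data is synthetic (`make_blobs` with three groups), and the names `inertias` and `drops` are illustrative assumptions, not part of the course code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic data with 3 well-separated groups, for illustration only.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# Fit k-means for k = 1..6 and record the inertia of each fit.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(model.inertia_)

# Improvement gained by going from k to k + 1 clusters.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
for k, drop in enumerate(drops, start=1):
    print(f"{k} -> {k + 1} clusters: inertia falls by {drop:.1f}")
```

On data like this, the drops up to the true number of groups tend to be large and the later ones small; where that change happens is the elbow.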

Methods description

  • range(start, stop): This generates the integers from start (inclusive) to stop (exclusive), here the candidate numbers of clusters to be tested;
  • kmeans.inertia_: This attribute of a fitted KMeans object holds the inertia value for the current number of clusters;
  • cs: This list starts empty and collects the inertia value calculated for each number of clusters. Inertia is the sum of squared distances from each sample to its closest cluster center;
  • plt.plot(): This function from the matplotlib library (matplotlib.pyplot) is used to create a line plot. It plots the number of clusters on the x-axis against the corresponding inertia values (cs) on the y-axis;
  • plt.title(), plt.xlabel(), plt.ylabel(): These functions set the title, x-axis label, and y-axis label of the plot, respectively;
  • plt.show(): This function displays the plot.
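The pieces described above might fit together as in the following sketch. It is not the course's reference solution: the dataset is not shown on this page, so synthetic data from `make_blobs` stands in for it, and the helper name `plot_elbow`, the axis labels, and the `n_init` and `random_state` settings are assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: placeholder data, since the lesson's dataset is not shown here.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

cs = []                      # inertia for each number of clusters
k_values = range(1, 11)      # 1 to 10 inclusive (stop value is exclusive)
for k in k_values:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(X)
    cs.append(kmeans.inertia_)  # available only after fit()


def plot_elbow(k_values, cs):
    """Draw inertia against the number of clusters (hypothetical helper)."""
    plt.plot(list(k_values), cs, marker="o")
    plt.title("The Elbow Method")
    plt.xlabel("Number of clusters")
    plt.ylabel("Inertia (cs)")
    plt.show()


if __name__ == "__main__":
    plot_elbow(k_values, cs)
```

Fixing `random_state` keeps the curve reproducible between runs; without it, k-means initialization can shift the inertia values slightly.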

Task

  1. Fit KMeans for each number of clusters from 1 to 10 and store the inertia of each fit.
  2. Plot the inertia against the number of clusters.
