Finding Optimal Number of Clusters Using Silhouette Score

Besides the WSS method, the silhouette score is another useful metric for choosing the number of clusters (K) in K-means. It measures how well each data point fits its own cluster compared with the nearest other cluster.

For each data point, the silhouette score considers:

  • Cohesion (a): average distance to the other points in its own cluster;

  • Separation (b): average distance to points in the nearest other cluster.

The silhouette score of a point is calculated as (b - a) / max(a, b) and ranges from -1 to +1.

Score interpretation:

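  • close to +1: the point is well-clustered (much closer to its own cluster than to the nearest other one);

  • close to 0: the point lies on the boundary between two clusters;

  • close to -1: the point is probably assigned to the wrong cluster.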
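To make the formula concrete, here is a minimal sketch that computes the score of a single point by hand. The helper name `silhouette_of_point` and the six toy points are made up for illustration. The sketch assumes Euclidean distance and at least two clusters, and it follows the common convention of returning 0 for a point that is alone in its cluster.

```python
import math


def silhouette_of_point(index, points, labels):
    """Return (b - a) / max(a, b) for points[index] (hypothetical helper)."""
    own = labels[index]
    # Collect distances from the chosen point to every other point,
    # grouped by the cluster label of the other point.
    distances = {}
    for j, (point, label) in enumerate(zip(points, labels)):
        if j != index:
            distances.setdefault(label, []).append(math.dist(points[index], point))
    if own not in distances:
        return 0.0  # assumed convention: a singleton cluster scores 0
    # Cohesion: mean distance to the other points of the same cluster.
    a = sum(distances[own]) / len(distances[own])
    # Separation: smallest mean distance to the points of any other cluster.
    b = min(sum(d) / len(d) for label, d in distances.items() if label != own)
    return (b - a) / max(a, b)


# Toy data (assumed): two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [1, 0, 0, 1, 1, 1]  # the first point is deliberately mislabeled

good_score = silhouette_of_point(0, points, good_labels)
bad_score = silhouette_of_point(0, points, bad_labels)
print(f"correct assignment: {good_score:.3f}")
print(f"wrong assignment:   {bad_score:.3f}")
```

With the correct labels the first point scores close to +1. Moving it into the far cluster swaps the roles of a and b, so the score turns strongly negative.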

To find the optimal K using the silhouette score:

  • Run K-means for a range of K values (e.g., K=2 to a reasonable limit);

  • For each K, calculate the average silhouette score over all points;

  • Plot the average silhouette score vs. K;

  • Choose K with the highest average silhouette score.
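The steps above can be sketched with scikit-learn as follows. This is only an illustration: the data is synthetic (three well-separated blobs from `make_blobs`), and the range K=2..7 and the variable names are arbitrary choices. The average score for each K is printed rather than plotted to keep the example short.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumed for illustration): three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

scores = {}
for k in range(2, 8):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # silhouette_score returns the mean silhouette value over all points.
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(f"K={k}: average silhouette score = {score:.3f}")

# Choose the K with the highest average silhouette score.
best_k = max(scores, key=scores.get)
print("best K:", best_k)
```

On real data the peak is often less pronounced than on these blobs, so it is worth looking at the whole curve rather than only at the maximum.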

Examining a silhouette plot, which shows the score of every individual point grouped by cluster, can offer deeper insight into cluster consistency. A good clustering has a high average score and similar scores across the points of each cluster, with few or no negative values.
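Per-point scores can be inspected numerically as well. The sketch below uses scikit-learn's `silhouette_samples` on the same kind of synthetic blobs (an assumption for illustration) and summarizes each cluster by its mean score, its lowest score, and the number of points with a negative score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data (assumed for illustration): three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One silhouette value per data point.
sample_scores = silhouette_samples(X, labels)
average_score = silhouette_score(X, labels)

for cluster in np.unique(labels):
    cluster_scores = sample_scores[labels == cluster]
    print(
        f"cluster {cluster}: mean={cluster_scores.mean():.3f}, "
        f"min={cluster_scores.min():.3f}, "
        f"negative points={(cluster_scores < 0).sum()}"
    )
print(f"overall average: {average_score:.3f}")
```

A cluster whose mean is far below the overall average, or which contains many negative values, is a hint that K or the cluster assignment deserves a second look.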

In summary, while WSS measures only how compact the clusters are, the silhouette score balances cohesion and separation. Using both provides a more robust approach to finding the optimal K.
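As a rough illustration of how the two metrics complement each other, the sketch below records both for each K. It assumes synthetic blob data and uses `KMeans.inertia_` (the sum of squared distances of points to their closest centroid) as the WSS value.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumed for illustration): three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

wss = {}
silhouette = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss[k] = model.inertia_  # within-cluster sum of squares
    silhouette[k] = silhouette_score(X, model.labels_)
    print(f"K={k}: WSS={wss[k]:.1f}, silhouette={silhouette[k]:.3f}")

best_by_silhouette = max(silhouette, key=silhouette.get)
print("K with the highest silhouette score:", best_by_silhouette)
```

On this data WSS keeps shrinking as K grows, so it has to be read by looking for an elbow, whereas the silhouette score peaks at a specific K. When the elbow and the peak agree, the choice of K is better supported.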

What does a high average silhouette score (close to +1) indicate when evaluating clustering results?