Cluster Analysis
Finding Optimal Number of Clusters Using WSS
In K-means clustering, determining the optimal number of clusters, K, is a critical decision. Choosing the right K is essential to uncover meaningful patterns in your data. Too few clusters might oversimplify the data, while too many might create overly specific and less useful clusters. Therefore, methods to guide your choice of K are important.
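One popular technique for finding the optimal K is the within-cluster sum of squares (WSS) metric. WSS is the sum of squared distances between each data point and the centroid of its assigned cluster, totaled over all clusters. Essentially, WSS indicates how compact the clusters are: lower WSS values mean tighter clusters. WSS generally keeps decreasing as K grows (it reaches zero when every point is its own cluster), so the goal is not to minimize WSS outright but to find the K beyond which it stops improving much.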
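To make the definition concrete, here is a minimal sketch that computes WSS by hand in plain Python. The four points, their cluster labels, and the helper names `centroid` and `wss` are made up for illustration; they are not part of any library.

```python
# Toy 2-D data with hypothetical cluster assignments (illustration only).
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = [0, 0, 1, 1]  # cluster index assigned to each point


def centroid(members):
    """Mean of a list of points, computed coordinate by coordinate."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def wss(points, labels):
    """Sum of squared distances from each point to its cluster's centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        for p in members:
            total += sum((x - cx) ** 2 for x, cx in zip(p, c))
    return total


print(wss(points, labels))        # two clusters: 4.0
print(wss(points, [0, 0, 0, 0]))  # one cluster holding everything: 38.0
```

Splitting the same four points into two clusters instead of one drops WSS from 38.0 to 4.0, which is the kind of improvement the metric is meant to capture.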
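To use WSS to find the optimal K, you would typically follow these steps:
1. Run K-means for a range of K values (for example, K = 1 through 10);
2. Compute the WSS for each K;
3. Plot WSS against K;
4. Look for the "elbow": the point on the curve where the decrease in WSS slows down sharply.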
This elbow is often considered a strong indicator of the optimal K for the following reasons:
- It suggests diminishing returns: adding more clusters beyond the elbow does not lead to a substantial improvement in WSS, meaning clusters are not getting significantly more compact;
- It balances granularity and simplicity: the elbow often represents a good balance between capturing the essential structure in the data without overfitting or creating unnecessarily fine-grained clusters.
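The elbow procedure can be sketched with scikit-learn, whose fitted `KMeans` model exposes the WSS as the `inertia_` attribute. This is only an illustration under stated assumptions: the synthetic three-blob dataset, the range of K from 1 to 8, and the variable names are all chosen for the example, not taken from the course material.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-means for each candidate K and record the WSS.
ks = list(range(1, 9))
wss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(model.inertia_)  # inertia_ is the within-cluster sum of squares

# How much WSS falls when going from each K to K + 1.
# The elbow is where these drops go from large to small.
drops = [wss[i] - wss[i + 1] for i in range(len(wss) - 1)]

for k, w in zip(ks, wss):
    print(f"K={k}: WSS={w:.1f}")
```

On data like this, WSS should fall steeply up to K = 3 and only marginally afterwards, so the elbow sits at the true number of blobs. Plotting `ks` against `wss` (for instance with matplotlib) gives the curve that is usually inspected by eye.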
Keep in mind that the elbow method is a heuristic. The elbow point may not always be sharply defined, and other factors might influence your final choice of K. Visual inspection of the resulting clusters and your domain knowledge are valuable supplements to the elbow method.