Cluster Analysis
How the K-Means Algorithm Works
Initialization
The algorithm begins by randomly selecting K initial cluster centers, also known as centroids. These centroids serve as the starting points for each cluster. A common approach is to randomly choose K data points from the dataset to be the initial centroids.
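As a rough illustration of this step, the sketch below picks K distinct data points at random to act as the initial centroids. The function name `init_centroids`, the `seed` parameter, and the toy dataset are assumptions made for this example only.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as initial centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # seed is optional and only makes runs repeatable
    # sample() draws without replacement, so the same point is never picked twice
    return rng.sample(points, k)

# Hypothetical toy dataset: two small groups of 2-D points
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = init_centroids(points, k=2, seed=0)
print(centroids)
```

Because the starting centroids are random, different seeds can lead to different final clusters.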
Assignment Step
In this step, each data point is assigned to the cluster of its closest centroid. Distance is typically measured using Euclidean distance, although other distance metrics can also be used.
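The assignment step might be sketched as follows, assuming Euclidean distance. The helper name `assign_clusters` and the sample data are illustrative choices, not part of the lesson.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        # Euclidean distance from this point to every centroid
        distances = [math.dist(p, c) for c in centroids]
        # Index of the smallest distance; ties go to the lower index
        labels.append(distances.index(min(distances)))
    return labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = [(1.0, 1.0), (8.0, 8.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
```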
Update Step
Once all data points are assigned to clusters, the centroids are recalculated. For each cluster, the new centroid is computed as the mean of all the data points belonging to that cluster. Essentially, the centroid is moved to the center of its cluster.
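A minimal sketch of the update step is shown below. The name `update_centroids` is hypothetical, and keeping the previous centroid when a cluster ends up empty is an assumption of this example; the lesson does not say how that case should be handled.

```python
def update_centroids(points, labels, k, old_centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid
            new_centroids.append(old_centroids[j])
            continue
        # zip(*members) groups the coordinates by dimension
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        new_centroids.append(mean)
    return new_centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels = [0, 0, 0, 1, 1, 1]
old_centroids = [(1.0, 1.0), (8.0, 8.0)]
new_centroids = update_centroids(points, labels, 2, old_centroids)
print(new_centroids)
```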
Iteration
The assignment and update steps are repeated iteratively. In each iteration, data points are reassigned to clusters based on the updated centroids, and the centroids are then recalculated from the new cluster assignments. This process continues until a stopping criterion is met.
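Putting the steps together, the whole loop could look roughly like the sketch below. It is a simplified illustration rather than a reference implementation: the function name `kmeans`, the default `max_iter=100`, and the choice to stop as soon as assignments no longer change are all assumptions of this example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Simplified K-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop iterating
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2, seed=0)
print(centroids, labels)
```

On this toy dataset the two groups are far apart, so the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialization.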
Convergence
The algorithm stops when one of the following conditions is met:
- Centroids do not change significantly: the positions of the centroids stabilize, meaning that in subsequent iterations, there is minimal change in their locations;
- Data point assignments do not change: data points remain in the same clusters, indicating that the cluster structure has become stable;
- Maximum number of iterations is reached: a pre-defined maximum number of iterations is reached. This prevents the algorithm from running indefinitely.
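The three conditions above can be expressed as a small check, sketched below. The function name `should_stop`, the tolerance `tol=1e-4`, and the limit `max_iter=100` are illustrative assumptions; what counts as a "significant" change is left to the practitioner.

```python
import math

def should_stop(old_centroids, new_centroids, old_labels, new_labels,
                iteration, tol=1e-4, max_iter=100):
    """Return the name of the first stopping condition met, or None."""
    # Largest distance any single centroid moved in this iteration
    shift = max(math.dist(a, b) for a, b in zip(old_centroids, new_centroids))
    if shift <= tol:
        return "centroids_stable"
    if old_labels == new_labels:
        return "assignments_stable"
    if iteration + 1 >= max_iter:  # iteration is counted from zero
        return "max_iterations"
    return None

# A centroid that moved a lot while some labels changed: keep iterating
print(should_stop([(0.0, 0.0)], [(1.0, 1.0)], [0, 1], [1, 0], iteration=0))
```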
When the algorithm stops, it has partitioned the data into K clusters, each represented by its centroid. The resulting clusters tend to be internally cohesive and well separated from one another with respect to the chosen distance metric.
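One common way to put a number on how cohesive the clusters are is the within-cluster sum of squared distances: the lower the value, the tighter the points sit around their centroids. The sketch below assumes Euclidean distance, and the name `within_cluster_ss` is chosen for this example.

```python
import math

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(
        math.dist(p, centroids[label]) ** 2
        for p, label in zip(points, labels)
    )

# Two points, one cluster, centroid halfway between them
score = within_cluster_ss([(0.0, 0.0), (2.0, 0.0)], [0, 0], [(1.0, 0.0)])
print(score)  # 2.0
```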