Cluster Analysis
How Hierarchical Clustering Works
Hierarchical clustering can either start with each point in its own cluster and successively merge clusters (agglomerative clustering), or start with all points in one cluster and recursively split it into smaller clusters (divisive clustering).
Agglomerative clustering is the more commonly used of the two approaches, so we'll focus on it. Its bottom-up algorithm works as follows:
1. Initialization: each data point is treated as a single cluster;
2. Calculate proximity matrix: compute the distance between each pair of clusters;
3. Merge clusters: merge the two closest clusters into a single cluster;
4. Update proximity matrix: recalculate the distances between the new cluster and all remaining clusters;
5. Repeat: repeat steps 3 and 4 until all data points are merged into a single cluster.
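The steps above can be sketched in plain Python. This is a minimal, deliberately naive sketch, assuming Euclidean distance and single linkage; the names `agglomerative` and `single_linkage` are hypothetical, not part of any library. It recomputes distances from scratch on each merge, so it only suits small inputs.

```python
from itertools import combinations
from math import dist


def single_linkage(points, a, b):
    """Single linkage: distance between the closest pair of points, one per cluster."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, linkage=single_linkage):
    """Naive agglomerative (bottom-up) clustering.

    Returns the merge history as (members_a, members_b, distance) tuples, where
    each members tuple holds the sorted indices of the points in that cluster.
    """
    # Step 1: every data point starts as its own cluster (cluster id -> members).
    clusters = {i: (i,) for i in range(len(points))}
    next_id = len(points)

    # Step 2: proximity matrix, stored as {(id_a, id_b): distance}.
    proximity = {
        (a, b): linkage(points, clusters[a], clusters[b])
        for a, b in combinations(clusters, 2)
    }

    history = []
    while len(clusters) > 1:
        # Step 3: merge the two closest clusters.
        (a, b), d = min(proximity.items(), key=lambda item: item[1])
        merged = tuple(sorted(clusters[a] + clusters[b]))
        history.append((clusters[a], clusters[b], d))
        del clusters[a], clusters[b]

        # Step 4: drop entries involving the merged clusters, then compute
        # distances from every remaining cluster to the new one.
        proximity = {
            pair: value
            for pair, value in proximity.items()
            if a not in pair and b not in pair
        }
        for other in clusters:
            proximity[(other, next_id)] = linkage(points, clusters[other], merged)
        clusters[next_id] = merged
        next_id += 1
        # Step 5: the while loop repeats steps 3 and 4 until one cluster is left.

    return history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
for members_a, members_b, d in agglomerative(points):
    print(members_a, members_b, round(d, 3))
```

With these four sample points, the two tight pairs merge first (at distances 1.0 and 2.0), and the final merge joins them at about 6.403. Passing a different `linkage` function changes only how the distance between two clusters is measured.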
Linkage Types
The proximity between two clusters is defined by the linkage type. Common linkage methods used in hierarchical clustering are:
- Single linkage: the distance between the two closest points, one from each cluster;
- Complete linkage: the distance between the two farthest points, one from each cluster;
- Average linkage: the average distance over all pairs of points, one from each cluster;
- Ward's method: merges the pair of clusters whose union produces the smallest increase in total within-cluster variance.
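The four definitions above can be written as small functions that take two clusters, each given as a list of coordinate tuples. This is a minimal sketch assuming Euclidean distance; all function names are hypothetical. For Ward's method it returns the increase in the within-cluster sum of squares caused by a merge (the quantity Ward's method minimizes); libraries may report a transformed version of this value as the merge height.

```python
from itertools import product
from math import dist


def single(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def average(a, b):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def sse(cluster):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)


A = [(0.0, 0.0), (0.0, 2.0)]
B = [(4.0, 0.0), (4.0, 2.0)]
print(single(A, B))              # 4.0
print(round(complete(A, B), 3))  # 4.472
print(round(average(A, B), 3))   # 4.236
print(ward_cost(A, B))           # 16.0
```

The same pair of clusters gets a different "distance" under each linkage, which is why the merge order, and therefore the final clusters, can change with the method.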
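The choice of linkage method can noticeably change the shape and structure of the resulting clusters. For example, single linkage tends to produce elongated, chain-like clusters, while complete linkage and Ward's method tend to produce compact ones. Experimentation and domain knowledge are often helpful in selecting the best method for your data.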
Dendrogram
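The results of hierarchical clustering are often visualized using a dendrogram: a tree diagram in which each merge is drawn as a link whose height corresponds to the distance at which the two clusters were merged.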
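In a dendrogram, cutting the tree horizontally at a chosen height turns the hierarchy into a flat set of clusters: every merge below the cut is kept, every merge above it is undone. The sketch below assumes a hypothetical merge-history format of `(members_a, members_b, distance)` tuples with non-decreasing distances, which holds for single, complete, average and Ward linkage; the function name `cut_dendrogram` and the sample heights are illustrative only. For drawing an actual dendrogram, SciPy provides `scipy.cluster.hierarchy.dendrogram`.

```python
def cut_dendrogram(n_points, history, height):
    """Return the flat clusters obtained by cutting a dendrogram at `height`.

    `history` lists merges as (members_a, members_b, distance) tuples, where each
    members tuple holds the indices of the points in that cluster. Only merges at
    or below the cut height are applied.
    """
    labels = list(range(n_points))  # start with one label per point
    for members_a, members_b, distance in history:
        if distance > height:
            continue  # this link sits above the cut line
        target = labels[members_a[0]]
        for i in members_a + members_b:
            labels[i] = target
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    return sorted(groups.values())


# Hypothetical merge history for four points (heights are illustrative).
history = [((0,), (1,), 1.0), ((2,), (3,), 2.0), ((0, 1), (2, 3), 6.4)]
print(cut_dendrogram(4, history, 0.5))   # [[0], [1], [2], [3]]
print(cut_dendrogram(4, history, 3.0))   # [[0, 1], [2, 3]]
print(cut_dendrogram(4, history, 10.0))  # [[0, 1, 2, 3]]
```

A low cut keeps every point separate, a cut between the second and third merge yields two clusters, and a cut above the top link yields a single cluster.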