Cluster Analysis
Implementing on Real Dataset
Having practiced K-means on dummy data, you can now apply it to a real-world dataset: the wine dataset. Real datasets present complexities like unclear cluster structures and varying feature scales, offering a more practical clustering challenge.
You'll use the datasets.load_wine() function from scikit-learn to load this dataset. It contains 178 wine samples, each described by 13 numeric chemical attributes such as alcohol content and color intensity. The aim is to see whether K-means can uncover clusters that reflect similarities between wines based on these attributes.
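As a minimal sketch, loading the dataset and taking a first look at it might go like this. The variable names (wine, X, feature_names) are chosen here for illustration and are not taken from the lesson.

```python
from sklearn import datasets

# Load the wine dataset bundled with scikit-learn
wine = datasets.load_wine()

X = wine.data                        # feature matrix: one row per wine
feature_names = wine.feature_names   # names of the 13 attributes

print(X.shape)             # (178, 13)
print(feature_names[:3])   # ['alcohol', 'malic_acid', 'ash']
```

The dataset also ships with class labels in wine.target, but K-means is unsupervised, so they are not used for fitting.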
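Real-world data often requires preprocessing. K-means assigns points to clusters by Euclidean distance, so a feature measured on a large scale (such as proline, with values in the hundreds) can dominate one measured on a small scale (such as hue, with values around 1). Feature scaling helps ensure that all features contribute comparably to the distance calculations.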
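One common way to do this is standardization, sketched below with scikit-learn's StandardScaler. The lesson does not say which scaler it uses, so treat this choice as an assumption; other scalers such as MinMaxScaler would also be reasonable.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_wine().data

# Standardize every feature to zero mean and unit variance
# (assumption: the lesson may use a different scaler)
X_scaled = StandardScaler().fit_transform(X)

# Before scaling, the per-feature spreads differ by orders of magnitude
print(np.round(X.std(axis=0), 2))

# After scaling, every feature has a standard deviation of 1
print(np.round(X_scaled.std(axis=0), 2))
```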
To find the optimal number of clusters, you'll again use:
- WSS method: analyze the elbow plot for a range of K values. Elbows might be less distinct in real data;
- Silhouette score method: examine the Silhouette plot and average scores to find the best K. Scores may be more variable than with dummy data.
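Both methods can be sketched in a single loop over candidate K values. The search range (2 to 10), n_init=10, and random_state=42 are assumptions made for this illustration, not settings taken from the lesson.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(datasets.load_wine().data)

k_values = list(range(2, 11))   # assumed search range
wss = []                        # within-cluster sum of squares for each K
sil = []                        # average silhouette score for each K

for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    wss.append(model.inertia_)  # KMeans exposes WSS as inertia_
    sil.append(silhouette_score(X_scaled, model.labels_))

for k, w, s in zip(k_values, wss, sil):
    print(f"K={k}: WSS={w:.1f}, silhouette={s:.3f}")

# K with the highest average silhouette score
best_k = k_values[sil.index(max(sil))]
print("Best K by silhouette:", best_k)
```

Plotting wss against k_values gives the elbow plot. The silhouette score is only defined for at least two clusters, which is why the range starts at 2.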
Visualizations are key to understanding the results:
- 3D scatter plot of three selected wine features: lets you visually inspect the data distribution in a reduced feature space without applying dimensionality reduction;
- WSS plot: for identifying the elbow;
- Silhouette plot: for assessing cluster quality.
K-means clusters visualized on the 3-feature 3D plot of the wine data, showing cluster assignments within this reduced feature space.
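A plot of this kind could be produced roughly as follows. The three features (alcohol, flavanoids, proline), K=3, fitting K-means on only those three features, and the output file name are all assumptions made for this sketch; the lesson does not state which features or settings it uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)

# Assumption: these three features are picked purely for illustration
features = ["alcohol", "flavanoids", "proline"]
idx = [wine.feature_names.index(name) for name in features]
X3 = X_scaled[:, idx]

# Assumption: K=3, clustering on the three selected features only
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X3)

# 3D scatter plot, colored by cluster assignment
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(X3[:, 0], X3[:, 1], X3[:, 2], c=labels)
ax.set_xlabel(features[0])
ax.set_ylabel(features[1])
ax.set_zlabel(features[2])
ax.set_title("K-means clusters on three wine features")
fig.savefig("wine_clusters_3d.png")  # hypothetical output file name
```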