Implementing on Real Dataset

Having practiced K-means on dummy data, you can now apply it to a real-world dataset: the wine dataset. Real datasets present complexities like unclear cluster structures and varying feature scales, offering a more practical clustering challenge.

You'll use the datasets.load_wine() function from scikit-learn to load this dataset. It describes each wine by a set of numeric attributes, such as alcohol content and color intensity. The aim is to see whether K-means can uncover clusters that reflect similarities between wines based on these attributes.
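As a minimal sketch of the loading step, assuming scikit-learn is installed, the snippet below loads the dataset and inspects its size and feature names. The variable names (wine, X, feature_names) are illustrative choices, not part of the lesson.

```python
from sklearn import datasets

# Load the wine dataset bundled with scikit-learn
wine = datasets.load_wine()

X = wine.data                        # feature matrix: one row per wine
feature_names = wine.feature_names   # list of attribute names

# The class labels in wine.target are deliberately ignored here:
# K-means is unsupervised and only sees the features.
print(X.shape)
print(feature_names)
```

Printing the shape and the feature names first is a quick way to confirm what the clustering will actually operate on.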

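Real-world data often requires preprocessing. K-means assigns points by Euclidean distance, so features measured on larger numeric scales dominate the result; in the wine data, proline values run into the hundreds while hue stays close to 1. Scaling the features, for example by standardizing each one to zero mean and unit variance, lets them contribute comparably to the distance calculations.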
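One common way to scale features is standardization. The sketch below uses scikit-learn's StandardScaler; the lesson does not name a specific scaler, so treat this choice as an assumption.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_wine().data

# Raw standard deviations differ widely from feature to feature
print(np.round(X.std(axis=0), 2))

# Standardize: subtract each feature's mean and divide by its standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every feature has mean ~0 and standard deviation ~1
print(np.round(X_scaled.mean(axis=0), 2))
print(np.round(X_scaled.std(axis=0), 2))
```

Comparing the spreads before and after makes it visible which features would otherwise have dominated the distance calculations.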

To find the optimal number of clusters, you'll again use:

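WSS (within-cluster sum of squares) method: plot WSS for a range of K values and look for the elbow where the decrease levels off. Elbows might be less distinct in real data;
Silhouette score method: examine the silhouette plot and the average silhouette score for each K; higher averages indicate better-separated clusters. Scores may be more variable than with dummy data.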
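Both methods can be sketched in a single loop over candidate K values. This assumes standardized features, a K range of 2 to 10, and fixed n_init and random_state values; all of these are illustrative settings rather than choices taken from the lesson.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(datasets.load_wine().data)

k_values = list(range(2, 11))  # silhouette score needs at least 2 clusters
wss = []   # within-cluster sum of squares for each K
sil = []   # average silhouette score for each K

for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    wss.append(model.inertia_)                       # inertia_ is the WSS
    sil.append(silhouette_score(X_scaled, labels))   # mean over all samples

for k, w, s in zip(k_values, wss, sil):
    print(f"K={k}: WSS={w:.1f}, silhouette={s:.3f}")
```

Plotting wss against k_values gives the elbow plot, and plotting sil against k_values shows which K has the highest average silhouette score. With real data the two methods may not point to the same K, so it helps to look at both.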

Visualizations are key to understanding the results:

A 3D scatter plot of 3 selected wine features lets you inspect how the data is distributed in a smaller feature space, without applying dimensionality reduction;
WSS plot for elbow identification;
Silhouette plot for cluster quality.

K-means clusters visualized on the 3-feature 3D plot of the wine data, showing cluster assignments within this reduced feature space.
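A plot like the one described above could be produced roughly as follows. The lesson does not say which three features are used or how many clusters are chosen, so the feature names in selected and the value K=3 are hypothetical picks for illustration, as is the output file name. The sketch also assumes the clustering is run on the three selected features only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine()

# Hypothetical choice of three features (not specified in the lesson)
selected = ["alcohol", "flavanoids", "proline"]
cols = [wine.feature_names.index(name) for name in selected]

# Keep only the selected columns and standardize them
X3 = StandardScaler().fit_transform(wine.data[:, cols])

# Assumed K=3; in practice, take K from the elbow and silhouette analysis
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X3)

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")
ax.scatter(X3[:, 0], X3[:, 1], X3[:, 2], c=labels, cmap="viridis", s=25)
ax.set_xlabel(selected[0])
ax.set_ylabel(selected[1])
ax.set_zlabel(selected[2])
ax.set_title("K-means clusters on 3 wine features")
fig.savefig("wine_kmeans_3d.png")  # hypothetical output file name
```

Coloring points by cluster label shows whether the groups K-means found are visibly separated in this three-feature view.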


Section 3. Chapter 6
