Implementing on Real Dataset

Having practiced K-means on dummy data, you can now apply it to a real-world dataset: the wine dataset. Real datasets present complexities like unclear cluster structures and varying feature scales, offering a more practical clustering challenge.

You'll use the datasets.load_wine() function from scikit-learn to load this dataset. The wine dataset records 13 numeric attributes (such as alcohol, hue, and proline) for 178 wine samples. The aim is to see whether K-means can uncover clusters that reflect similarities between wines based on these attributes.
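As a minimal sketch of the loading step, assuming scikit-learn is installed, the snippet below loads the data and inspects its shape. The variable names (wine, X, feature_names) are illustrative choices, not part of the lesson.

```python
from sklearn.datasets import load_wine

# Load the wine dataset bundled with scikit-learn
wine = load_wine()

# Feature matrix: one row per wine sample, one column per attribute
X = wine.data
feature_names = wine.feature_names

print(X.shape)            # (178, 13)
print(feature_names)      # attribute names, e.g. 'alcohol', 'hue', 'proline'
```

The class labels in wine.target are deliberately left unused here, since clustering is unsupervised.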

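Real-world data often requires preprocessing. K-means relies on distances between points, so features measured on larger numeric scales dominate the result unless the data is scaled. In the wine dataset, for example, proline values run into the hundreds while hue stays close to 1, so feature scaling is usually needed to ensure all features contribute equally to the distance calculations.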
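One common way to scale features is standardization, which rescales each column to zero mean and unit variance. The sketch below uses scikit-learn's StandardScaler; choosing standardization over other scalers is an assumption here, not something the lesson prescribes.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# Raw feature ranges differ by orders of magnitude
print("min per feature:", X.min(axis=0).round(2))
print("max per feature:", X.max(axis=0).round(2))

# Standardize: subtract each column's mean and divide by its standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every feature has mean ~0 and standard deviation ~1
print("mean after scaling:", X_scaled.mean(axis=0).round(2))
print("std after scaling: ", X_scaled.std(axis=0).round(2))
```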

To find the optimal number of clusters, you'll again use:

  • WSS (within-cluster sum of squares) method: analyze the elbow plot for a range of K values. Elbows might be less distinct in real data;

  • Silhouette score method: examine the Silhouette plot and average scores to find the best K. Scores may be more variable than with dummy data.
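Both methods can be sketched in a single loop over candidate K values. The range 2 to 10, the random_state, and n_init=10 below are assumptions chosen for illustration; WSS is read from the fitted model's inertia_ attribute.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_wine().data)

k_values = range(2, 11)   # assumed range of candidate K values
wss = []                  # within-cluster sum of squares per K
sil = []                  # average silhouette score per K

for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    wss.append(km.inertia_)                        # WSS of the fitted model
    sil.append(silhouette_score(X_scaled, labels))

for k, w, s in zip(k_values, wss, sil):
    print(f"K={k}: WSS={w:.1f}, silhouette={s:.3f}")

# K with the highest average silhouette score
best_k = k_values[sil.index(max(sil))]
print("Best K by silhouette:", best_k)
```

Plotting wss against k_values gives the elbow plot, and plotting sil against k_values shows how the average silhouette score changes with K. The silhouette maximum is only a guide, so it is worth comparing it against where the elbow appears.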

Visualizations are key to understanding the results:

  • 3D plot of three selected wine features, which lets you visually inspect the data distribution in a reduced feature space without applying a dimensionality reduction technique;

  • WSS plot for elbow identification;

  • Silhouette plot for cluster quality.

K-means clusters visualized on the 3-feature 3D plot of the wine data, showing cluster assignments within this reduced feature space.
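A plot of this kind could be produced roughly as follows. This is a sketch under stated assumptions: K=3 and the three features (alcohol, flavanoids, proline) are picked purely for illustration, and the lesson's own figure may use different ones. Clustering is done on all scaled features, and only the display is restricted to three of them.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)

# Assumption: K=3, chosen here only for illustration
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

# Assumption: these three features are an arbitrary illustrative selection
chosen = ["alcohol", "flavanoids", "proline"]
idx = [wine.feature_names.index(name) for name in chosen]

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")

# Color each point by its K-means cluster assignment
ax.scatter(
    X_scaled[:, idx[0]],
    X_scaled[:, idx[1]],
    X_scaled[:, idx[2]],
    c=labels,
    cmap="viridis",
    s=25,
)
ax.set_xlabel(f"{chosen[0]} (scaled)")
ax.set_ylabel(f"{chosen[1]} (scaled)")
ax.set_zlabel(f"{chosen[2]} (scaled)")
ax.set_title("K-means clusters on three wine features")
```

Call plt.show() afterwards to display the figure in an interactive session.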
