Foundations of Machine Learning Track Overview
Cluster Analysis
Clustering is a machine learning technique that groups similar data points into clusters based on their features or characteristics.
The main objective of clustering is to partition a dataset into subsets, or clusters, so that data points within the same cluster are more similar to each other than to data points in other clusters.
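To make this objective concrete, here is a minimal sketch that compares the average distance between points inside a cluster with the average distance between points in different clusters. The two point groups and the helper function names are hypothetical, chosen only for illustration, and Euclidean distance is assumed as the similarity measure.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Hypothetical toy data: two hand-made groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
cluster_b = [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]


def mean_within_distance(points):
    """Average Euclidean distance over all pairs inside one cluster."""
    return mean(dist(p, q) for p, q in combinations(points, 2))


def mean_between_distance(points_1, points_2):
    """Average Euclidean distance over all pairs drawn from two different clusters."""
    return mean(dist(p, q) for p, q in product(points_1, points_2))


within_a = mean_within_distance(cluster_a)
within_b = mean_within_distance(cluster_b)
between = mean_between_distance(cluster_a, cluster_b)

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

For a good partition, both within-cluster averages should be clearly smaller than the between-cluster average.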
Applications of clustering
Example
Let's consider the Iris dataset, which contains four measurements (sepal length, sepal width, petal length, and petal width) for iris flowers belonging to three different species: Setosa, Versicolor, and Virginica.
The goal of the clustering task is to group similar iris flowers together based on their attribute measurements without using the species labels.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset (the species labels in data.target are not used)
data = load_iris()
X = data.data

# Standardize the features so each one contributes equally to the distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create a KMeans clustering model (n_init is set explicitly to avoid version-dependent defaults)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Fit the model to the scaled data
kmeans.fit(X_scaled)

# Retrieve the cluster label assigned to each data point
labels = kmeans.labels_

# Create a colormap with one color per cluster
cmap = plt.get_cmap('viridis', 3)

# Visualize the clusters in 2D using the first two original features (Sepal Length and Sepal Width)
plt.figure(figsize=(10, 6))
for i in range(3):
    cluster_data = X[labels == i]
    # Pass an explicit color; cmap alone has no effect without per-point values in c
    plt.scatter(cluster_data[:, 0], cluster_data[:, 1], label=f'Cluster {i}', color=cmap(i))
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Iris Flower Clustering using K-Means')
plt.legend()
plt.show()
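Because the species labels were held out during clustering, they can be used afterwards as an optional sanity check. The sketch below is one possible way to do that, not part of the original example: it refits the same K-Means setup and compares the cluster assignments with the species using scikit-learn's `contingency_matrix` and `adjusted_rand_score`. The variable names `table` and `ari` are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix
from sklearn.preprocessing import StandardScaler

# Load the data and standardize the features, as in the example above
iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)

# Cluster without looking at the species labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Rows are the true species, columns are the cluster ids found by K-Means
table = contingency_matrix(iris.target, labels)
print(table)

# Adjusted Rand index: 1.0 means the clusters match the species exactly,
# values near 0 mean the agreement is no better than chance
ari = adjusted_rand_score(iris.target, labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

Cluster ids are arbitrary, so cluster 0 need not correspond to the first species; the table should be read by looking for the dominant entry in each row rather than along the diagonal.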