Learn Rand Index | Hierarchical Clustering
Cluster Analysis in Python

Rand Index

As you may have noticed, the two dendrograms look somewhat different. With single linkage it is hard to identify 4 clusters because the merge heights on the right side are too small, whereas complete linkage suggests that 4 clusters are plausible. We can experiment and compare dendrograms, but what about the clustering results themselves? Can we compare them?

Yes, we can. Two clustering results can be compared with the Rand index, which measures how consistently two sets of predicted labels group the same points and returns a number between 0 and 1. A value of 1.0 means a perfect match, while 0.0 means the two labelings do not agree on any pair of points.
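To make the definition concrete, here is a minimal sketch that computes the Rand index by counting pairs of points. The helper name `rand_index_by_pairs` and the toy labels are assumptions made for illustration; they are not part of the lesson or of any library. A pair counts as an agreement when both labelings put the two points in the same cluster, or both put them in different clusters.

```python
from itertools import combinations


def rand_index_by_pairs(labels_a, labels_b):
    """Return the fraction of point pairs on which two labelings agree.

    Hypothetical helper written for illustration only.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both label sequences must have the same length")
    if len(labels_a) < 2:
        raise ValueError("At least two points are needed to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # The pair agrees if it is "together" in both labelings
        # or "apart" in both labelings.
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Toy labels: the two labelings disagree only on the pair (2, 3)
labels_a = [0, 0, 1, 1]
labels_b = [0, 0, 1, 2]
score = rand_index_by_pairs(labels_a, labels_b)
print(score)
```

With 4 points there are 6 pairs, and the two labelings agree on 5 of them, so the score is 5/6. This loop is quadratic in the number of points, so it is only meant to show the idea, not to replace a library implementation.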

In Python, the Rand index is available through the rand_score() function from the sklearn.metrics module. The function takes two arguments: the labels predicted by each of the two models. Both lists/arrays must have the same length.
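As a quick illustration of how rand_score() behaves, the sketch below passes it small hand-made label lists (the lists are made up for this example). It assumes a scikit-learn version that ships rand_score, which to my knowledge means 0.24 or newer. Note that the cluster numbers themselves do not matter, only which points end up grouped together.

```python
from sklearn.metrics import rand_score

# Hypothetical labels from two models: the same grouping, different cluster numbers
labels_model_a = [0, 0, 1, 1, 2, 2]
labels_model_b = [2, 2, 0, 0, 1, 1]
same_grouping = rand_score(labels_model_a, labels_model_b)

# A third labeling that moves one point into another cluster
labels_model_c = [0, 0, 1, 1, 1, 2]
partial_match = rand_score(labels_model_a, labels_model_c)

print(f"Same grouping, renamed clusters: {same_grouping}")
print(f"One point moved: {partial_match}")
```

The first score is 1.0 because renaming clusters does not change which points are grouped together. The second is lower: the two labelings agree on 12 of the 15 point pairs.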

For example, we can compare the results of clustering well-separated data (scatter plot is below) using two different linkages: 'single' and 'ward'. To choose the linkage, set the linkage = '...' parameter when creating the AgglomerativeClustering() object.

```python
# Import the libraries
import pandas as pd
from sklearn.metrics import rand_score
from sklearn.cluster import AgglomerativeClustering

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/model_data1.csv')

# Create the models
model_single = AgglomerativeClustering(n_clusters=3, linkage='single')
model_ward = AgglomerativeClustering(n_clusters=3, linkage='ward')

# Fit the models and predict the labels
labels_single = model_single.fit_predict(data)
labels_ward = model_ward.fit_predict(data)

# Compute the Rand index
rand_index = rand_score(labels_single, labels_ward)
print(f"The rand index for single and ward linkages models is {rand_index}")
```

This code will output the following message:

The rand index for single and ward linkages models is 1.0

This means that both linkages produce identical clustering results. This is not surprising, since the points form 3 clearly separated clusters.

But what about the data we used in the previous chapters? Let's find out how similar the models with different linkages are.

Task

Let's find out how close the results are for the data from the previous 2 chapters if we split it into 4 clusters. The scatter plot is below.

Plot

Follow these steps:

  1. Import rand_score and AgglomerativeClustering from sklearn.metrics and sklearn.cluster respectively.
  2. Create two AgglomerativeClustering objects:
  • model_single with 4 clusters and 'single' linkage.
  • model_complete with 4 clusters and 'complete' linkage.
  3. Fit each model to the data and predict the labels:
  • labels_single for the labels predicted by the model_single model.
  • labels_complete for the labels predicted by the model_complete model.
  4. Compute the rand score using labels_single and labels_complete.
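The steps above can be sketched roughly as follows. The dataset from the previous chapters is not shown on this page, so this sketch substitutes synthetic points generated with make_blobs. That data, its parameters, and therefore the resulting score are assumptions for illustration only and will differ from the course data.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import rand_score
from sklearn.cluster import AgglomerativeClustering

# Hypothetical stand-in data (NOT the course dataset): 200 points around 4 centers
data, _ = make_blobs(n_samples=200, centers=4, cluster_std=1.5, random_state=0)

# Create the two models with 4 clusters each
model_single = AgglomerativeClustering(n_clusters=4, linkage='single')
model_complete = AgglomerativeClustering(n_clusters=4, linkage='complete')

# Fit each model to the data and predict the labels
labels_single = model_single.fit_predict(data)
labels_complete = model_complete.fit_predict(data)

# Compute the Rand index between the two labelings
rand_index = rand_score(labels_single, labels_complete)
print(f"The rand index for single and complete linkages models is {rand_index}")
```

To run this on the real data, replace the make_blobs call with the DataFrame loaded in the earlier chapters. The rest of the code should stay the same.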
