Defining the Number of Clusters | Spectral Clustering
Cluster Analysis in Python
1. K-Means Algorithm
2. K-Medoids Algorithm
3. Hierarchical Clustering
4. Spectral Clustering

Defining the Number of Clusters

Several techniques can help us determine the optimal number of clusters. Spectral clustering also has dedicated methods (for example, the eigengap heuristic), but they rely on fairly advanced mathematics and are not readily available as built-in library functions.

What can we use instead? Recall the silhouette score from the second section: it applies here as well, because it only needs the data and the cluster labels, regardless of which algorithm produced them.
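As a reminder, for a point i the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points of its own cluster and b(i) is the mean distance from i to the points of the nearest other cluster; silhouette_score averages s(i) over all points. The small sketch below computes this by hand on a made-up toy dataset so it can be compared with scikit-learn's result.

```python
import numpy as np
from sklearn.metrics import silhouette_score, pairwise_distances

# Tiny, made-up dataset: two well-separated groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1, 1])

def manual_silhouette(X, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points."""
    D = pairwise_distances(X)  # Euclidean distance matrix
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False  # exclude the point itself from its own cluster
        a = D[i, own].mean()  # mean distance within the point's cluster
        # mean distance to every other cluster; b is the smallest of these
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return np.mean(scores)

print(manual_silhouette(X, labels))
print(silhouette_score(X, labels))
```

Both lines should print the same value; the score lies between -1 and 1, and higher means points sit closer to their own cluster than to the nearest other one.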

Let's build the silhouette score chart for the circles data (its scatter plot is shown below) and see what it suggests.

# Import the libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

# Read the data
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/model_data4.csv', index_col = 0)

# Candidate numbers of clusters and a list for the scores
n_cl = range(2, 10)
silhouettes = []

# Calculate the score for each number of clusters
for j in n_cl:
    model = SpectralClustering(n_clusters = j, affinity = 'nearest_neighbors')
    model.fit(data)
    silhouettes.append(silhouette_score(data, model.labels_))

# Visualize the results
g = sns.lineplot(x = n_cl, y = silhouettes)
g.set_xlabel('Number of clusters')
g.set_ylabel('Silhouette score')
plt.show()
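If the course's CSV file is not reachable, a similar two-ring dataset can be generated with sklearn.datasets.make_circles. The sketch below assumes that substitution (the generator parameters are illustrative guesses, not the course's data), adds random_state for reproducibility, and picks the number of clusters with the highest score instead of reading it off the chart.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import silhouette_score

# Stand-in for the circles data (parameters are assumptions for illustration)
data, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

n_cl = range(2, 10)
silhouettes = []
for j in n_cl:
    # random_state makes the label assignment step reproducible
    model = SpectralClustering(n_clusters=j, affinity='nearest_neighbors',
                               random_state=0)
    model.fit(data)
    silhouettes.append(silhouette_score(data, model.labels_))

# The candidate with the highest average silhouette is a reasonable first guess
best_k = n_cl[int(np.argmax(silhouettes))]
print(best_k, round(max(silhouettes), 3))
```

Keep in mind that the silhouette score is generally higher for compact, convex clusters, so on ring-shaped data its peak may not match the number of rings you see in the scatter plot; treat the chart as one piece of evidence rather than a final answer.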

Task


Build a silhouette score chart for the weather data using spectral clustering. Follow these steps:

  1. Import the SpectralClustering class and the silhouette_score function from sklearn.cluster and sklearn.metrics, respectively.
  2. Create a range object named n_cl containing the integers from 2 to 9 (inclusive).
  3. Iterate over n_cl using j as the loop variable. On each step:
  • Create a SpectralClustering model named model with j clusters and 'nearest_neighbors' affinity.
  • Fit model to the numerical columns of data (columns 3 to 14) using the .fit() method.
  • Append the silhouette score to the silhouettes list, passing the same numerical columns as the first argument and the model's predicted labels_ as the second.
  4. Build a seaborn line plot of n_cl (x-axis) vs silhouettes (y-axis).

Section 4. Chapter 4