Cluster Analysis in Python
Is 4 the Optimal Number of Clusters?
The last chart (displayed below) left the question of the optimal number of clusters unanswered. 4 looks like a local maximum, but the value for 5 is not significantly lower, so we need to consider both cases.
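The comparison between 4 and 5 clusters can also be made numerically. The text does not say which score the chart plotted, so the sketch below assumes the silhouette score, and it runs on synthetic data from `make_blobs` rather than on the course dataset. Treat it as an illustration of the idea, not as the course's own method.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption): 200 points with 12 features,
# mimicking 12 monthly temperature columns.
X, _ = make_blobs(n_samples=200, centers=4, n_features=12,
                  cluster_std=3.0, random_state=0)

# Compute the silhouette score for each candidate number of clusters.
scores = {}
for k in (4, 5):
    model = SpectralClustering(n_clusters=k, affinity='nearest_neighbors',
                               random_state=0)
    labels = model.fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(f"{k} clusters: silhouette = {score:.3f}")
```

If the two scores are close, the score alone does not settle the question, which is why it helps to inspect the clusters visually as well.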
Let's look at the scatter plot of average January vs July temperatures for 4 clusters.
    # Import the libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.cluster import SpectralClustering

    # Read the data
    data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/138ab9ad-aa37-4310-873f-0f62abafb038/Cities+weather.csv', index_col=0)

    # Create the model
    model = SpectralClustering(n_clusters=4, affinity='nearest_neighbors')

    # Fit the data and predict the labels
    data['prediction'] = model.fit_predict(data.iloc[:, 2:14])

    # Visualize the results
    sns.scatterplot(x='Jan', y='Jul', hue='prediction', data=data)
    plt.show()
The clustering seems logical: it splits the cities into disjoint groups. But what happens if we build the same chart for 5 clusters? That will be your task!

- Import the SpectralClustering class from sklearn.cluster.
- Create a SpectralClustering model with 5 clusters using the 'nearest_neighbors' affinity.
- Fit columns 3-14 of data to the model and predict the labels. Save the result in the 'prediction' column of data.
- Build the seaborn scatter plot of average January (column 'Jan') vs July (column 'Jul') temperatures, colored by cluster (column 'prediction').
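The steps above can be sketched as follows. This is one possible approach, not the official solution. So that it runs offline, it uses a synthetic stand-in for the course dataset: the helper `make_demo_data`, the placeholder columns `meta_1` and `meta_2`, and the assumption that columns 3-14 hold the twelve monthly temperatures are all hypothetical. The fixed `random_state` is added only for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import SpectralClustering

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def make_demo_data(n_per_group=20, seed=0):
    """Build a synthetic stand-in (assumption) for the cities weather table."""
    rng = np.random.default_rng(seed)
    # Seasonal shape: coldest in January, warmest in July.
    season = -np.cos(2 * np.pi * np.arange(12) / 12)
    frames = []
    for mean, amplitude in [(-5, 15), (5, 12), (12, 9), (20, 6), (26, 2)]:
        temps = mean + amplitude * season + rng.normal(0, 1.5, (n_per_group, 12))
        frames.append(pd.DataFrame(temps, columns=MONTHS))
    table = pd.concat(frames, ignore_index=True)
    # Two hypothetical leading columns so the months sit in columns 3-14.
    table.insert(0, 'meta_2', 'n/a')
    table.insert(0, 'meta_1', 'n/a')
    return table


def cluster_cities(data, n_clusters=5):
    """Return a copy of data with cluster labels in a 'prediction' column."""
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity='nearest_neighbors',
                               random_state=0)
    result = data.copy()
    # Columns 3-14 (positions 2..13) are used as features.
    result['prediction'] = model.fit_predict(result.iloc[:, 2:14])
    return result


def plot_clusters(result):
    """Scatter plot of January vs July temperatures, colored by cluster."""
    # Imported here so the clustering part runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.scatterplot(x='Jan', y='Jul', hue='prediction', data=result)
    plt.show()


demo = make_demo_data()
clustered = cluster_cities(demo, n_clusters=5)
```

Calling `plot_clusters(clustered)` draws the chart. To run the same steps on the course data, pass the DataFrame loaded with `pd.read_csv` in the example above to `cluster_cities` instead of `demo`.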