Contenido del Curso
Cluster Analysis in Python
Cluster Analysis in Python
Deciding the Number of Clusters
Well done! Let's look one more time at all the dendrograms for the weather data.
As you can see, the single linkage method's dendrogram is unreadable. The average linkage method most likely led us to three clusters (if you draw a horizontal line between 75 and 100 you will intersect one blue line, and two green. The complete and ward linkages methods lead us to 4 clusters. For complete linkage, you can draw the horizontal line between 120 and 150 (it will intersect two orange and two green lines), and between 400 and 600 for ward linkage. Let's see what will be the results of using three clusters with average linkage.
Note, that in the previous sections we considered the cases of 5 or 4 clusters. Let's see how it will work now.
Swipe to start coding

- Import the
AgglomerativeClustering
function fromsklearn.cluster
. - Create
AgglomerativeClustering
model object namedmodel
with 3 clusters and using'average'
linkage. - Fit the numerical data (columns 3 - 14) to
model
and predict the labels. Save predicted labels as the'prediction'
column ofdata
. - For modified DataFrame
monthly_data
group the observations of columns fromcol
by'prediction'
column, and calculate the mean within each group. - Build line plot
'Month'
vs'Temp'
for each value of'Group'
usingmonthly_data
DataFrame.
Solución
¡Gracias por tus comentarios!
Deciding the Number of Clusters
Well done! Let's look one more time at all the dendrograms for the weather data.
As you can see, the single linkage method's dendrogram is unreadable. The average linkage method most likely led us to three clusters (if you draw a horizontal line between 75 and 100 you will intersect one blue line, and two green. The complete and ward linkages methods lead us to 4 clusters. For complete linkage, you can draw the horizontal line between 120 and 150 (it will intersect two orange and two green lines), and between 400 and 600 for ward linkage. Let's see what will be the results of using three clusters with average linkage.
Note, that in the previous sections we considered the cases of 5 or 4 clusters. Let's see how it will work now.
Swipe to start coding

- Import the
AgglomerativeClustering
function fromsklearn.cluster
. - Create
AgglomerativeClustering
model object namedmodel
with 3 clusters and using'average'
linkage. - Fit the numerical data (columns 3 - 14) to
model
and predict the labels. Save predicted labels as the'prediction'
column ofdata
. - For modified DataFrame
monthly_data
group the observations of columns fromcol
by'prediction'
column, and calculate the mean within each group. - Build line plot
'Month'
vs'Temp'
for each value of'Group'
usingmonthly_data
DataFrame.
Solución
¡Gracias por tus comentarios!