Deciding the Number of Clusters | Hierarchical Clustering
Cluster Analysis in Python

Deciding the Number of Clusters

Well done! Let's look one more time at all the dendrograms for the weather data.

As you can see, the dendrogram for single linkage is unreadable. The average linkage dendrogram most likely suggests three clusters: a horizontal line drawn between 75 and 100 intersects one blue line and two green lines. The complete and Ward linkage dendrograms both suggest four clusters. For complete linkage, a horizontal line drawn between 120 and 150 intersects two orange and two green lines; for Ward linkage, draw the line between 400 and 600. Let's see what results we get with three clusters and average linkage.
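The horizontal-line reading described above can also be done programmatically. The following is a minimal sketch, assuming SciPy is available and using synthetic data (three artificial blobs, not the weather data); the names `X`, `Z`, and `cut_height` are made up for illustration. Cutting the tree at a height is the same as counting the vertical lines a horizontal line would cross.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for the real data: three well-separated blobs
# with 12 features each (an assumption, not the weather dataset).
rng = np.random.default_rng(0)
centers = [0.0, 50.0, 100.0]
X = np.vstack([c + rng.normal(0, 1, size=(10, 12)) for c in centers])

# Build the merge tree with average linkage.
Z = linkage(X, method='average')

# "Draw" a horizontal line at this height (hypothetical value chosen
# for the synthetic data, not one of the thresholds from the lesson).
cut_height = 50.0

# Flat clusters obtained by cutting the dendrogram at cut_height.
labels = fcluster(Z, t=cut_height, criterion='distance')
n_clusters = len(np.unique(labels))

# Equivalent count: every merge above the line adds one crossed branch.
n_crossed = int((Z[:, 2] > cut_height).sum()) + 1

print(n_clusters, n_crossed)
```

Any height inside the same gap between two consecutive merges gives the same number of clusters, which is why the lesson quotes a range such as 75 to 100 rather than a single value.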

Note that in the previous sections we considered only 5 or 4 clusters. Let's see how three clusters work now.

Task

  1. Import the AgglomerativeClustering class from sklearn.cluster.
  2. Create an AgglomerativeClustering object named model with 3 clusters and 'average' linkage.
  3. Fit model to the numerical data (columns 3 - 14) and predict the labels. Save the predicted labels as the 'prediction' column of data.
  4. For the modified DataFrame monthly_data, group the observations in the columns from col by the 'prediction' column and calculate the mean within each group.
  5. Build a line plot of 'Temp' against 'Month' for each value of 'Group' using the monthly_data DataFrame.
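
The steps above can be sketched roughly as follows. This is not the course's reference solution: the DataFrame here is synthetic, the column names ('City', 'Country', and the month abbreviations) are assumptions, and "columns 3 - 14" is interpreted as the third through fourteenth columns, i.e. `iloc[:, 2:14]`. The long-format layout of `monthly_data` with 'Month', 'Temp', and 'Group' columns is likewise a guess based on the names used in step 5.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for the weather data (assumption): 30 cities,
# two descriptive columns followed by 12 monthly temperature columns.
rng = np.random.default_rng(0)
col = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
       'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
seasonal = 10 * np.sin(np.linspace(0, np.pi, 12))  # warmer mid-year
bases = [-5.0, 10.0, 25.0]                          # cold, mild, hot
rows = np.vstack([b + seasonal + rng.normal(0, 1, size=(10, 12))
                  for b in bases])
data = pd.DataFrame(rows, columns=col)
data.insert(0, 'Country', 'X')
data.insert(0, 'City', [f'city_{i}' for i in range(len(data))])

# Steps 1-2: model with 3 clusters and average linkage.
model = AgglomerativeClustering(n_clusters=3, linkage='average')

# Step 3: fit on the numerical columns and store the labels.
data['prediction'] = model.fit_predict(data.iloc[:, 2:14])

# Step 4: mean monthly temperature within each predicted cluster.
cluster_means = data.groupby('prediction')[col].mean()

# Reshape to long format so each row is (Group, Month, Temp).
monthly_data = (cluster_means
                .reset_index()
                .melt(id_vars='prediction', var_name='Month',
                      value_name='Temp')
                .rename(columns={'prediction': 'Group'}))

print(monthly_data.head())
```

For step 5, a long-format frame like this can be passed to a plotting call such as `seaborn.lineplot(x='Month', y='Temp', hue='Group', data=monthly_data)`, assuming seaborn is installed; each cluster then appears as its own line.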

Section 3. Chapter 6