Case 1: Three Distinct Clusters
The last line plot you built was not as clear as the one in the theory block. As you remember, we should look for the point that has a significant drop before it, but not after it. Let's recall the plot from the previous task.
Why is the answer unclear here? Let's look at the values of the total within sum of squares for 2, 3, 4, and 5 clusters.
| Number of clusters | Total within sum of squares |
|---|---|
| 2 | 429.45 |
| 3 | 105.30 |
| 4 | 60.76 |
| 5 | 46.88 |
The value of the metric drops by 75.5% between 2 and 3 clusters, by 42.3% between 3 and 4, and by 22.9% between 4 and 5. Further drops are less than 23%. So both 3 and 4 are reasonable candidates for the number of clusters. Let's see whether the algorithm confirms this intuition. For convenience, let's recall the scatter plot of the points.
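The relative drops quoted above can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary `wss` and the helper `relative_drops` are illustrative names, not part of the course code, and because the table values are rounded to two decimals, the last digit of a result can differ slightly from the figures in the text.

```python
# Total within sum of squares from the table above, keyed by number of clusters.
wss = {2: 429.45, 3: 105.30, 4: 60.76, 5: 46.88}


def relative_drops(values):
    """Return the percentage drop between each pair of consecutive cluster counts."""
    ks = sorted(values)
    return {
        (a, b): 100 * (values[a] - values[b]) / values[a]
        for a, b in zip(ks, ks[1:])
    }


drops = relative_drops(wss)
for (a, b), pct in drops.items():
    print(f"{a} -> {b} clusters: {pct:.1f}% drop")
```

A large drop followed by a much smaller one marks the candidate for the number of clusters; here the drop after 3 is still sizeable, which is why 4 remains a candidate too.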
Task
- Import the `seaborn` library using the standard naming convention, and `KMeans` from `sklearn.cluster`.
- Initialize a `KMeans` object with 3 clusters. Assign this object to `model`.
- Fit the `data` to the `model`.
- Predict the closest cluster each point belongs to. Save the result in the `'prediction'` column of `data`.
- Create a scatter plot representing the distribution of points (the `'x'` and `'y'` columns of `data` on the corresponding axes), with each point colored according to its predicted cluster.
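One possible way to put these steps together is sketched below. The course's `data` is not available here, so the sketch builds a hypothetical stand-in from three synthetic groups of points; the `n_init` and `random_state` arguments are also assumptions added for reproducibility and are not required by the task.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans

# Hypothetical stand-in for the course's `data`: three synthetic groups of points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 6.0]])
points = np.vstack([c + rng.normal(scale=0.7, size=(50, 2)) for c in centers])
data = pd.DataFrame(points, columns=["x", "y"])

# Initialize a KMeans object with 3 clusters and fit it to the coordinates.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(data[["x", "y"]])

# Store the index of the closest cluster for each point.
data["prediction"] = model.predict(data[["x", "y"]])

# Scatter plot of the points, colored by predicted cluster.
ax = sns.scatterplot(x="x", y="y", hue="prediction", data=data)
```

Selecting `data[["x", "y"]]` explicitly keeps the `'prediction'` column out of the features if the cell is run more than once.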