Case 1: Three Distinct Clusters | K-Means Algorithm
Cluster Analysis in Python
Course content

1. K-Means Algorithm
2. K-Medoids Algorithm
3. Hierarchical Clustering
4. Spectral Clustering

Case 1: Three Distinct Clusters

The last line plot you built was not as clear-cut as the one in the earlier theory block. Recall that we look for the point with a significant drop before it, but not after it. Here is the plot from the previous task again.

Why is the answer unclear here? Let's look at the values of the total within-cluster sum of squares for n = 2, 3, 4, and 5.

Number of clusters    Total within sum of squares
2                     429.45
3                     105.30
4                     60.76
5                     46.88

The value of the metric drops by 75.5% between 2 and 3 clusters, by 42.3% between 3 and 4, and by 22.9% between 4 and 5; further drops are smaller still. So both 3 and 4 are reasonable choices for the number of clusters. Let's see whether the algorithm confirms our intuition. For reference, here is the scatter plot of the points again.
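The numbers in the table above can be reproduced with a short loop: fit KMeans for each candidate cluster count and record its total within-cluster sum of squares (the `inertia_` attribute). This is a minimal sketch; since the course dataset is not shown here, it uses a hypothetical stand-in generated with `make_blobs`.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the course's `data` DataFrame:
# three distinct groups of 2-D points.
points, _ = make_blobs(n_samples=150, centers=3, random_state=42)
data = pd.DataFrame(points, columns=['x', 'y'])

# Total within-cluster sum of squares for n = 2..5
wss = {}
for n in range(2, 6):
    model = KMeans(n_clusters=n, n_init=10, random_state=42).fit(data)
    wss[n] = model.inertia_

# Relative drop between consecutive cluster counts
for n in range(3, 6):
    drop = (wss[n - 1] - wss[n]) / wss[n - 1] * 100
    print(f'{n - 1} -> {n}: {drop:.1f}% drop')
```

On the course's data, a large drop up to 3 clusters followed by much smaller drops is exactly the elbow pattern described above.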

Task

  1. Import the seaborn library using the standard naming convention, and KMeans from sklearn.cluster.
  2. Initialize a KMeans object with 3 clusters and assign it to model.
  3. Fit the data to the model.
  4. Predict the closest cluster for each point and save the result in the 'prediction' column of data.
  5. Create a scatter plot of the points ('x' and 'y' columns of data on the eponymous axes), coloring each point by its predicted cluster.
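The steps above can be sketched as follows. Since the course's `data` DataFrame is not available here, the sketch substitutes a hypothetical dataset built with `make_blobs`; everything after that line follows the task.

```python
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the course's `data` DataFrame
points, _ = make_blobs(n_samples=150, centers=3, random_state=42)
data = pd.DataFrame(points, columns=['x', 'y'])

# Initialize a KMeans object with 3 clusters
model = KMeans(n_clusters=3, n_init=10, random_state=42)

# Fit the data to the model
model.fit(data[['x', 'y']])

# Predict the closest cluster for each point
data['prediction'] = model.predict(data[['x', 'y']])

# Scatter plot with each point colored by its predicted cluster
sns.scatterplot(data=data, x='x', y='y', hue='prediction')
```

Note that the cluster labels (0, 1, 2) are arbitrary: rerunning with a different random state may assign different numbers to the same groups.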


Section 1. Chapter 4
