Clustering Weather Data | K-Means Algorithm
Cluster Analysis in Python
1. K-Means Algorithm
2. K-Medoids Algorithm
3. Hierarchical Clustering
4. Spectral Clustering

Clustering Weather Data

Let's work with some real data. If you completed the 'Visualization in Python with matplotlib' course, you might remember the USA cities' weather data. We will use an expanded version of that dataset here.

First, let's describe our dataset. It contains 15 columns: Country, City, one column for each of the 12 months, and Continent. The month columns are numerical and contain the average monthly temperature in degrees Fahrenheit. For example, each row of this DataFrame looks like this.

We might guess that it would be logical to cluster by continent. But recall that every continent combines different climate types, which depend on proximity to the sea, the ocean, mountains, and so on. So let's find out how the K-Means algorithm divides the observations.

Task

You are given a DataFrame named data. Note that the numerical columns have indices 2 through 13.

  1. Import the pandas and seaborn libraries with their standard aliases (pd and sns, respectively), and KMeans from sklearn.cluster.
  2. Create a range object with the integers from 2 to 9 and assign it to the clusters variable.
  3. Iterate over the values of clusters. At each step:
  • Initialize a KMeans model with the current number of clusters (i).
  • Fit the model to the columns of data with indices 2-13. Recall the .iloc[] indexer of a DataFrame: the first argument selects rows and the second selects columns.
  • Append the model's total within-cluster sum of squares (the value of its .inertia_ attribute) to the variances list.
  4. Display a seaborn lineplot of "number of clusters vs total within sum of squares" (clusters on the x-axis, variances on the y-axis).

Section 1. Chapter 6