Exploratory Data Analysis | K-Means Algorithm
Cluster Analysis in Python
course content


1. K-Means Algorithm
2. K-Medoids Algorithm
3. Hierarchical Clustering
4. Spectral Clustering

Exploratory Data Analysis

Welcome to the course! Cluster analysis is a type of unsupervised learning: it works with unlabeled data (i.e. data with no 'response' variable). Unlike in classification problems, here we don't know in advance whether there is a clear relationship between the features or how many groups the data contains. The main goal of unsupervised learning is to find 'hidden' structures or relationships in data.

Before digging into different algorithms, you should always perform an EDA (Exploratory Data Analysis). It includes anomaly detection (such as NaN values or outliers), cleaning and preprocessing the data (handling missing values or inappropriate formats), and some visualization to describe the data's basic characteristics. Usually, the last part involves building box plots, bee swarm plots, or histograms.
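The checks listed above can be sketched in a few lines of pandas. This is only a minimal illustration under stated assumptions: the toy DataFrame is hypothetical (it is not the course dataset), and the 1.5 * IQR rule is one common convention for flagging outliers, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one obvious outlier in 'x'.
data = pd.DataFrame({
    "x": [1.0, 1.2, 0.9, np.nan, 1.1, 25.0],
    "y": [2.0, 2.1, 1.9, 2.2, 2.0, 2.1],
})

# Count missing values per column.
missing_counts = data.isna().sum()

# Check that each column has the expected (numeric) type.
column_types = data.dtypes

# Summary statistics: count, mean, std, min, quartiles, max.
summary = data.describe()

# Flag outliers in 'x' with the 1.5 * IQR rule (an assumed convention).
q1 = data["x"].quantile(0.25)
q3 = data["x"].quantile(0.75)
iqr = q3 - q1
outlier_mask = (data["x"] < q1 - 1.5 * iqr) | (data["x"] > q3 + 1.5 * iqr)
outliers = data[outlier_mask]

print(missing_counts)
print(column_types)
print(summary)
print(outliers)
```

Whether to drop, impute, or keep the flagged rows depends on the dataset; the sketch only locates them.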

Since our goal here is to divide the observations into groups, we will mostly use scatter plots built with the seaborn library. If this is the first time you have heard that name, I highly recommend taking the Introduction course on seaborn. Let's start our analysis!

Task

You are given a DataFrame data with 2 columns named 'x' and 'y'. Let's draw a scatter plot to get familiar with the data. Your tasks are:

  1. Import the pandas, seaborn, and matplotlib.pyplot libraries under their standard aliases (pd, sns, and plt respectively).
  2. Initialize a scatter plot. Use the 'x' column values for the x-axis and the 'y' column values for the y-axis, both from the data DataFrame.
  3. Display the plot.
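One possible way to carry out these three steps is sketched below. The DataFrame built here is a hypothetical stand-in, since the course supplies its own data; only the column names 'x' and 'y' come from the task.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the course's `data` DataFrame.
data = pd.DataFrame({
    "x": [1.0, 1.5, 2.0, 8.0, 8.5, 9.0],
    "y": [1.0, 2.0, 1.5, 8.0, 9.0, 8.5],
})

# Initialize the scatter plot: 'x' on the x-axis, 'y' on the y-axis.
ax = sns.scatterplot(x="x", y="y", data=data)

# Display the plot.
plt.show()
```

With data like this, two visually separate groups of points would suggest that clustering is worth trying.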

