Exploratory Data Analysis
Welcome to the course! Cluster analysis is a type of unsupervised learning: it works with unlabeled data (i.e., data with no 'response' variable). Unlike in classification problems, here we do not know in advance whether there is a clear relation between the characteristics or how many groups the data contains. The main goal of unsupervised learning is to find 'hidden' structures or relations in data.
Before digging into different algorithms, you should always perform an EDA (Exploratory Data Analysis). It includes anomaly detection (such as NaN values or outliers), cleaning and preprocessing the data (checking for missing values or inappropriate formats), and some visualization to describe the simplest characteristics. Usually, the last part involves building box plots, bee swarm plots, or histograms.
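The first EDA checks mentioned above can be sketched with pandas alone. This is a minimal illustration, not the course's own code: the small DataFrame is hypothetical (made-up values with one missing entry and one extreme value), and the 1.5 * IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (an assumption for illustration):
# one missing value and one obvious outlier in column 'y'.
data = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'y': [2.1, 1.9, np.nan, 2.3, 2.0, 40.0],
})

# Count missing values per column.
missing_counts = data.isna().sum()
print(missing_counts)

# Summary statistics (count, mean, std, quartiles) for a first look.
print(data.describe())

# Flag outliers in 'y' with the 1.5 * IQR rule (one common convention).
q1, q3 = data['y'].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (data['y'] < q1 - 1.5 * iqr) | (data['y'] > q3 + 1.5 * iqr)
print(data[is_outlier])
```

Whether a flagged row is a genuine anomaly or a valid observation depends on the dataset, so a check like this is a starting point for inspection rather than an automatic filter.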
Since our goal here is to divide the observations into groups, we will mostly use scatter plots built with the seaborn library. If this is the first time you have heard of it, I highly recommend taking the Introduction course on seaborn. Let's start our analysis!
You are given a DataFrame data with 2 columns named 'x' and 'y'. Let's output a scatter plot to get familiar with the data. Your tasks are:
- Import the pandas, seaborn, and matplotlib.pyplot libraries with their standard name conventions (pd, sns, and plt respectively).
- Initialize a scatter plot. Use the 'x' column values for the x-axis and the 'y' column values for the y-axis from the data DataFrame.
- Display the plot.
Solution
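One possible solution is sketched below. The course's data DataFrame is not shown on this page, so the snippet builds a small hypothetical stand-in with the same 'x' and 'y' columns; the values are made up for illustration only.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the course's `data` DataFrame (an assumption):
# two loose groups of points with columns 'x' and 'y'.
data = pd.DataFrame({
    'x': [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.8, 5.1],
    'y': [1.0, 0.9, 1.2, 1.1, 4.9, 5.2, 5.0, 4.7],
})

# Initialize the scatter plot: 'x' on the x-axis, 'y' on the y-axis.
ax = sns.scatterplot(x='x', y='y', data=data)

# Display the plot.
plt.show()
```

With the real dataset, only the three imports, the sns.scatterplot call, and plt.show() are needed, since data is already provided.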