Cluster Analysis in Python
What Is Hierarchical Clustering?
In this section, we consider one more clustering algorithm: hierarchical clustering.
How does this algorithm work? We will look at AGNES (Agglomerative Nesting), which is a bottom-up approach: at the start, every point forms its own cluster. Then the closest clusters are merged step by step, with closeness measured by a linkage criterion, until the required number of clusters is reached.
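To make the bottom-up idea concrete, here is a minimal pure-Python sketch of agglomerative merging. It assumes single linkage (the distance between two clusters is the smallest distance between any two of their members) and Euclidean distance; the function name agglomerate and the sample points are made up for illustration, and this is not how scikit-learn implements the algorithm.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    # Bottom-up start: every point is its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]

    def single_linkage(a, b):
        # Assumed linkage: smallest distance between any two members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster b into cluster a (a < b, so deleting b is safe).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical sample: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, n_clusters=3)
print(result)
```

Each pass of the loop performs exactly one merge, so starting from 5 single-point clusters, two merges are needed to reach 3 clusters.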
In Python, hierarchical clustering is implemented by the AgglomerativeClustering class from the sklearn.cluster module. Unlike in the two previous sections, to predict the labels here you need to call the .fit_predict() method with the data as its argument.
Let's go through the necessary steps:
- Create an AgglomerativeClustering model object with the required number of clusters (n_clusters) and any other parameters (these are covered in the next chapters).
- Fit the data and predict the labels using the .fit_predict() method, passing the data as the argument.
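The two steps above can be sketched as follows. The small NumPy array is a made-up example, not the course dataset, and all parameters other than n_clusters are left at their defaults.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical data: three well-separated pairs of 2-D points.
data = np.array([
    [0.0, 0.0], [0.2, 0.1],
    [5.0, 5.0], [5.1, 4.9],
    [10.0, 0.0], [9.9, 0.2],
])

# Step 1: create the model with the required number of clusters.
model = AgglomerativeClustering(n_clusters=3)

# Step 2: fit the data and get one cluster label per point.
labels = model.fit_predict(data)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which points share a label.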
AgglomerativeClustering has many parameters, among them n_clusters (as in the previous sections), linkage, and affinity (named metric in recent scikit-learn releases). We will consider them in future chapters.
You are given a 2-D dataset of points (the training dataset). The scatter plot is shown below.
Perform hierarchical clustering on this data by following these steps:
- Import the AgglomerativeClustering class from sklearn.cluster.
- Create an AgglomerativeClustering object named model with 3 clusters.
- Apply the .fit_predict() method to model with data as the argument. Add the result to data as the 'prediction' column.
- Build a scatter plot of data with the 'x' column on the x-axis, the 'y' column on the y-axis, and each point colored according to the 'prediction' column.
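One possible way to carry out these steps is sketched below. The course dataset is not included here, so the sketch generates a synthetic stand-in with 'x' and 'y' columns (an assumption); plotting with matplotlib is also an assumption, since the task does not name a plotting library.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for the course data: three blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 6], [12, 0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
data = pd.DataFrame(points, columns=['x', 'y'])

# Create the model with 3 clusters and predict a label for every row.
model = AgglomerativeClustering(n_clusters=3)
data['prediction'] = model.fit_predict(data)

# Scatter plot colored by the predicted cluster label.
fig, ax = plt.subplots()
ax.scatter(data['x'], data['y'], c=data['prediction'])
ax.set_xlabel('x')
ax.set_ylabel('y')
```

When running this as a script, call plt.show() at the end to display the figure.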