Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Apprendre Categorical Features Encoding | Core Concepts
Analyse de Cluster
course content

Contenu du cours

Analyse de Cluster

Analyse de Cluster

1. Clustering Fundamentals
2. Core Concepts
3. K-Means
4. Hierarchical Clustering
5. DBSCAN
6. GMMs

book
Categorical Features Encoding

Clustering algorithms like K-means need numerical data. Categorical features must be converted to numerical form using encoding. You will learn about ordinal and one-hot encoding.

Ordinal Encoding

Ordinal encoding converts ordered categories to numerical values, preserving their rank. For example, ordinal encoding of the 'education_level' column will transform its values from "High School", "Bachelor's", "Master's", 'PhD' to 0, 1, 2, 3.

This assumes a meaningful numerical difference between encoded values, which may not always be accurate.

python

One-Hot Encoding

One-hot encoding converts nominal (unordered) categories into binary columns, where each category becomes a new column. For a feature with n categories, this typically creates n columns — one column is 1 for the corresponding category, and the others are 0. However, only n-1 columns are actually needed to represent the information without redundancy.

For example, a 'color' column with values 'red', 'blue', and 'green' can be encoded with just two columns: 'color_red' and 'color_blue'. If a row has 0 in both, it implies the color is 'green'. By dropping one column, we avoid redundancy.

The removal of the redundant column is specified via drop='first':

python
question mark

Which encoding method is best suited for a categorical feature like 'country' with values such as "USA", "Canada", and "Germany", where there is no natural order?

Select the correct answer

Tout était clair ?

Comment pouvons-nous l'améliorer ?

Merci pour vos commentaires !

Section 2. Chapitre 2
Nous sommes désolés de vous informer que quelque chose s'est mal passé. Qu'est-il arrivé ?
some-alt