Ordinal Encoding | Processing Categorical Data

Ordinal Encoding

One-hot encoding turns a categorical variable into a set of binary columns; ordinal encoding uses a different transformation. But let's start with the kind of data it is used for.

Ordinal encoding converts categorical variables into numerical values based on the order, or rank, of the categories. It works best when the categories have a clear ranking. For example, in a survey asking respondents to rate their satisfaction with a product, the options might be "Very Satisfied", "Satisfied", "Neutral", "Dissatisfied", and "Very Dissatisfied". These options can be encoded as 5, 4, 3, 2, and 1, respectively.
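
As a minimal sketch of the survey example, the mapping can be written out by hand with pandas. The responses below are made up for illustration and do not come from the course datasets; the names responses and satisfaction_rank are likewise hypothetical.

```python
import pandas as pd

# Hypothetical survey responses (illustrative data only)
responses = pd.Series(
    ["Satisfied", "Very Dissatisfied", "Neutral", "Very Satisfied", "Dissatisfied"]
)

# Explicit rank for each option, following the 5..1 scheme from the text
satisfaction_rank = {
    "Very Satisfied": 5,
    "Satisfied": 4,
    "Neutral": 3,
    "Dissatisfied": 2,
    "Very Dissatisfied": 1,
}

# Replace each label with its rank
encoded = responses.map(satisfaction_rank)
print(encoded.tolist())  # [4, 1, 3, 5, 2]
```

Writing the mapping as a dictionary keeps the ranking visible in the code, which makes it easy to check that the numbers follow the intended order.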

Ordinal encoding is a useful way to encode categorical data when the categories have a natural order or ranking. However, it should be used with caution: the integer codes are evenly spaced, which implies that the distance between neighboring categories is equal, and this may not always be the case. For the same reason, ordinal encoding may not be suitable for models that treat the codes as numeric magnitudes and fit a linear relationship to them, such as linear regression or neural networks.

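Ordinal encoding depends on the order in which the categories are listed, so before using it, it is important to arrange the categories from lowest to highest. In scikit-learn, OrdinalEncoder sorts the categories it finds in the data (alphabetically for strings) unless an explicit order is passed through its categories parameter.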
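
To see why the category order matters, the sketch below compares scikit-learn's default behavior with an explicitly ordered encoder. The small DataFrame is made up for illustration; only the column name and grade labels are borrowed from the lesson.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data, not the course dataset
df = pd.DataFrame({"Grade Level": ["Junior", "Freshman", "Senior", "Sophomore"]})

# Default: categories are inferred from the data and sorted alphabetically,
# giving Freshman=0, Junior=1, Senior=2, Sophomore=3
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(df[["Grade Level"]]).ravel()

# Explicit order from lowest to highest:
# Freshman=0, Sophomore=1, Junior=2, Senior=3
order = [["Freshman", "Sophomore", "Junior", "Senior"]]
ordered_encoder = OrdinalEncoder(categories=order)
ordered_codes = ordered_encoder.fit_transform(df[["Grade Level"]]).ravel()

print(default_codes)  # [1. 0. 2. 3.]
print(ordered_codes)  # [2. 0. 3. 1.]
```

With the default settings, "Sophomore" receives a higher code than "Senior", which does not match the real ranking. Passing the ordered list fixes this.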

Here's how to use ordinal encoding in Python:

    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder

    # Read the dataset
    dataset = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/students.csv')

    cat_columns = ['Grade Level']

    # Sorting categorical variables
    categories = [[None, "Freshman", "Sophomore", "Junior", "Senior"]]

    # Create an OrdinalEncoder model
    encoder = OrdinalEncoder(categories=categories)

    # Transform dataset
    dataset[cat_columns] = encoder.fit_transform(dataset[cat_columns])
    print(dataset)

The .fit_transform() method of the OrdinalEncoder class fits the encoder to the categorical variables and transforms them into numerical values.
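
The two steps can also be run separately, which helps when the same encoder has to be applied to data that arrives later. The sketch below assumes two small made-up DataFrames (train and new are hypothetical names) and uses the encoder's fit, transform, and inverse_transform methods.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data, not the course dataset
train = pd.DataFrame({"Grade Level": ["Freshman", "Senior", "Junior"]})
new = pd.DataFrame({"Grade Level": ["Sophomore", "Senior"]})

encoder = OrdinalEncoder(categories=[["Freshman", "Sophomore", "Junior", "Senior"]])

# fit() stores the category order; transform() applies it
encoder.fit(train)
new_codes = encoder.transform(new)

# inverse_transform() maps the codes back to the original labels
restored = encoder.inverse_transform(new_codes)

print(new_codes.ravel())  # [1. 3.]
print(restored.ravel())   # ['Sophomore' 'Senior']
```

Because the category list is given explicitly, "Sophomore" can be encoded even though it never appears in train.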

Task

Read the 'controls.csv' dataset and transform the 'Education_Level' column with ordinal encoding.

Section 3. Chapter 3