Data Preprocessing
Ordinal Encoding
Whereas one-hot encoding turns a categorical variable into a set of binary columns, ordinal encoding maps each category to a single number. Let's start with the kind of data it is meant for.
Ordinal encoding is a technique for converting categorical variables into numerical values based on the order or rank of the categories. It works best when the categories have a clear ranking. For example, in a survey asking respondents to rate their satisfaction with a product, the options may be "Very Satisfied", "Satisfied", "Neutral", "Dissatisfied", or "Very Dissatisfied." These options can be encoded as 5, 4, 3, 2, and 1, respectively.
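The survey example can be sketched in plain Python. This is only an illustration of the idea: the responses list is invented, and a hand-written dictionary stands in for a library encoder.

```python
# Map each satisfaction level to its rank (5 = most satisfied, 1 = least).
satisfaction_rank = {
    "Very Dissatisfied": 1,
    "Dissatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very Satisfied": 5,
}

# Hypothetical survey responses, invented for illustration.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Dissatisfied"]

# Replace every label with its rank.
encoded = [satisfaction_rank[r] for r in responses]
print(encoded)  # [4, 3, 5, 2]
```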
Ordinal encoding is a useful way to encode categorical data whose categories have a natural order or ranking. However, it should be used with caution: the integer codes imply that adjacent categories are equally far apart, which may not be the case. For the same reason, it may be a poor fit for models that treat the codes as numeric magnitudes, such as linear regression or neural networks, because unequal real-world gaps between categories are forced onto an evenly spaced scale.
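Ordinal encoding assigns numbers according to the order in which the categories are listed, so before using it you need to list the categories from lowest to highest. If no order is given, scikit-learn's OrdinalEncoder infers the categories from the data and sorts them alphabetically, which rarely matches their real ranking.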
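To see why the category order matters, here is a small sketch that compares scikit-learn's default ordering with an explicit one. The four-row DataFrame is made up for illustration and is not the students.csv dataset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, invented for this comparison.
df = pd.DataFrame({"Grade Level": ["Senior", "Freshman", "Junior", "Sophomore"]})

# Default: categories are inferred from the data and sorted alphabetically
# (Freshman, Junior, Senior, Sophomore), which is not the academic order.
default_codes = OrdinalEncoder().fit_transform(df[["Grade Level"]]).ravel()

# Explicit: categories listed from lowest to highest.
order = [["Freshman", "Sophomore", "Junior", "Senior"]]
ordered_codes = (
    OrdinalEncoder(categories=order).fit_transform(df[["Grade Level"]]).ravel()
)

print(default_codes.tolist())  # [2.0, 0.0, 1.0, 3.0]
print(ordered_codes.tolist())  # [3.0, 0.0, 2.0, 1.0]
```

With the default ordering, "Sophomore" receives a higher code than "Senior"; the explicit list restores the intended ranking.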
Here's how to use ordinal encoding in Python:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Read the dataset
dataset = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/9c23bf60-276c-4989-a9d7-3091716b4507/datasets/students.csv')
cat_columns = ['Grade Level']

# Sorting categorical variables
categories = [[None, "Freshman", "Sophomore", "Junior", "Senior"]]

# Create an OrdinalEncoder model
encoder = OrdinalEncoder(categories=categories)

# Transform dataset
dataset[cat_columns] = encoder.fit_transform(dataset[cat_columns])
print(dataset)
The .fit_transform() method of the OrdinalEncoder class fits the encoder to the categorical variables and transforms them into numerical values.
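As a rough sketch of what .fit_transform() returns, the snippet below encodes a tiny made-up column, inspects the fitted categories_ attribute, and maps the codes back to labels with .inverse_transform(). The DataFrame here is hypothetical and only reuses the column name from the example above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, invented for illustration.
df = pd.DataFrame({"Grade Level": ["Junior", "Freshman", "Senior"]})

encoder = OrdinalEncoder(
    categories=[["Freshman", "Sophomore", "Junior", "Senior"]]
)

# fit_transform returns a 2D float array with one column per encoded feature.
codes = encoder.fit_transform(df[["Grade Level"]])
print(codes.shape)             # (3, 1)
print(codes.ravel().tolist())  # [2.0, 0.0, 3.0]

# categories_ stores the category order used for each column.
learned_order = list(encoder.categories_[0])
print(learned_order)

# inverse_transform maps the codes back to the original labels.
restored = encoder.inverse_transform(codes).ravel().tolist()
print(restored)  # ['Junior', 'Freshman', 'Senior']
```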
Read the 'controls.csv' dataset and transform the 'Education_Level' column with ordinal encoding.