Learn OrdinalEncoder | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

OrdinalEncoder

The next issue to address is categorical data. There are two main types of categorical variables: ordinal and nominal.

Ordinal data has a natural order, while nominal data does not. Because ordinal categories are ordered, they can be encoded as numbers according to their rank.

For example, a 'rate' column with the values 'Terrible', 'Bad', 'OK', 'Good', and 'Great' can be encoded as:

  • 'Terrible' → 0
  • 'Bad' → 1
  • 'OK' → 2
  • 'Good' → 3
  • 'Great' → 4

To encode ordinal data, the OrdinalEncoder is used. It converts categories into integers starting from 0.
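The mapping above can be reproduced with a minimal sketch. The small DataFrame here is hypothetical toy data, not part of the lesson's dataset, and the variable names are chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data (an assumption, not the lesson's dataset)
df = pd.DataFrame({'rate': ['Good', 'Terrible', 'OK', 'Great', 'Bad']})

# Categories listed from lowest to highest rank
rate_order = ['Terrible', 'Bad', 'OK', 'Good', 'Great']

# The encoder expects a 2D input, hence the double brackets
enc = OrdinalEncoder(categories=[rate_order])
encoded = enc.fit_transform(df[['rate']])

print(encoded.ravel())  # [3. 0. 2. 4. 1.]
```

Note that the result is a float array by default, even though the codes are whole numbers.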

OrdinalEncoder is applied in the same way as other transformers. The main challenge lies in specifying the categories argument correctly.
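To see why the categories argument matters, here is a small sketch of what happens when it is left at its default. It uses hypothetical toy data; with string values, the encoder infers the categories from the data in sorted (alphabetical) order, which usually does not match the intended ranking.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data (an assumption, not the lesson's dataset)
df = pd.DataFrame({'rate': ['Good', 'Terrible', 'OK', 'Great', 'Bad']})

# No categories argument: the order is inferred from the data
enc = OrdinalEncoder()
enc.fit(df[['rate']])

# The inferred order is alphabetical, so 'Bad' gets 0 and 'Terrible' gets 4
print(list(enc.categories_[0]))  # ['Bad', 'Good', 'Great', 'OK', 'Terrible']
```

Here 'Terrible' would receive the highest code, which is the opposite of the intended ranking.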

For example, consider a dataset (not the penguins dataset) that contains an 'education' column. The first step is to check its unique values.

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')

    print(df['education'].unique())

Next, create a list of the categories in ascending order, from 'HS-grad' to 'Doctorate'.

    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder

    # Load the data and assign X, y variables
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/adult_edu.csv')
    y = df['income']  # 'income' is a target in this dataset
    X = df.drop('income', axis=1)

    # Create a list of categories so HS-grad is encoded as 0 and Doctorate as 6
    edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors', 'Masters', 'Prof-school', 'Doctorate']

    # Initialize an OrdinalEncoder instance with the correct categories
    ord_enc = OrdinalEncoder(categories=[edu_categories])

    # Transform the 'education' column and print it
    X['education'] = ord_enc.fit_transform(X[['education']])
    print(X['education'])
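Before fitting, it can help to confirm that the hand-written list covers every value in the column, since by default OrdinalEncoder raises an error when the data contains a value missing from the provided categories. The sketch below uses a hypothetical stand-in Series instead of the real file, and 'Unknown-level' is a made-up value included only to show the check catching something.

```python
import pandas as pd

# Hypothetical stand-in for df['education']; 'Unknown-level' is a made-up value
education = pd.Series(['HS-grad', 'Masters', 'Bachelors', 'Unknown-level', 'Doctorate'])

edu_categories = ['HS-grad', 'Some-college', 'Assoc', 'Bachelors',
                  'Masters', 'Prof-school', 'Doctorate']

# Values present in the data but absent from the ordered list
missing = set(education.unique()) - set(edu_categories)

print(missing)  # {'Unknown-level'}
```

An empty set means the list is complete; anything else should be added to the list (in the right position) or cleaned from the data first.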

When transforming multiple features with OrdinalEncoder, the categories for each column must be specified explicitly. The categories argument takes one ordered list per column, given in the same order as the columns being transformed:

encoder = OrdinalEncoder(categories=[col1_categories, col2_categories, ...])
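As an illustration of the multi-column form, here is a minimal sketch with two ordinal columns. The 'size' column and both sets of values are hypothetical and exist only for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data with two ordinal columns
df = pd.DataFrame({
    'rate': ['Good', 'Bad', 'Great'],
    'size': ['M', 'L', 'S'],
})

rate_categories = ['Terrible', 'Bad', 'OK', 'Good', 'Great']
size_categories = ['S', 'M', 'L']

# The lists must follow the same order as the selected columns
encoder = OrdinalEncoder(categories=[rate_categories, size_categories])
encoded = encoder.fit_transform(df[['rate', 'size']])

print(encoded)
# [[3. 1.]
#  [1. 2.]
#  [4. 0.]]
```

Swapping the two lists without swapping the columns would make the encoder look for rate values in the 'size' column and fail on unknown categories.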

1. Which statement best describes the use of the OrdinalEncoder for handling categorical data in a dataset?

2. Suppose you have a categorical column named 'Color'. Would it be appropriate to use the OrdinalEncoder to encode its values?


Section 2. Chapter 5

