Learn Challenge: Encoding Categorical Variables | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

Challenge: Encoding Categorical Variables

To summarize the previous three chapters, here is a table showing what encoder you should use:

In this challenge, the penguins dataset (without missing values) is provided. All categorical features, including the target ('species' column), must be encoded.

Here is a reminder of the dataset structure:

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed.csv')
    print(df.head())

Keep in mind that 'island' and 'sex' are categorical features and 'species' is a categorical target.
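One way to confirm which columns need encoding is to list the non-numeric columns and count their distinct categories. The sketch below uses a few hypothetical sample rows in place of the real CSV; only the 'species', 'island', and 'sex' column names come from the text above, and 'body_mass_g' is an assumed numeric column added for contrast.

```python
import pandas as pd

# Hypothetical sample rows standing in for penguins_imputed.csv
df = pd.DataFrame({
    'species': ['Adelie', 'Gentoo', 'Chinstrap', 'Adelie'],
    'island': ['Torgersen', 'Biscoe', 'Dream', 'Dream'],
    'body_mass_g': [3750.0, 5000.0, 3500.0, 3900.0],  # assumed numeric column
    'sex': ['MALE', 'FEMALE', 'FEMALE', 'MALE'],
})

# Non-numeric columns are the ones that need encoding
categorical_cols = [
    col for col in df.columns
    if not pd.api.types.is_numeric_dtype(df[col])
]
print(categorical_cols)

# Number of distinct categories in each categorical column
n_categories = {col: df[col].nunique() for col in categorical_cols}
print(n_categories)
```

On the real dataset, the same check can be run on the df loaded with pd.read_csv.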

Task


Encode all categorical features. Use one-hot encoding for the 'island' and 'sex' columns, and apply a label encoder (or similar target encoder) for the 'species' column. Follow these steps to complete the encoding.

  1. Import OneHotEncoder and LabelEncoder.
  2. Initialize the feature encoder object, feature_enc.
  3. Encode the categorical feature columns using the feature_enc object.
  4. Initialize the target encoder object, label_enc.
  5. Encode the target using the label_enc object.

Solution

Section 2. Chapter 8
