Learn Challenge: Encoding Categorical Variables | Preprocessing Data with Scikit-learn
Introduction to Machine Learning with Python

Challenge: Encoding Categorical Variables

To summarize the previous three chapters, here is a table showing which encoder you should use:

In this challenge, you work with the penguins dataset (no missing values). All categorical features, including the target 'species', must be encoded for ML use.

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed.csv')
    print(df.head())

Keep in mind that 'island' and 'sex' are categorical features and 'species' is a categorical target.
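
Before encoding, it can help to look at the distinct values in each categorical column. The snippet below is a minimal sketch that uses a small, made-up stand-in for the penguins DataFrame so that it runs without downloading the file. Only the column names 'island', 'sex', and 'species' come from the text; the rows and the numeric column `body_mass_g` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the penguins data: the rows are invented and
# 'body_mass_g' is an assumed numeric feature, not confirmed by the text.
df = pd.DataFrame({
    'species': ['Adelie', 'Gentoo', 'Chinstrap', 'Adelie'],
    'island': ['Torgersen', 'Biscoe', 'Dream', 'Dream'],
    'body_mass_g': [3750.0, 5000.0, 3500.0, 3900.0],
    'sex': ['MALE', 'FEMALE', 'FEMALE', 'MALE'],
})

# Collect the distinct values of each categorical column
categories = {
    col: sorted(df[col].unique().tolist())
    for col in ['island', 'sex', 'species']
}

for col, values in categories.items():
    print(col, values)
```

With the real dataset, the same loop can be run on the `df` loaded from the CSV above.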

Task


You are given a DataFrame df. Encode all categorical columns:

  1. Import OneHotEncoder and LabelEncoder from sklearn.preprocessing.
  2. Split the data into X (features) and y (target).
  3. Create a OneHotEncoder and apply it to the 'island' and 'sex' columns in X.
  4. Replace those original columns with their encoded versions.
  5. Use LabelEncoder on the 'species' column to encode y.

