Learn Challenge: Preprocessing Pipeline | Feature Engineering for Machine Learning
Data Preprocessing and Feature Engineering

Challenge: Preprocessing Pipeline

Task

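You are given the Titanic dataset from the seaborn library. Your task is to build a preprocessing pipeline that applies the transformations commonly needed before training a machine learning model: filling missing values, encoding categorical features, scaling numeric features, and creating a new feature.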

Follow these steps:

  1. Load the dataset using sns.load_dataset("titanic").
  2. Handle missing values:
    • Numeric columns → fill with mean.
    • Categorical columns → fill with mode.
  3. Encode the categorical features sex and embarked using pd.get_dummies().
  4. Scale numeric columns age and fare using StandardScaler.
  5. Create a new feature family_size = sibsp + parch + 1.
  6. Combine all transformations into a function called preprocess_titanic(data) that returns the final processed DataFrame.
  7. Assign the processed dataset to a variable called processed_data.

Print the first 5 rows of the final DataFrame.

Solution
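
The steps above can be sketched as follows. This is one possible solution, not the course's reference answer. It assumes that every non-numeric column counts as categorical for mode filling, that all dummy columns are kept (no drop_first), and that a helper named main() (a hypothetical name introduced here) handles loading and printing. seaborn is imported inside main() because sns.load_dataset fetches the data over the network.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess_titanic(data: pd.DataFrame) -> pd.DataFrame:
    """Apply steps 2-5 of the task and return a new DataFrame."""
    df = data.copy()  # leave the caller's DataFrame untouched

    # Step 2: fill missing values column by column.
    for col in df.columns:
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        is_bool = pd.api.types.is_bool_dtype(df[col])
        if is_numeric and not is_bool:
            # Numeric column: fill with the column mean.
            df[col] = df[col].fillna(df[col].mean())
        else:
            # Assumption: anything non-numeric is treated as categorical.
            modes = df[col].mode()
            if not modes.empty:
                df[col] = df[col].fillna(modes.iloc[0])

    # Step 3: one-hot encode sex and embarked.
    df = pd.get_dummies(df, columns=["sex", "embarked"])

    # Step 4: standardize age and fare (zero mean, unit variance).
    scaled_cols = ["age", "fare"]
    df[scaled_cols] = StandardScaler().fit_transform(df[scaled_cols])

    # Step 5: family size counts siblings/spouses, parents/children, and the passenger.
    df["family_size"] = df["sibsp"] + df["parch"] + 1

    return df


def main() -> None:
    # Steps 1 and 7, then print the first 5 rows.
    import seaborn as sns  # imported here so preprocess_titanic works without seaborn

    data = sns.load_dataset("titanic")
    processed_data = preprocess_titanic(data)
    print(processed_data.head())
```

Calling main() runs the whole task from start to finish. Note that the scaler here is fitted on the full dataset, as the task asks; in a train/test workflow it would normally be fitted on the training split only.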

Section 3. Chapter 4