Learn Challenge: Imputing Missing Values | Preprocessing Data with Scikit-learn
Introduction to Machine Learning with Python

Challenge: Imputing Missing Values

The SimpleImputer class fills in missing values using a per-column statistic (such as the mean) or a constant.

from sklearn.impute import SimpleImputer
imputer = SimpleImputer()

Its key parameters:

  • missing_values: placeholder treated as missing (default np.nan);
  • strategy: method for filling gaps ('mean' by default);
  • fill_value: used when strategy='constant'.

As a transformer, it provides methods such as .fit(), .transform(), and .fit_transform().
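
As a rough illustration of how these methods relate, the sketch below fits an imputer on one array and reuses the learned mean on another. The arrays `X_train` and `X_new` are made-up toy data, not part of the course material.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: a single numeric column with one gap.
X_train = np.array([[1.0], [np.nan], [3.0]])
X_new = np.array([[np.nan], [10.0]])

imputer = SimpleImputer()                 # strategy='mean' by default
imputer.fit(X_train)                      # learns the column mean from X_train
X_new_filled = imputer.transform(X_new)   # fills gaps in X_new with that mean

# fit_transform() performs fit() and transform() on the same data in one call.
X_train_filled = SimpleImputer().fit_transform(X_train)

print(imputer.statistics_)   # the learned fill value for each column
print(X_new_filled)
```

Fitting once and calling .transform() on later data keeps the fill value consistent between, for example, a training set and a test set.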

Choosing how to fill missing data is essential. A common approach:

  • numerical features → mean;
  • categorical features → most frequent value.
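
A minimal sketch of this approach, assuming a small hand-made DataFrame; the column names `body_mass_g` and `island` are illustrative and not taken from the course dataset:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: one numerical and one categorical column, each with a gap.
df = pd.DataFrame({
    "body_mass_g": [3500.0, np.nan, 4500.0],
    "island": pd.Series(["Biscoe", "Biscoe", np.nan], dtype=object),
})

num_imputer = SimpleImputer(strategy="mean")           # numerical -> mean
cat_imputer = SimpleImputer(strategy="most_frequent")  # categorical -> mode

# Double brackets keep each selection 2D; .ravel() flattens the result.
df["body_mass_g"] = num_imputer.fit_transform(df[["body_mass_g"]]).ravel()
df["island"] = cat_imputer.fit_transform(df[["island"]]).ravel()

print(df)
```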

strategy options:

  • 'mean': fill with mean;
  • 'median': fill with median;
  • 'most_frequent': fill with mode;
  • 'constant': fill with a specified value via fill_value.
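
To see how the four options differ, the sketch below applies each one to the same toy column and records the value used to fill the single gap. The array `X` and the dictionary `filled` are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical column: observed values 1, 1, 4, 10 and one missing entry (last row).
X = np.array([[1.0], [1.0], [4.0], [10.0], [np.nan]])

filled = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(strategy=strategy)
    # Keep only the value that was written into the missing last row.
    filled[strategy] = imputer.fit_transform(X)[-1, 0]

# 'constant' ignores the data and uses fill_value instead.
constant_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
filled["constant"] = constant_imputer.fit_transform(X)[-1, 0]

print(filled)
```

Here the mean (4.0) is pulled upward by the value 10, while the median (2.5) and the mode (1.0) are not, which is one reason the median is often preferred for skewed data.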

missing_values defines which values are treated as missing (default NaN, but may be '' or another marker).
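
As a small sketch of a non-default marker, the example below assumes a numeric column in which -1 was used as the placeholder for missing entries; the sentinel -1 is a hypothetical choice made for this illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: -1 stands in for "missing" instead of NaN.
X = np.array([[1.0], [-1.0], [3.0]])

# Tell the imputer to treat -1 as missing and fill it with the column mean.
imputer = SimpleImputer(missing_values=-1, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

The mean is computed only from the entries that are not marked as missing, so the placeholder itself does not distort the fill value.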

Note

SimpleImputer expects 2D input, such as a DataFrame or a 2D array, and raises an error for a 1D Series. To pass a single column, select it with double brackets so that it stays a DataFrame:

imputer.fit_transform(df[['column']])

fit_transform() returns a 2D NumPy array, here of shape (n_rows, 1), while a DataFrame column holds 1D data. Flatten the result with .ravel() before assigning it back:

df['column'] = imputer.fit_transform(df[['column']]).ravel()
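
The shapes involved can be checked directly. The sketch below uses a made-up three-row DataFrame; the variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical single-column DataFrame with one missing value.
df = pd.DataFrame({"column": [1.0, np.nan, 3.0]})
imputer = SimpleImputer()

series_shape = df["column"].shape     # single brackets: 1D Series
frame_shape = df[["column"]].shape    # double brackets: 2D DataFrame

result = imputer.fit_transform(df[["column"]])  # 2D NumPy array
flat = result.ravel()                           # flattened to 1D

df["column"] = flat  # assign the imputed values back to the column

print(series_shape, frame_shape, result.shape, flat.shape)
```
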
Task

You are given a DataFrame df containing penguin data. The 'sex' column has missing values. Fill them using the most frequent category.

  1. Import SimpleImputer;
  2. Create an imputer with strategy='most_frequent';
  3. Apply it to df[['sex']];
  4. Assign the imputed values back to df['sex'].

Solution
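
One way to complete the task might look like the sketch below. The small `df` built here is a hypothetical stand-in for the penguin DataFrame that the exercise provides; in the exercise itself only the last two statements would be needed.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # step 1: import SimpleImputer

# Hypothetical stand-in for the penguin data supplied by the exercise.
df = pd.DataFrame({
    "sex": pd.Series(["MALE", "FEMALE", np.nan, "MALE"], dtype=object),
})

# Step 2: create an imputer that fills gaps with the most frequent category.
imputer = SimpleImputer(strategy="most_frequent")

# Steps 3 and 4: apply it to the 2D selection and assign the flattened result back.
df["sex"] = imputer.fit_transform(df[["sex"]]).ravel()

print(df["sex"].isna().sum())  # number of missing values left in the column
```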

Section 2. Chapter 4