Challenge: Imputing Missing Values
The SimpleImputer class fills missing values in each column with a statistic of that column (such as the mean) or with a constant.
from sklearn.impute import SimpleImputer
imputer = SimpleImputer()
Its key parameters:
- missing_values: placeholder treated as missing (default np.nan);
- strategy: method for filling gaps ('mean' by default);
- fill_value: value used when strategy='constant'.
As a transformer, it provides methods such as .fit(), .transform(), and .fit_transform().
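The difference between these methods is easiest to see on a tiny example. The sketch below uses made-up arrays (X_train and X_new are illustrative names, not part of the lesson data) and the default 'mean' strategy: .fit() learns the fill value, .transform() applies it, and .fit_transform() does both in one call.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data: one feature column with a gap in each array
X_train = np.array([[1.0], [np.nan], [3.0]])
X_new = np.array([[np.nan], [10.0]])

imputer = SimpleImputer()  # strategy='mean' by default

# fit() learns the statistic from the observed values: (1 + 3) / 2 = 2
imputer.fit(X_train)

# transform() fills gaps with the learned value, also in unseen data
train_filled = imputer.transform(X_train)
new_filled = imputer.transform(X_new)

# fit_transform() is a shortcut for fit() followed by transform() on the same data
same = SimpleImputer().fit_transform(X_train)
```

Note that X_new is filled with the mean learned from X_train, not with its own mean.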
Choosing how to fill missing data is essential. A common approach:
- numerical features: mean;
- categorical features: most frequent value.
strategy options:
- 'mean': fill with the mean;
- 'median': fill with the median;
- 'most_frequent': fill with the mode;
- 'constant': fill with a specified value via fill_value.
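To compare the strategies side by side, the following sketch fills the same gap four ways. The array X is made up for illustration, and fill_value=0 is an arbitrary choice.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up column: observed values 1, 2, 2, 10 and one gap at row index 3
X = np.array([[1.0], [2.0], [2.0], [np.nan], [10.0]])

filled = {}
for strategy in ('mean', 'median', 'most_frequent'):
    imputer = SimpleImputer(strategy=strategy)
    # Keep only the value that was written into the gap
    filled[strategy] = imputer.fit_transform(X)[3, 0]

# 'constant' ignores the data and uses fill_value instead
constant_imputer = SimpleImputer(strategy='constant', fill_value=0)
filled['constant'] = constant_imputer.fit_transform(X)[3, 0]

print(filled)
```

Here the mean (3.75) is pulled upward by the outlier 10, while the median and mode both stay at 2.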
missing_values defines which values are treated as missing (default NaN, but may be '' or another marker).
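As a small illustration of a custom marker, the sketch below assumes a dataset in which -1 stands for "missing". Both the -1 sentinel and the array are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data where -1 marks a missing measurement
X = np.array([[1.0], [-1.0], [3.0]])

# Treat -1 (instead of NaN) as the missing marker
imputer = SimpleImputer(missing_values=-1, strategy='mean')
filled = imputer.fit_transform(X)

print(filled.ravel())
```

The -1 entries are excluded when the mean is computed, so the gap is filled with (1 + 3) / 2 = 2.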
SimpleImputer expects 2D input such as a DataFrame, not a 1D Series.
To pass a single column, select it as a one-column DataFrame using double brackets:
imputer.fit_transform(df[['column']])
fit_transform() returns a 2D array, but assigning back to a DataFrame column requires a 1D array.
Flatten the result using .ravel():
df['column'] = imputer.fit_transform(df[['column']]).ravel()
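The shapes involved can be checked directly. This is a minimal sketch on a made-up three-row DataFrame. It shows the Series being rejected, the 2D result, and the flattened assignment.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up DataFrame with one gap
df = pd.DataFrame({'column': [1.0, np.nan, 3.0]})
imputer = SimpleImputer()

# Single brackets give a 1D Series, which the imputer rejects
try:
    imputer.fit_transform(df['column'])
    series_rejected = False
except ValueError:
    series_rejected = True

# Double brackets give a one-column DataFrame (2D), which is accepted
result = imputer.fit_transform(df[['column']])
print(result.shape)          # 2D: (3, 1)
print(result.ravel().shape)  # 1D: (3,)

# Flatten before assigning back to the column
df['column'] = result.ravel()
```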
You are given a DataFrame df containing penguin data. The 'sex' column has missing values. Fill them using the most frequent category.
- Import SimpleImputer;
- Create an imputer with strategy='most_frequent';
- Apply it to df[['sex']];
- Assign the imputed values back to df['sex'].
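One possible way to carry out these steps is sketched below. The lesson's penguin data is not reproduced here, so the five-row df is a made-up stand-in and its category labels are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Stand-in for the penguin DataFrame (values are made up)
df = pd.DataFrame({
    'sex': pd.Series(['MALE', 'FEMALE', np.nan, 'MALE', np.nan], dtype=object)
})

# Fill gaps with the most frequent category
imputer = SimpleImputer(strategy='most_frequent')

# Double brackets pass a 2D input; ravel() flattens the 2D result for assignment
df['sex'] = imputer.fit_transform(df[['sex']]).ravel()

print(df['sex'].isna().sum())
```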