Challenge: Imputing Missing Values
The SimpleImputer class replaces missing values automatically.
from sklearn.impute import SimpleImputer
imputer = SimpleImputer()
Its key parameters:
- missing_values: placeholder treated as missing (default np.nan);
- strategy: method for filling gaps ('mean' by default);
- fill_value: value used when strategy='constant'.
As a transformer, it provides methods such as .fit(), .transform(), and .fit_transform().
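To make the difference between these methods concrete, here is a minimal sketch. The `train` and `test` DataFrames and their 'mass' column are hypothetical toy data, not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one numeric column with gaps
train = pd.DataFrame({'mass': [3000.0, np.nan, 5000.0]})
test = pd.DataFrame({'mass': [np.nan, 4200.0]})

imputer = SimpleImputer()  # strategy='mean' by default

# .fit() learns the fill value (here the column mean) from the data
imputer.fit(train)

# .transform() fills gaps using the statistic learned during .fit()
train_filled = imputer.transform(train)
test_filled = imputer.transform(test)

# .fit_transform() does both steps in one call on the same data
combined = SimpleImputer().fit_transform(train)
```

Note that the gap in `test` is filled with the mean learned from `train`, because `.transform()` reuses the fitted statistic rather than recomputing it.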
Choosing how to fill missing data is essential. A common approach:
- numerical features: mean;
- categorical features: most frequent value.
strategy options:
- 'mean': fill with mean;
- 'median': fill with median;
- 'most_frequent': fill with mode;
- 'constant': fill with a specified value via fill_value.
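The four strategies can be compared side by side. The sketch below uses a hypothetical single-column DataFrame `values` and records what each strategy writes into the one missing cell.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical numeric column with one gap at row index 3
values = pd.DataFrame({'x': [1.0, 2.0, 2.0, np.nan, 10.0]})

filled = {}
for strategy in ('mean', 'median', 'most_frequent'):
    imp = SimpleImputer(strategy=strategy)
    # Keep only the value that replaced the gap
    filled[strategy] = imp.fit_transform(values)[3, 0]

# 'constant' needs an explicit fill_value
imp = SimpleImputer(strategy='constant', fill_value=0.0)
filled['constant'] = imp.fit_transform(values)[3, 0]

print(filled)
```

The outlier 10.0 pulls the mean above the median, which is one reason the median is sometimes preferred for skewed numerical features.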
missing_values defines which values are treated as missing (default NaN, but may be '' or another marker).
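As an illustration of a non-default marker, the sketch below assumes a hypothetical 'score' column in which -1 stands for "missing". The sentinel value is an assumption made for this example only.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data where -1 marks a missing score
scores = pd.DataFrame({'score': [-1, 2, 4]})

# Treat -1 (instead of NaN) as the missing-value marker
imputer = SimpleImputer(missing_values=-1, strategy='mean')
filled = imputer.fit_transform(scores)

# The -1 is excluded from the mean and then replaced by it
print(filled.ravel())
```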
SimpleImputer expects 2D input such as a DataFrame, not a 1D Series.
A single-column DataFrame must be selected using double brackets:
imputer.fit_transform(df[['column']])
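The difference between single and double brackets shows up in the shapes. This is a small sketch on a hypothetical DataFrame; it also checks that passing the 1D Series is rejected.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical single-column data
df = pd.DataFrame({'column': [1.0, np.nan, 3.0]})
imputer = SimpleImputer()

series_shape = df['column'].shape    # single brackets: 1D Series
frame_shape = df[['column']].shape   # double brackets: 2D DataFrame

# A 1D Series is rejected because the imputer validates for 2D input
try:
    imputer.fit_transform(df['column'])
    series_accepted = True
except ValueError:
    series_accepted = False

# The single-column DataFrame is accepted
result = imputer.fit_transform(df[['column']])
print(series_shape, frame_shape, series_accepted, result.shape)
```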
fit_transform() returns a 2D array (here with a single column), while a DataFrame column holds 1D data.
Flatten the result using .ravel():
df['column'] = imputer.fit_transform(df[['column']]).ravel()
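Put together on a hypothetical toy DataFrame, the flattening step might look like this:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one gap
df = pd.DataFrame({'column': [1.0, np.nan, 3.0]})

imputer = SimpleImputer()  # default strategy='mean'
imputed = imputer.fit_transform(df[['column']])  # 2D array, shape (3, 1)

# .ravel() flattens (3, 1) to (3,), matching the column's length
df['column'] = imputed.ravel()
print(df)
```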
You are given a DataFrame df containing penguin data. The 'sex' column has missing values. Fill them using the most frequent category.
- Import SimpleImputer;
- Create an imputer with strategy='most_frequent';
- Apply it to df[['sex']];
- Assign the imputed values back to df['sex'].
Solution
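One possible approach is sketched below. Since the actual penguin data is not shown here, the DataFrame is a small hypothetical stand-in with assumed 'species' and 'sex' values; only the last three statements correspond to the task steps.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the penguin DataFrame
df = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Gentoo', 'Gentoo'],
    'sex': pd.Series(['MALE', 'FEMALE', np.nan, 'MALE'], dtype=object),
})

# Create an imputer that fills gaps with the most frequent category
imputer = SimpleImputer(strategy='most_frequent')

# Apply it to the 2D selection and flatten before assigning back
df['sex'] = imputer.fit_transform(df[['sex']]).ravel()
print(df)
```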