Learn Challenge: Imputing Missing Values | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

Challenge: Imputing Missing Values

The SimpleImputer class is designed to handle missing data by automatically replacing missing values.

When initialized, it can also be customized by setting its parameters:

  • missing_values: specifies the placeholder for the missing values. By default, this is np.nan;
  • strategy: specifies the strategy used to impute missing values. The default is 'mean';
  • fill_value: specifies the value used to fill missing values when strategy is 'constant'. By default, this is None.

Being a transformer, it has the standard transformer methods: .fit(), .transform(), and .fit_transform().

However, we also need to choose the value to impute.

A popular approach is to impute missing numerical values with the mean and missing categorical values with the mode (the most frequent value), since these choices change the distribution of the values relatively little.

The approach can be controlled using the strategy parameter:

  • strategy='mean': impute with the mean of each column;
  • strategy='median': impute with the median of each column;
  • strategy='most_frequent': impute with the mode of each column;
  • strategy='constant': impute with the constant value specified in the fill_value parameter.
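The four strategies can be compared side by side on a tiny example. This is only a sketch: the toy column `value` and the `filled` dictionary are made up for illustration and are not part of the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy column; the last entry is missing
X = pd.DataFrame({'value': [1.0, 2.0, 2.0, 7.0, np.nan]})

filled = {}
for strategy in ['mean', 'median', 'most_frequent']:
    imputer = SimpleImputer(strategy=strategy)
    # fit_transform returns a 2D array; take the value that replaced the NaN
    filled[strategy] = imputer.fit_transform(X)[-1, 0]

# 'constant' takes the replacement from fill_value instead of computing it
constant_imputer = SimpleImputer(strategy='constant', fill_value=0.0)
filled['constant'] = constant_imputer.fit_transform(X)[-1, 0]

print(filled)
```

The mean is pulled up by the outlier 7.0, while the median and the mode are not, which is one reason to look at the column's distribution before picking a strategy.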

The missing_values parameter controls which values are considered missing. By default, it is NaN, but in some datasets missing entries are encoded as an empty string '' or another placeholder.
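As an illustration of a non-default placeholder, the sketch below treats empty strings as missing. The `city` column and its values are hypothetical, and the column is stored with object dtype on the assumption that this is how string data reaches the imputer.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical string column where a missing entry is encoded as ''
df = pd.DataFrame({'city': ['Kyiv', '', 'Lviv', 'Kyiv']}, dtype=object)

# Tell the imputer that '' (not NaN) marks a missing value
imputer = SimpleImputer(missing_values='', strategy='most_frequent')
result = imputer.fit_transform(df[['city']])

print(result.ravel())
```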

When you use the .fit_transform() method of the SimpleImputer, it produces a 2D array as output. However, when updating a single column in a pandas DataFrame, you need a 1D array (or a Series).

To convert the 2D array into a 1D array suitable for assignment to a DataFrame column, you can apply the .ravel() method. This method flattens the array. Here's how you can update a column after imputation:
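A minimal sketch of that update, assuming a small hypothetical DataFrame with a numeric `age` column (the names and numbers are illustrative only):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing age
df = pd.DataFrame({'age': [22.0, np.nan, 30.0, 26.0]})

imputer = SimpleImputer(strategy='mean')

# Double brackets keep the input 2D, as the imputer expects
imputed = imputer.fit_transform(df[['age']])   # 2D array, shape (4, 1)

# Flatten to 1D before assigning back to the column
df['age'] = imputed.ravel()                    # 1D array, shape (4,)

print(df)
```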

This approach ensures that the imputed values are correctly formatted and assigned back to the DataFrame.

Task

Your task is to impute the NaN values of the 'sex' column using SimpleImputer. Since you are dealing with a categorical column, you will replace null values with the most frequent value (the most common approach).

  1. Import the SimpleImputer.
  2. Create a SimpleImputer object with the desired strategy.
  3. Impute the missing values of the 'sex' column using the imputer object.

Once you've completed this task, click the button below the code to check your solution.

Solution
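One possible way to carry out the three steps is sketched below. The course dataset is not reproduced here, so the DataFrame is a hypothetical stand-in that contains only a 'sex' column with a few NaN entries.

```python
import numpy as np
import pandas as pd
# Step 1: import the SimpleImputer
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the course dataset
df = pd.DataFrame(
    {'sex': ['MALE', 'FEMALE', np.nan, 'MALE', np.nan]},
    dtype=object,
)

# Step 2: create the imputer with the strategy for categorical data
imputer = SimpleImputer(strategy='most_frequent')

# Step 3: impute the column and flatten the 2D result before assignment
df['sex'] = imputer.fit_transform(df[['sex']]).ravel()

print(df['sex'].value_counts())
```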

Great! We dealt with the missing values problem in our dataset. We removed the rows with more than one null and imputed the 'sex' column with the most frequent value – MALE.
