Learn StandardScaler, MinMaxScaler, MaxAbsScaler | Preprocessing Data with Scikit-learn
ML Introduction with scikit-learn

StandardScaler, MinMaxScaler, MaxAbsScaler

There are three popular approaches to scaling the data:

  • MinMaxScaler: scales each feature to the [0, 1] range;
  • MaxAbsScaler: scales each feature so that its maximum absolute value is 1 (so the data is guaranteed to lie in the [-1, 1] range);
  • StandardScaler: standardizes each feature so that its mean is 0 and its variance is 1.

To illustrate how scalers operate, consider the 'culmen_depth_mm' and 'body_mass_g' features from the penguins dataset. These features can be plotted to observe their scales.

MinMaxScaler

The MinMaxScaler works by subtracting the feature's minimum value (so that the values start from zero) and then dividing by (x_max - x_min), so that the largest value becomes 1: x_scaled = (x - x_min) / (x_max - x_min).
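The formula can be checked by hand. The following is a minimal sketch that uses a small made-up array (hypothetical values, not the penguins data) and compares a manual computation with the output of MinMaxScaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values standing in for two features on very different scales
X = np.array([[13.0, 2700.0],
              [17.0, 4200.0],
              [21.0, 6300.0]])

# Manual min-max scaling, computed separately for each column
x_min = X.min(axis=0)
x_max = X.max(axis=0)
X_manual = (X - x_min) / (x_max - x_min)

# The same transformation with scikit-learn
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

Each column is scaled independently, so both features end up spanning exactly [0, 1] regardless of their original units.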

[Animation: how MinMaxScaler works]

MaxAbsScaler

The MaxAbsScaler works by finding the maximum absolute value and dividing each value by it. This ensures that the maximum absolute value is 1.
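As a rough illustration, the division can be reproduced manually. The array below is made up for this sketch and includes negative values to show that signs are preserved:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical feature columns; the first one contains a negative value
X = np.array([[-4.0, 10.0],
              [ 2.0, 25.0],
              [ 1.0, 50.0]])

# Manual scaling: divide each column by its maximum absolute value
max_abs = np.abs(X).max(axis=0)
X_manual = X / max_abs

# The same transformation with scikit-learn
X_sklearn = MaxAbsScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

Because nothing is subtracted, zeros stay zeros and signs are kept; a column with only non-negative values ends up in [0, 1] rather than using the full [-1, 1] range.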

StandardScaler

The idea of StandardScaler comes from statistics. It works by subtracting the mean (to center around zero) and dividing by the standard deviation (to make the variance equal to 1).
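A minimal sketch of the same computation done by hand, again on made-up values. It assumes the population standard deviation (ddof=0, the NumPy default), which is what StandardScaler uses:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values standing in for two features on different scales
X = np.array([[13.0, 2700.0],
              [17.0, 4200.0],
              [21.0, 6300.0]])

# Manual standardization per column: (x - mean) / std
# np.std uses ddof=0 by default, matching StandardScaler
mean = X.mean(axis=0)
std = X.std(axis=0)
X_manual = (X - mean) / std

# The same transformation with scikit-learn
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))
print(X_sklearn.mean(axis=0))  # approximately 0 for each column
print(X_sklearn.var(axis=0))   # approximately 1 for each column
```

Unlike the other two scalers, the result is not confined to a fixed range; values far from the mean can fall well outside [-1, 1].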

Note

If you do not understand what the mean, standard deviation, and variance are, you can check our Learning Statistics with Python course. However, this knowledge is not mandatory to move on.

Here is a coding example with MinMaxScaler. Other scalers are applied in the same manner.

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed_encoded.csv')
    # Assign X,y variables
    X, y = df.drop('species', axis=1), df['species']
    # Initialize a MinMaxScaler object and transform the X
    minmax = MinMaxScaler()
    X = minmax.fit_transform(X)
    print(X)

The output is not the prettiest because scalers return a NumPy array rather than a DataFrame, but this will not be a problem once pipelines are used.
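If readable output is needed right away, one option (an addition here, not part of the original example) is to wrap the array back into a DataFrame. The sketch below uses a small hypothetical DataFrame in place of the penguins file so that it runs without a download:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for two feature columns of the penguins dataset
X = pd.DataFrame({
    'culmen_depth_mm': [13.0, 17.0, 21.0],
    'body_mass_g': [2700.0, 4200.0, 6300.0],
})

minmax = MinMaxScaler()
X_scaled = minmax.fit_transform(X)  # NumPy array: column names are lost

# Rebuild a DataFrame with the original column names and index
X_scaled_df = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)
print(X_scaled_df)
```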

Note

You should only scale the feature columns (the X variable). There is no need to scale the target variable, and doing so would complicate things, since predictions would then have to be inverse-transformed.

Which Scaler to Use?

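A StandardScaler is generally less sensitive to outliers than MinMaxScaler and MaxAbsScaler, whose output is determined entirely by the most extreme values, which makes it a reasonable default scaler. It is not immune, though: outliers still affect the mean and standard deviation it computes. If you prefer an alternative to StandardScaler, the choice between MinMaxScaler and MaxAbsScaler comes down to personal preference: scaling data to the [0, 1] range with MinMaxScaler or to the [-1, 1] range with MaxAbsScaler.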

1. What is the primary purpose of using MinMaxScaler in data preprocessing?

2. Why might you reconsider using StandardScaler for your dataset?


Section 2. Chapter 10
