Data Normalization | Normalization & Standardization
Preprocessing Data

1. Data Exploration
2. Data Cleaning
3. Data Validation
4. Normalization & Standardization
5. Data Encoding

Data Normalization

Data normalization and standardization rescale numerical data into a suitable interval. For example, many ML models work best with values in the interval [0; 1]. Data that is finite and shares a common scale is much more convenient to process. There are two approaches:

  • to normalize data: move it to the interval [0; 1]
  • to standardize data: rescale it to zero mean and unit variance.
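
To make the two approaches concrete, here is a minimal sketch on a handful of made-up numbers. It assumes that "standardizing" means the common z-score transform (subtract the mean, divide by the standard deviation); the sample values and variable names are hypothetical and are not part of the course dataset.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Normalization (min-max): rescale into the interval [0, 1]
normalized = (values - values.min()) / (values.max() - values.min())

# Standardization (z-score, assumed here): zero mean, unit standard deviation
standardized = (values - values.mean()) / values.std()

print(normalized)    # smallest value maps to 0, largest to 1
print(standardized)  # values are centered on 0
```

Normalized values are bounded, while standardized values are not: they only share the same center and spread.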

Data normalization

After normalization, each value lies in the interval [0; 1]. This makes it easy to see how close a value is to the lower or upper bound. Here is a quick demo.

For example, suppose you look at the 4th value in the Fare column. How can you tell how large it is? How close is it to the mean value? Is it a lot or not?

You have to know the bounds of the fare price. If you normalized the data, you would get the value 0.10364, which is more informative: the fare is fairly small, only about 10% of the way from the minimum price to the maximum price.

The normalization formula is:

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

Let's implement it manually:

```python
# data: a pandas DataFrame with a 'Fare' column, loaded beforehand
x_min, x_max = data['Fare'].min(), data['Fare'].max()
normalized_fare = (data['Fare'] - x_min) / (x_max - x_min)
print(normalized_fare[3])  # output: 0.10364429745562033
```

Great! Now we see that the 4th value is 0.10364, so this price sits only about 10% of the way from the minimum fare to the maximum one.
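
Since the dataset itself is not loaded on this page, the same manual calculation can be tried on a small stand-in. The sketch below uses made-up fares (not the course data) so that it runs on its own, and it also shows `.iloc`, which selects by position regardless of the index labels.

```python
import pandas as pd

# Hypothetical fares, not the course dataset
data = pd.DataFrame({'Fare': [0.0, 10.0, 25.0, 50.0, 100.0]})

# Min-max normalization: (x - x_min) / (x_max - x_min)
x_min, x_max = data['Fare'].min(), data['Fare'].max()
normalized_fare = (data['Fare'] - x_min) / (x_max - x_min)

print(normalized_fare.tolist())  # [0.0, 0.1, 0.25, 0.5, 1.0]

# .iloc[3] picks the 4th record by position, even if the index is not 0..n-1
print(normalized_fare.iloc[3])   # 0.5
```

With the default integer index, `normalized_fare[3]` and `normalized_fare.iloc[3]` return the same element; they can differ once rows have been filtered or reordered.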

Of course, there is a built-in way that does all the work: let's use MinMaxScaler from sklearn.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# reading data

# creating a scaler
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
# now the data is normalized
```

Task

Use the fit_transform() method with the Fare data as an argument. Check whether the normalized fare of the 4th record equals the value computed manually.

Note that your data must be a 2D array.
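
The 2D requirement can be sketched as follows, again on made-up fares rather than the course data. Selecting a column with single brackets gives a one-dimensional Series, while double brackets give a one-column DataFrame, which is two-dimensional; reshaping the underlying array with `reshape(-1, 1)` is an equivalent option.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical fares, not the course dataset
data = pd.DataFrame({'Fare': [0.0, 10.0, 25.0, 50.0, 100.0]})

print(data['Fare'].shape)    # (5,)   one-dimensional Series
print(data[['Fare']].shape)  # (5, 1) two-dimensional, one column

# Double brackets keep the data 2D, as the scaler expects
scaler = MinMaxScaler()
fare_scaled = scaler.fit_transform(data[['Fare']])

# Equivalent 2D input built from the underlying NumPy array
fare_2d = data['Fare'].to_numpy().reshape(-1, 1)
fare_scaled_alt = MinMaxScaler().fit_transform(fare_2d)

print(fare_scaled[3, 0])  # normalized fare of the 4th record
```

The result is also 2D, so the 4th record is read with `[3, 0]` (row 3, column 0).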
