Standardization
Finally, let's start analyzing the mathematical model behind PCA.
First, we standardize the data the algorithm will work with. Standardization rescales every continuous variable so that its mean equals 0 and its standard deviation equals 1.
This step matters because PCA does not work properly when the variables in a dataset have very different ranges, for example one variable spanning 0-20 and another spanning 100-10,000. PCA looks for the directions of maximum variance, so the variable with the large spread dominates, while the variable with the small spread (0-20) is effectively "ignored" and barely affects the result of the algorithm.
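The effect described above can be sketched with NumPy alone. The data here are hypothetical: two independent, uniformly distributed features with ranges of roughly 0-20 and 100-10,000. The first principal component is computed as the eigenvector of the covariance matrix with the largest eigenvalue (the usual eigendecomposition view of PCA, assumed here rather than taken from this lesson), and the helper name `first_component` is illustrative only.

```python
import numpy as np

# Hypothetical toy data: one feature spread over ~0-20, one over ~100-10,000
rng = np.random.default_rng(0)
small = rng.uniform(0, 20, size=500)
large = rng.uniform(100, 10_000, size=500)
X = np.column_stack([small, large])

def first_component(data):
    """Hypothetical helper: eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.cov(data, rowvar=False)        # np.cov centers each column internally
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, np.argmax(eigvals)]

raw_pc1 = first_component(X)
scaled_pc1 = first_component((X - X.mean(axis=0)) / X.std(axis=0))

print("PC1 on raw data:         ", np.round(raw_pc1, 3))
print("PC1 on standardized data:", np.round(scaled_pc1, 3))
```

On the raw data, PC1 points almost entirely along the large-range feature (a loading close to ±1), so the 0-20 feature contributes almost nothing. After standardization, both features receive loadings of about ±0.707, so neither one is ignored.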
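The standardization formula is simple: subtract the variable's mean from each value and divide the result by the variable's standard deviation:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is an original value of the variable, $\mu$ is the mean of that variable, and $\sigma$ is its standard deviation.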
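As a worked example, take the first column of the X array used in the scikit-learn snippet, x = (1, 2, 3, 4), with $\mu$ its mean and $\sigma$ its standard deviation. Here $\sigma$ divides by n (the population formula), which is what np.std() and StandardScaler use by default:
$$
\begin{aligned}
\mu &= \frac{1 + 2 + 3 + 4}{4} = 2.5 \\
\sigma &= \sqrt{\frac{(-1.5)^2 + (-0.5)^2 + 0.5^2 + 1.5^2}{4}} = \sqrt{1.25} \approx 1.118 \\
z &= \left(\frac{-1.5}{1.118}, \frac{-0.5}{1.118}, \frac{0.5}{1.118}, \frac{1.5}{1.118}\right) \approx (-1.342, -0.447, 0.447, 1.342)
\end{aligned}
$$
The resulting values have mean 0 and standard deviation 1.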
The scikit-learn Python library lets us do this in a single line with StandardScaler:
```python
# Importing libraries
import numpy as np
from sklearn.preprocessing import StandardScaler

# Standardizing: each column is rescaled to mean 0 and standard deviation 1
X = np.asarray([[1, 3], [2, 10], [3, 35], [4, 40]], dtype=np.float64)
X_scaled = StandardScaler().fit_transform(X)
```
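To check what the scaler actually did, the sketch below reuses the same X array and inspects the fitted StandardScaler: `mean_` holds the per-column means, `scale_` the per-column standard deviations, and `inverse_transform` maps the standardized data back to the original values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same toy array as in the snippet above
X = np.asarray([[1, 3], [2, 10], [3, 35], [4, 40]], dtype=np.float64)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Per-column statistics learned during fit (means are 2.5 and 22.0 for this X)
print("means: ", scaler.mean_)
print("scales:", scaler.scale_)

# Each standardized column should now have mean ~0 and standard deviation ~1
print("column means:", X_scaled.mean(axis=0))
print("column stds: ", X_scaled.std(axis=0))

# inverse_transform undoes the scaling and recovers the original data
X_back = scaler.inverse_transform(X_scaled)
print("recovered original:", np.allclose(X_back, X))
```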
Task: implement standardization of the X array using the NumPy functions np.mean() and np.std().
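One possible approach, sketched under the assumption that each column of X is a separate variable: compute the statistics with `axis=0` so there is one mean and one standard deviation per column, then let NumPy broadcasting apply the formula to every row. The result is compared against StandardScaler as a sanity check.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.asarray([[1, 3], [2, 10], [3, 35], [4, 40]], dtype=np.float64)

# Column-wise statistics: axis=0 aggregates over rows, giving one value per feature
mean = np.mean(X, axis=0)
std = np.std(X, axis=0)  # default ddof=0, i.e. population standard deviation

# Broadcasting subtracts each column's mean and divides by that column's std
X_manual = (X - mean) / std

# Should agree with scikit-learn's StandardScaler
X_sklearn = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))
```

Keeping the default `ddof=0` matters for the comparison: with `ddof=1` the values would differ slightly from StandardScaler. Note also that a constant column would make this manual version divide by zero, whereas StandardScaler treats a zero standard deviation as a scale of 1.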