Fit Data into the Model

Now that our data is ready, let's fit the PCA model to it.

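from sklearn.decomposition import PCA

# Keep only the first two principal components
pca_model = PCA(n_components=2)
# Fit the model on X and project X onto those two components
X_reduced = pca_model.fit_transform(X)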

We have reduced the dataset from 13 features to 2 principal components! Now we can visualize the resulting components using the seaborn and matplotlib libraries:

import matplotlib.pyplot as plt
import seaborn as sns

# Recent seaborn versions require x and y to be passed as keyword arguments
sns.scatterplot(x=X_reduced[:, 0], y=X_reduced[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()

A natural question is how to check how well a particular PCA model works. The performance of PCA can be measured in two ways. The first is how much information the resulting components retain: the number of components we decide to keep determines how much of the dataset's variance is preserved. For example, let's display the cumulative explained variance ratio:

print("Cumulative Variances (Percentage):")
print(pca_model.explained_variance_ratio_.cumsum() * 100)

For a PCA model fitted with all 13 principal components of the wine dataset (i.e. as many components as there were original variables), the cumulative values show that the first component captures 36% of the information, two components capture 55%, three components capture 66%, and so on. With n_components=2, as in the model above, only the first two of these values are printed.
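To see the cumulative values for every component, the model has to keep all of them. The following is a minimal sketch, assuming the data is scikit-learn's built-in wine dataset (`load_wine`) and that it was standardized with `StandardScaler` before PCA. Neither step is shown in this chapter, so treat both, along with the names `X_scaled` and `full_pca`, as assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: X is the 13-feature wine dataset bundled with scikit-learn
X = load_wine().data
# Assumption: the features were standardized before applying PCA
X_scaled = StandardScaler().fit_transform(X)

# n_components=None keeps every component (13 for this dataset)
full_pca = PCA(n_components=None)
full_pca.fit(X_scaled)

# Running total of the explained variance ratio, as a percentage
cumulative = full_pca.explained_variance_ratio_.cumsum() * 100
print("Cumulative Variances (Percentage):")
print(cumulative.round(1))
```

The last value is always 100, because keeping all components loses no variance.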

A graph of cumulative explained variance against the number of components makes it easy to see how many components are required to capture a given share of the data's variability.
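One way such a graph might be drawn is sketched below. It again assumes the standardized scikit-learn wine dataset, and the 90% threshold is an arbitrary illustrative choice, not a value taken from this chapter.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: standardized wine dataset from scikit-learn
X_scaled = StandardScaler().fit_transform(load_wine().data)

# Fit PCA with all components and accumulate the explained variance
cum_var = PCA().fit(X_scaled).explained_variance_ratio_.cumsum() * 100
n_components = np.arange(1, len(cum_var) + 1)

# Smallest number of components whose cumulative variance reaches 90%
# (the threshold is a hypothetical example value)
n_for_90 = int(np.argmax(cum_var >= 90) + 1)

fig, ax = plt.subplots()
ax.plot(n_components, cum_var, marker="o")
ax.axhline(90, linestyle="--", color="gray")
ax.set_xlabel("Number of components")
ax.set_ylabel("Cumulative explained variance (%)")
ax.set_xticks(n_components)
print("Components needed for 90% of the variance:", n_for_90)
```

In a notebook the figure renders automatically; in a script, finish with `plt.show()`. scikit-learn can also pick the count itself: passing a float between 0 and 1, such as `PCA(n_components=0.90)`, keeps just enough components to reach that share of variance.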

The second way to evaluate a PCA model is to check the performance of the machine learning models that we then train on the reduced dataset (if we need one at all). Here we look for the best trade-off among three quantities: the time the machine learning model takes to run, the model's accuracy, and the number of principal components.
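As a rough sketch of this second approach, the loop below records accuracy and running time for several component counts. The classifier (`LogisticRegression`), the 5-fold cross-validation, and the candidate counts are illustrative assumptions, not choices made in this chapter.

```python
import time

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: wine dataset, with its class labels as the prediction target
X, y = load_wine(return_X_y=True)

results = []
for n in (2, 5, 8, 13):  # hypothetical candidate component counts
    # Scaling and PCA live inside the pipeline so they are refit on each CV fold
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n),
        LogisticRegression(max_iter=1000),
    )
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5)
    elapsed = time.perf_counter() - start
    results.append((n, scores.mean(), elapsed))

for n, accuracy, seconds in results:
    print(f"{n:>2} components: accuracy={accuracy:.3f}, time={seconds:.3f}s")
```

Comparing the rows shows whether extra components buy enough accuracy to justify the added running time. On a dataset this small the timing differences are negligible, so the trade-off matters more on larger data.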

Quiz

Why do you think only 3 components in the presented dataset can explain as much as 92% of the data?
