Examples of Real Problems
Let's look at a real-life application of the PCA method. First, import the libraries we will work with:
```python
# Linear algebra and data processing
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# PCA model
from sklearn.decomposition import PCA

# Data visualization
import seaborn as sns
import matplotlib.pyplot as plt
```
Next, we read the train.csv file from the web. It contains house sales data: the characteristics of each house and its sale price:
```python
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/7b22c447-77ad-48ae-a2d2-4e6714f7a4a6/train_S1.csv')
```
Now let's preprocess the data. We drop most of the characteristics and keep only 10 variables, so that the results are easier to work with, and then scale the data:
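```python
# Columns that will remain
columns_ndrop = ['YearBuilt', 'LotArea', 'MSSubClass', 'OverallQual', 'SalePrice',
                 'PoolArea', 'GarageArea', 'BedroomAbvGr', 'KitchenAbvGr', 'Fireplaces']

# Drop every column that is not in the list above
# (axis must be passed by keyword in recent pandas versions)
data = data.drop(data.columns.difference(columns_ndrop), axis=1)

# Standardize each column to zero mean and unit variance
data_sc = StandardScaler().fit_transform(data)
```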
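To see what the scaling step does, here is a minimal sketch on made-up data. The `toy` DataFrame and its values are hypothetical stand-ins, not the house sales dataset; only the column names are borrowed from the lesson.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "LotArea": rng.normal(10_000, 3_000, size=200),
    "Fireplaces": rng.integers(0, 4, size=200).astype(float),
})

# fit_transform returns a NumPy array, so the column names are not carried over
toy_sc = StandardScaler().fit_transform(toy)

col_means = toy_sc.mean(axis=0)  # each column is centered near 0
col_stds = toy_sc.std(axis=0)    # population std (ddof=0), which StandardScaler uses

print(col_means.round(6), col_stds.round(6))
```

After scaling, both columns contribute on a comparable scale, which matters for PCA because it is driven by variance.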
Let's create a PCA model:
```python
pca = PCA(n_components=3, whiten=True)
pca = pca.fit(data_sc)
```
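One way to check how much information the chosen components keep is the `explained_variance_ratio_` attribute of a fitted model. The sketch below uses synthetic data (the array `X` and the name `pca_demo` are assumptions for illustration), so the numbers it prints say nothing about the house sales dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a scaled 10-feature dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=300)  # make two columns correlated
X_sc = StandardScaler().fit_transform(X)

pca_demo = PCA(n_components=3, whiten=True).fit(X_sc)

# Share of the total variance captured by each component, largest first
ratio = pca_demo.explained_variance_ratio_
# Running total: variance kept by the first 1, 2, 3 components
cumulative = np.cumsum(ratio)

print(ratio)
print(cumulative)
```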
Now, to help explain the results, we will create a heat map of the factor loadings. In the next section, we will learn why we need it.
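```python
# Factor loadings: component weights scaled by the square root of the explained variance
factor_analysis = pca.components_.T * np.sqrt(pca.explained_variance_)

fig, ax = plt.subplots(figsize=(3, 20))
# data_sc is a NumPy array without column names, so the labels come from data
sns.heatmap(factor_analysis,
            xticklabels=["C1", "C2", "C3"],
            yticklabels=data.columns,
            annot=True,
            cmap="YlGnBu")
plt.show()
```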
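The same loading matrix can also be kept as a labeled table, which may be easier to inspect than a raw array. This is a sketch on synthetic data: the feature names `f1` to `f5` and the name `pca_demo` are hypothetical, and the formula is the one used for the heat map.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 5 made-up features, two of them strongly correlated
rng = np.random.default_rng(1)
features = ["f1", "f2", "f3", "f4", "f5"]
X = rng.normal(size=(300, 5))
X[:, 1] = X[:, 0] + 0.2 * rng.normal(size=300)
X_sc = StandardScaler().fit_transform(X)

pca_demo = PCA(n_components=3, whiten=True).fit(X_sc)

# Component weights times the square root of the explained variance,
# with one row per feature and one column per component
loadings = pd.DataFrame(
    pca_demo.components_.T * np.sqrt(pca_demo.explained_variance_),
    index=features,
    columns=["C1", "C2", "C3"],
)

print(loadings.round(2))
```

A DataFrame like this can be passed directly to `sns.heatmap`, which then picks up the row and column labels on its own.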
In just a couple of steps, we reduced the dimensionality of the dataset from 10 characteristics to 3! In the next chapter, we will try to interpret the results of PCA.
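Note that `fit` only learns the components. To get the reduced dataset itself, the fitted model's `transform` method has to be applied to the scaled data. A minimal sketch on synthetic data (the array `X` and the name `pca_demo` are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a scaled dataset with 10 characteristics
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 10))
X_sc = StandardScaler().fit_transform(X)

pca_demo = PCA(n_components=3, whiten=True).fit(X_sc)

# Project the 10-column data onto the 3 fitted components
X_reduced = pca_demo.transform(X_sc)

print(X_sc.shape, "->", X_reduced.shape)  # (300, 10) -> (300, 3)
```

Because `whiten=True` is set, each column of `X_reduced` is additionally rescaled to unit variance.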
Task
Read the train.csv dataset from the web and create a PCA model for it. The model should have 4 principal components.