Principal Component Analysis
Explore Dataset
Now we will take a closer look at building a PCA model, using a single dataset as an example. We will use the wine dataset from scikit-learn. It contains 13 wine characteristics (features) and 3 classes. It is especially convenient for us because there are no categorical variables among its features.
Let's load the dataset:
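A minimal sketch of this step, assuming the dataset is loaded with scikit-learn's load_wine function. The name X matches the array mentioned in this lesson; the names wine and y are illustrative.

```python
from sklearn.datasets import load_wine

# Load the wine dataset bundled with scikit-learn
wine = load_wine()

X = wine.data    # numpy array of features, one row per wine sample
y = wine.target  # class label (0, 1 or 2) for each sample

print(X.shape)             # number of samples and number of features
print(wine.feature_names)  # names of the 13 wine characteristics
```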
Now let's explore the dataset to understand what data we are working with. Let's convert the numpy array X to a pandas DataFrame and check the amount of missing data:
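One way to do this, sketched under the assumption that X comes from load_wine as above. The names df and missing_per_column are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
X = wine.data

# Wrap the numpy array in a DataFrame, using the feature names as column labels
df = pd.DataFrame(X, columns=wine.feature_names)

# Count missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

If every count is zero, no imputation is needed before standardization.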
To get a summary of each column (mean, standard deviation, etc.), use the .describe() method:
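For example, assuming the same DataFrame df built from the wine features (the name summary is illustrative):

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Summary statistics per column: count, mean, std, min, quartiles, max
summary = df.describe()
print(summary)
```

The summary makes it easy to see that the columns live on very different scales, which is why standardization matters before PCA.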
Before passing the dataset to the PCA model, let's preprocess the data. As you may have noticed in the previous lessons, data standardization is an important step: PCA is driven by variance, so features measured on larger scales would otherwise dominate the components. We implement this using the StandardScaler class:
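A minimal sketch of the standardization step, assuming X is the feature array from load_wine. The names scaler and X_scaled are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# Fit the scaler on the data and transform it in one step:
# each column is shifted to zero mean and scaled to unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Check the result: column means near 0, column standard deviations near 1
print(np.round(X_scaled.mean(axis=0), 6))
print(np.round(X_scaled.std(axis=0), 6))
```

The fitted scaler keeps the per-column means and scales, so the same transformation can later be applied to new samples with scaler.transform.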
Task
Read the data from the train.csv file (from the web). Remove the "Id" column from the dataset and standardize it.