Principal Component Analysis
Eigenvalues and Eigenvectors
Let's move on to more complex concepts: eigenvalues and eigenvectors. At this step, we calculate the eigenvalues and eigenvectors of the covariance matrix to obtain the principal components.
The first step is to calculate the eigenvalues of the covariance matrix. The eigenvectors are then calculated from those eigenvalues.
The resulting eigenvectors are the principal components: they give the directions of the new axes along which the variance of the data points is maximized. To make it easier to understand, imagine that the principal components are a new, more convenient way of presenting the data, a new angle from which the differences in the data become more visible.
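To make the "direction of maximum variance" idea concrete, here is a minimal sketch. The dataset is randomly generated and purely hypothetical, and the helper `variance_along` is an illustrative name, not part of the course code. It compares the variance of the data projected onto the leading eigenvector with the variance along the original axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data stretched along a diagonal direction
x = rng.normal(size=500)
X = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])
X_centered = X - X.mean(axis=0)

# Covariance matrix (columns are variables) and its eigendecomposition
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Eigenvector (column) paired with the largest eigenvalue
top = eigenvectors[:, np.argmax(eigenvalues)]


def variance_along(direction):
    """Sample variance of the data projected onto a direction."""
    unit = direction / np.linalg.norm(direction)
    return np.var(X_centered @ unit, ddof=1)


var_top = variance_along(top)
var_x_axis = variance_along(np.array([1.0, 0.0]))
var_y_axis = variance_along(np.array([0.0, 1.0]))

print(var_top, var_x_axis, var_y_axis)
```

The variance along the leading eigenvector should be at least as large as the variance along either original axis, and it equals the largest eigenvalue.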
As output, we get as many components as there were variables in the original dataset. For example, a dataset with 20 variables yields 20 principal components at this stage.
The main detail is that each eigenvector is paired with its own eigenvalue. The larger the eigenvalue, the more significant the corresponding principal component (eigenvector), because the eigenvalue is the variance of the data along that direction. Once the components are ordered by eigenvalue, the first stores the most important information, the second a little less, and so on.
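One common way to express the significance of each component is the share of the total variance its eigenvalue accounts for. The sketch below uses made-up eigenvalues chosen only for illustration; the variable names are assumptions, not part of the course code.

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, already in descending order
eigenvalues = np.array([5.0, 3.0, 2.0])

# Share of the total variance carried by each principal component
explained_ratio = eigenvalues / eigenvalues.sum()

print(explained_ratio)
```

With these values the first component accounts for half of the total variance, and the shares sum to one.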
Why eigenvectors play such an important role in the formation of the principal components is a difficult question, the answer to which lies in a long mathematical proof. For now, we just need to know that it works.
Let's use numpy to calculate eigenvalues and eigenvectors:
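A minimal sketch of this step, assuming a small made-up dataset `X` with observations in rows and variables in columns. The names `X`, `X_centered`, and `cov_matrix` are assumptions for illustration. `np.linalg.eig` returns the eigenvalues and eigenvectors in one call, with each eigenvector stored as a column; for a symmetric covariance matrix, `np.linalg.eigh` is a reasonable alternative.

```python
import numpy as np

# Hypothetical dataset: 5 observations (rows) of 3 variables (columns)
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
])

# Center the data and build the covariance matrix of the variables
X_centered = X - X.mean(axis=0)
cov_matrix = np.cov(X_centered, rowvar=False)

# eigenvalues[i] is paired with the column eigenvectors[:, i]
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

print(eigenvalues)
print(eigenvectors)
```

Note that `np.linalg.eig` does not promise any particular ordering of the eigenvalues, which is why a separate sorting step is needed.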
Task
Sort the resulting principal components (eigenvectors) in descending order of their eigenvalues using the ind list (indices of the sorted results) and print the output.
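One possible way to approach the sorting, shown on made-up values rather than on the lesson's data. How `ind` is built in the course code is not shown, so constructing it with `np.argsort` is an assumption here.

```python
import numpy as np

# Hypothetical eigenvalues and eigenvectors (one eigenvector per column)
eigenvalues = np.array([0.5, 2.0, 1.0])
eigenvectors = np.eye(3)

# Indices that order the eigenvalues from largest to smallest
ind = np.argsort(eigenvalues)[::-1]

# Reorder the eigenvalues and the matching eigenvector columns together
sorted_values = eigenvalues[ind]
sorted_vectors = eigenvectors[:, ind]

print(sorted_values)
print(sorted_vectors)
```

The same index array is applied to both arrays so that each eigenvector stays paired with its eigenvalue; the eigenvectors are reordered by column, not by row.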