Principal component analysis (PCA) is used to reduce the dimensionality of a large data set, such as a set of feature data. The high-dimensional data is summarized, by means of an orthogonal transformation, into uncorrelated principal components.
The dimension reduction is done by keeping only the eigenvectors (principal components) with large eigenvalues, that is, the directions that explain the most variance. The first component explains the most variance in the data and each later component explains less, so an elbow plot of the eigenvalues is used to determine how many principal components are significant for an analysis. The eigenvectors of the covariance matrix form an orthogonal basis, and the data expressed in that basis are uncorrelated.
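The selection step can be sketched numerically. The eigenvalues below are made up for illustration, and the 90% cumulative-variance threshold is an arbitrary assumption; it is one common stand-in for reading the elbow off a plot, not the only rule.

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted from largest to smallest.
eigenvalues = np.array([4.0, 2.5, 1.0, 0.3, 0.2])

# Fraction of the total variance explained by each component.
explained_ratio = eigenvalues / eigenvalues.sum()

# Running total: variance explained by the first k components.
cumulative = np.cumsum(explained_ratio)

# Smallest k whose cumulative explained variance reaches the assumed threshold.
threshold = 0.90
k = int(np.searchsorted(cumulative, threshold) + 1)

print(explained_ratio)
print(cumulative)
print(k)
```

Plotting `eigenvalues` (or `explained_ratio`) against the component index gives the elbow plot itself; the point where the curve flattens plays the same role as `k` here.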
The covariance matrix, $\Sigma$, is a matrix of all possible covariances between a set of variables $X_1, \dots, X_n$. The $(i, j)$ entry of $\Sigma$ is $\operatorname{Cov}(X_i, X_j)$.
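A minimal sketch of building a covariance matrix with NumPy follows. The data values and variable names are hypothetical, and the sample (n - 1) normalization is an assumption, since the text does not say which normalization is intended.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 3 variables (columns).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.0, 1.0],
    [6.0, 12.0, 3.0],
])

# Center each variable on its mean.
X_centered = X - X.mean(axis=0)

# Sample covariance matrix: entry (i, j) is the covariance of variable i and variable j.
n_obs = X.shape[0]
cov_manual = X_centered.T @ X_centered / (n_obs - 1)

# NumPy's built-in equivalent; rowvar=False says that the columns are the variables.
cov_numpy = np.cov(X, rowvar=False)

print(cov_manual)
```

The diagonal entries are the variances of the individual variables, and the matrix is symmetric because the covariance of variables i and j does not depend on their order.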
Every eigenvector has a corresponding eigenvalue. The eigenvalue represents the amount of variance in the direction of the eigenvector. The eigenvalues of the covariance matrix $\Sigma$ are found by solving
$$\det(\Sigma - \lambda I) = 0$$
for the eigenvalues $\lambda$.
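As a small worked example, take two variables that each have variance 2 and share covariance 1. The matrix is made up for illustration, and $\Sigma$ is the symbol assumed here for the covariance matrix. Solving the characteristic equation gives the two eigenvalues:

$$
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\det(\Sigma - \lambda I) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1) = 0
$$

so $\lambda_1 = 3$ and $\lambda_2 = 1$. The total variance, $2 + 2 = 4$, equals the sum of the eigenvalues, and the first direction accounts for $3/4$ of it.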
The principal components are the eigenvectors of the covariance matrix $\Sigma$. Each eigenvector represents a direction. For n-dimensional data there are n eigenvectors. Eigenvectors of $\Sigma$ are found by solving
$$(\Sigma - \lambda I)\,v = 0$$
for $v$ for each specific eigenvalue $\lambda$, which will result in a set of $n$ eigenvectors and $n$ eigenvalues for an $n \times n$ matrix $\Sigma$.
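In practice the eigenvalues and eigenvectors are usually computed numerically rather than by hand. The sketch below uses `numpy.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix; the 2x2 matrix and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical 2x2 covariance matrix (symmetric by construction).
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])

# eigh returns the eigenvalues in ascending order and the matching
# unit-length eigenvectors as the columns of `eigenvectors`.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the direction with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Check the defining equation cov @ v = lambda * v for every pair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(cov @ v, lam * v)

# The eigenvectors are mutually orthogonal and of unit length.
assert np.allclose(eigenvectors.T @ eigenvectors, np.eye(2))

print(eigenvalues)
```

Note that an eigenvector is only determined up to sign, so a library may return `v` or `-v` for the same direction.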
Placed side by side as columns, the eigenvectors form an orthogonal matrix. This matrix is used as a transformation matrix on the mean-centered features to create a set of new, uncorrelated features from the data.
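The transformation can be sketched end to end as follows. The random data, the mixing matrix, the choice of k = 2, and the names `W` and `Z` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

# Center the features, then compute their covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigen-decomposition, sorted so the largest eigenvalue comes first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Keep the first k eigenvectors as the columns of the transformation matrix.
k = 2
W = eigenvectors[:, :k]

# New features: each row of Z holds one observation in the reduced basis.
Z = X_centered @ W

# Covariance of the new features; it should be diagonal, with the
# top-k eigenvalues on the diagonal.
cov_Z = np.cov(Z, rowvar=False)
print(cov_Z)
```

The off-diagonal entries of `cov_Z` come out as zero up to floating-point error, which is the sense in which the new features are uncorrelated.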