I have discussed the usefulness of PCA with regard to proteins. There are other linear-algebraic methods that are useful in matrix-factorization problems, which involve decomposing signals into components. Understanding these methods is essential for data analysis and machine learning: they provide convenient ways to break a matrix, perhaps one holding the data we are interested in, into simpler and more meaningful pieces of information.
Chapter 3 of my book “A handbook of mathematical models with python” is about the linear-algebraic method, principal component analysis (PCA).
PCA
PCA reduces data dimensionality by projecting the data onto a lower-dimensional subspace in a way that maximizes the retained variance. It works by decomposing the covariance matrix of the data; the decomposition yields eigenvectors (the principal directions) and eigenvalues (the variances along those directions).
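As a minimal sketch (not the book's exact code), this is PCA done by hand with NumPy: center the data, build the covariance matrix, eigendecompose it, and project onto the top components. The dataset and all names are purely illustrative.

```python
# Minimal PCA sketch via eigendecomposition of the covariance matrix (illustrative data)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 samples, 5 features

X_centered = X - X.mean(axis=0)          # center the data
cov = np.cov(X_centered, rowvar=False)   # 5x5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_projected = X_centered @ eigvecs[:, :2]  # project onto top-2 principal components
print(eigvals / eigvals.sum())             # proportion of variance per component
```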
📌 PCA is analogous to finding the principal moments of inertia in physics, except that the "body" is made of data points rather than mass. Instead of finding the eigenvalues and eigenvectors of the inertia tensor, we find those of the covariance matrix.
📌 PCA is an unsupervised method. It detects the directions in which data varies the most.
📌 Anomaly detection with PCA: PCA detects anomalies through a clustering view of the data. In unsupervised learning, a small percentage of data points is typically assumed to be outliers. PCA assumes that inliers belong to large, dense clusters while outliers belong to smaller, sparser clusters or to none at all; in short, PCA determines what constitutes the normal class.
For a time-series dataset with an assumed 1% outlier fraction, here is the result of PCA-based anomaly detection (anomalies marked in red).
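A hedged sketch of one way to do this: slice the series into overlapping windows, fit PCA, and flag the ~1% of windows with the largest reconstruction error. The synthetic series, window length, and threshold are assumptions for illustration, not the book's exact recipe.

```python
# PCA-based anomaly detection on a synthetic time series (illustrative sketch)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
t = np.arange(1000)
series = np.sin(0.05 * t) + 0.1 * rng.normal(size=t.size)
series[rng.choice(t.size, 10, replace=False)] += 3.0     # inject ~1% anomalies

# Turn the series into overlapping windows so each row is one data point
window = 20
X = np.lib.stride_tricks.sliding_window_view(series, window)

pca = PCA(n_components=2).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))
error = np.square(X - X_hat).sum(axis=1)                 # reconstruction error

threshold = np.quantile(error, 0.99)                     # assume 1% outliers
anomalous_windows = np.where(error > threshold)[0]
print(anomalous_windows)
```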
The top principal components (PCs), i.e. those with the largest variances (eigenvalues), determine the percentage of variance explained (PVE) for the dataset.
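In scikit-learn the PVE is exposed directly; a short sketch on a small random matrix (illustrative data only):

```python
# Percentage of variance explained (PVE) per principal component
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(2).normal(size=(100, 6))
pca = PCA().fit(X)                      # keep all components
pve = pca.explained_variance_ratio_     # PVE of each PC, sorted by eigenvalue
print(pve, np.cumsum(pve))              # individual and cumulative PVE
```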
LDA
Linear Discriminant Analysis (LDA) is a supervised method. It reduces data dimensionality by projecting the data onto a subspace in a way that maximizes the separability between classes/groups.
LDA works well for data with multiple classes. However, it assumes normally distributed features and equal class covariance matrices.
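A minimal sketch of LDA as a supervised dimensionality reducer, using scikit-learn on the Iris dataset (the choice of dataset is purely illustrative):

```python
# LDA: project labelled data onto directions that maximize class separability
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)         # note: uses the class labels y
print(X_lda.shape)                      # (150, 2)
```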
ICA
Independent component analysis (ICA) is a technique used to separate mixed signals into their independent components by maximizing their statistical independence. It is widely used in signal processing, image analysis, and biomedical data analysis.
- Principal components are orthogonal in PCA; independent components are not orthogonal in ICA.
- PCA assumes the data follow a Gaussian distribution and identifies orthogonal components, whereas ICA assumes a non-Gaussian distribution and does not constrain components to be orthogonal.
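A hedged sketch of ICA separating two mixed signals with scikit-learn's FastICA; the source signals and mixing matrix below are made up for illustration.

```python
# ICA: recover independent source signals from linear mixtures
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))                # source 2: square wave
S = np.c_[s1, s2] + 0.02 * rng.normal(size=(t.size, 2))

A = np.array([[1.0, 0.5], [0.4, 1.0]])     # mixing matrix
X = S @ A.T                                # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)         # recovered independent components
print(S_estimated.shape)                   # (2000, 2)
```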
SVD
SVD (singular value decomposition) can be applied to any rectangular or square matrix and is a more fundamental operation from which PCA can be derived. Unlike PCA, SVD works directly on the data matrix. SVD can also be used to determine the rank of a matrix.
SVD is used to compress high-dimensional images by preserving only the significant singular values. If some singular values are zero, the corresponding rank-one terms do not appear in the decomposition; the rank of the matrix, which is the dimension of its image (column space), is therefore equal to the number of non-zero singular values.
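A short sketch of both points, SVD-based rank and low-rank compression, using a random matrix as a stand-in for a grayscale image (all values illustrative):

```python
# SVD: numerical rank and low-rank (compressed) reconstruction
import numpy as np

rng = np.random.default_rng(4)
img = rng.normal(size=(64, 64))            # stand-in for a grayscale image

U, s, Vt = np.linalg.svd(img, full_matrices=False)
print(np.sum(s > 1e-10))                   # rank = number of non-zero singular values

k = 10                                     # keep only the k largest singular values
img_compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(img - img_compressed))  # reconstruction error
```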
FA
Factor analysis (FA) is used when we are interested in identifying underlying behaviours and causes, and in modelling relationships between observed and hidden (latent) variables. The latent constructs inferred from the data are called factors.
PCA and FA are similar in what they do, yet different in how they do it. PCA works directly with the observed variables, whereas FA assumes that they are driven by a few hidden factors and typically fits the model with techniques like maximum likelihood estimation (MLE).
While PCs are often hard to interpret, factors can be aligned with behavioural or theoretical constructs, most often conceptualized in econometric settings.
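To close, a minimal sketch contrasting FA with PCA in scikit-learn; FactorAnalysis fits the latent-variable model by maximum likelihood. The generated data and its two hidden factors are assumptions for illustration.

```python
# FA: recover loadings and noise variances from data driven by hidden factors
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
latent = rng.normal(size=(300, 2))                       # two hidden factors
loading = rng.normal(size=(2, 6))
X = latent @ loading + 0.3 * rng.normal(size=(300, 6))   # observed variables

fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_)        # factor loadings relating factors to observed variables
print(fa.noise_variance_)    # per-variable noise estimated by the model
```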