Dimensionality Reduction — PCA, ICA and Manifold learningDimensionality Reduction to aid medical investigationSharmistha ChatterjeeBlockedUnblockFollowFollowingApr 1In machine learning and data science problems, the main objective remains to find the most relevant features that plays a dominant role in determining and influencing the output results.
In most data science problems, the dataset is overfilled with numerous features that results in overfitting and adds to huge training costs and makes the process considerably slow.
Successful research investigations has helped scientists, researchers and data scientists to evaluate algorithms that eliminates the curse of dimensionality, in different domains.
The algorithms developed over time aims to resolve some of the basic problems including:Reducing dimension of training data set by restoring variance and keeping relevant information intact.
Reducing training time and cost.
Structure ways of effective visualizations.
Trust me dimensionality reduction plays a major part in image, audio, video analysis particularly in investigating and curing diseases in our everyday life.
Its importance cannot be overlooked in the field of medical research where each and every hospital, clinics and diagnostic centers use machine learning (PCA, ICA, Manifold dimensionality reduction) techniques for diagnosing diseases and prescribing patients with right medicines.
In this blog, I primarily discuss about:Algorithms on dimensionality reduction including PCA (Principal Component Analysis), ICA (Independent Component Analysis) and Projection & Manifold Learning.
Applications in various field with special emphasis of Manifold Learning in the field of research.
Principal Component Analysis (PCA):As the first step to determine the PCA, is to perform eigenvalue decomposition of a covariance matrix, let’s first understand the concept of eigenvectors and eigenvalues before going into the details of PCA.
EigenVectorsAlmost all vectors change direction, when they are multiplied by a matrix A.
Eigenvectors (x) are certain exceptional vectors that are in the same direction as Ax.
Here the multiplied result Ax consists of the matrix A and the eigenvector x.
EigenValuesFor the above equation, the scalar ⋋ is termed the eigenvalue of A.
It further represents whether the exceptional vector (x) is stretched or shrunk or reversed or left.
When ⋋ =0, Ax=0x implies eigenvector x is in null space.
The matrix A gets projected into another space as determined by the direction of eigenvector.
For Ax = b , the output vector b represents the new transformed space.
ai/wiki/eigenvectorThe following animation shows how the matrix’s work transforming one space to the space where the eigenvectors lie.
Source : https://skymind.
ai/wiki/eigenvectorAs show in the above figure and animation it is evident, Eigenvector behave like a weathervane that changes in length without any change in direction.
It always points to the same direction and space, that the matrix is pushing all vectors toward.
PCA determines the directions along which the data has maximum variance by analyzing the importance of the data along these directions.
PCA is successful in returning two-dimensional planes if three-dimensional points are fed to PCA for evaluation, with the third dimension lying orthogonal to the plane.
Further, PCA is capable of assigning positive weights to vectors that represent points lying on the two-dimensional planes in compared to the third plane which has a weight of zero.
PCA is an useful tool for finding patterns in high-dimensional data when the data lies on or close to a d-dimensional linear sub-space.
PCA prioritizes dimensionality (by dropping low-variance dimensions in addition to centering, rotating and scaling data) that aids in improving neural network’s convergence speed and the overall quality of results.
Source : https://www.
pdfKernel PCAKernel PCA is an enhanced PCA method that incorporates a kernel function to determine principal components in different high-dimensional space, thereby facilitating solution of non-linear problems.
KPCA finds new directions based on kernel matrix.
KPCA is limited by an inability to determine importance of variables in contrast to linear PCA where it is possible to identify key variables that contribute to PCA score profiles.
Dimensionality reduction with Kernel PCAIndependent Component Analysis (ICA) :ICA is a computational method for separating a multivariate signals into additive subcomponents.
ICA works under the assumption that the subcomponents comprising the signal sources are non-Gaussian and are statistically independent from each other.
ICA plays a dominant role in medical research in biomedical signal extraction and separation.
Biomedical signals from many sources including hearts, brains and endocrine systems pose a challenge as researchers need to separate weak signals arriving from multiple sources contaminated with artifacts and noise.
The below figure is an example of how signal from disparate sources mixed in varied proportions is identified by ICA.
ICA aims to decompose the signals into subcomponents to identify the activity of distinct signal sources.
Source : https://www.
net/FazleyRafy/independent-component-analysis-cocktail-party-effect-solution-presentationLinear projections which are not necessarily orthogonal to each other.
Generalization of PCA where principal components are nearly statistically independent.
Source data comprises of linear mixtures of non-gaussian unknown latent variables.
Manifold LearningManifold learning for dimensionality reduction has recently gained much attention to assist image processing tasks such as segmentation, registration, tracking, recognition, and computational anatomy.
The drawbacks of PCA in handling dimensionality reduction problems for non-linear weird and curved shaped surfaces necessitated development of more advanced algorithms like Manifold Learning.
There are different variants of Manifold Learning that solves the problem of reducing data dimensions and feature-sets from uneven weird surfaces by sub-optimal data representation.
This kind of data representation selectively chooses data points from a low-dimensional manifold that is embedded in a high-dimensional space in an attempt to generalize linear frameworks like PCA.
Manifolds give a look of flat and featureless space that behaves like Euclidean space.
Manifold learning problems are unsupervised where it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications and loss of importance of information regarding some characteristic of the original variables.
The goal of the manifold-learning algorithms is to recover the original domain structure, up to some scaling and rotation.
The nonlinearity of these algorithms allows them to reveal the domain structure even when the manifold is not linearly embedded.
Manifold learning algorithms are divided in to two categories:Global methods: Allows high-dimensional data to be mapped from high-dimensional to low-dimensional such that the global properties are preserved.
Examples include Multidimensional Scaling (MDS), Isomaps covered in the following sections.
Local methods: Allows high-dimensional data to be mapped to low dimensional such that local properties are preserved.
Examples Locally linear embedding (LLE), Laplacian eigenmap (LE), Local tangent space alignment (LSTA), Hessian Eigenmapping (HLLE) covered in the following sections.
The different learning algorithms discovers different parameters and mechanisms to deduce a low-dimensional representation of the data with algorithms like Isomap, Locally Linear Embedding, Laplacian Eigen-maps, Semidefinite Embedding, etc.
The algorithms related to manifold learning and its applications vary in:Intensity of imagesCoordinate transformations, for instance different morphic warps [13, 11, 8], between the images.
Number of images to be analyzed to interpret similar group of objects.
The different algorithms of Manifold Learning when run on swiss_roll dataset for 10 neighbors and 2 components (which are un-correlated forming the reduced feature set with maximum variation) , for 5000 points is given below.
Different Types of Manifold learning on scikit-learn swiss roll datasetIsoMap Algorithm : One of the earliest approaches to manifold learning can be viewed as an extension of Multi-dimensional Scaling (MDS) or Kernel PCA.
IsoMap aims for a lower-dimensional embedding which maintains geodesic distances between all points.
The following figure explains how distances are evaluated for all points.
com/trudy/hriSource : http://web.
pdfEvaluate estimated geodesic distances between all pairs in XUse dynamic programming to approximate the geodesic of far off points by the shortest path along retained distances.
This figure illustrates how A.
Neighborhood graph is constructed using a fixed radius/K-means algorithm B.
Shortest paths/ Geodesic distances are computed and C.
d-dimensional embedding is applied with MDS.
com/trudy/hriLocally linear embedding (LLE): One of the dimensionality reduction algorithms is LLE thats maps to a lower-dimensional projection of the data by evaluating the best non-linear embedding among a series of local Principal Component Analyses.
It uses regularization parameter ‘r’, when the number of neighbors is greater than the number of input dimensions.
This also leads to regularization problem that manifests itself in embeddings and distorts the underlying geometry of the manifold.
This problem is handled by using multiple vectors in each neighborhood leading to modified locally linear embedding (MLLE), and the function is invoked with the keyword method = ‘modified’ , where n_neighbors > n_components.
Source : https://www.
com/trudy/hriLocal tangent space alignment (LTSA) : This algorithm behaves similar to LLE described above, with the only difference of characterizing the local geometry at each neighborhood via its tangent space.
Then low-dimensional data is deduced by reducing higher dimensions by applying PCA on the neighboring data points such that the local linear relations of the original data are preserved.
The resulting low dimensional dataset represents the neighboring point’s coordinates and serves to approximate the tangent space at the point.
It also performs a global optimization to align these local tangent spaces to learn the embedding.
The function is invoked by specification of keyword with method = ‘ltsa’.
Source : http://www.
pdfHessian Eigenmapping (Hessian-based LLE: HLLE): This dimensionality reduction solution involves solving the regularization problem of LLE.
In addition it can reduce dimensions for multi-class problems.
It revolves around a hessian-based quadratic form at each neighborhood which is used to recover the locally linear structure.
It suffers from the drawback of poor scaling for huge data size.
However, improvements with scikit-learn library incorporates optimizations to make its cost comparable to that of other LLE variants for small output dimension.
It requires specification of keyword with method = ‘hessian’ , and n_neighbors > n_components * (n_components + 3) / 2.
Neighborhood Selection in Hessian EigenMap Source : https://www.
org/paper/Supervised-Hessian-Eigenmap-for-dimensionality-Zhang-Tao/dbfc220970e9114e58798caa2ea33fba924379abSpectral Embedding: This algorithm aims to calculate a non-linear embedding by finding a low dimensional representation of the data using a spectral decomposition of the graph Laplacian.
The decomposition results in a graph that approximates the low dimensional manifold in the high dimensional space, ensuring the fact local distances are preserved and points which appear closer on the manifold are also mapped closer in the low dimensional space.
Source : https://networkx.
htmlMultidimensional scaling (MDS) : This dimensionality reduction algorithm is one of the multivariate techniques to measure the similarity or dissimilarity in data by transforming the data to a low-dimensional space where the distances between the original high-dimensional (N-dimensional) space gets reflected to the low-dimensional space.
The distance metric is measured in geometric spaces where the data ranges from ratings of similarity between objects to interaction frequencies of molecules, or trade indices between countries.
Multidimensional scaling is also known for identifying news representation by minimizing the quantity called STRESS or SSTRESS.
Source : https://www.
com/trudy/hriMDS Metric and Non-metric : MDS comprises of metric and non metric algorithms.
In Metric MDS, the input similarity matrix arises from a metric (and thus respects the triangular inequality), the distances between output two points are then set to be as close as possible to the similarity or dissimilarity data.
In the non-metric version, the algorithms will try to preserve the order of the distances, and hence seek for a monotonic relationship between the distances in the embedded space and the similarities/dissimilarities.
The monotonic relationship yields larger or smaller distances at different scales corresponding to larger or smaller dissimilarities.
Non-metric MDS Source: https://www.
htmlt-Distributed Stochastic Neighbor Embedding (t-SNE) : It is a non-linear technique for dimensionality reduction used in visualization of high-dimensional datasets.
t-SNE first computes the probability of a similar group of points both in high-dimensional and low dimensional space to deduce the conditional probability of a point A chooses its neighbor as point B.
Further, the conditional probabilities are minimized in higher and lower dimension space by minimizing sum of Kullback-Leibler divergence to have an efficient representation of points in low dimensional space from high dimensional space.
However t-SNE does not give optimal results with dimensions > 3 as it gets stuck in local optima.
Applications of Manifold learning in medical research:Identify under-lying structure that is under medical investigation from a large set of points embedded in a high dimensional space to a low-dimensional space.
Such example include mapping brain MRI images to a low-dimensional manifolds, learning osteoarthritis.
Manifold learning also plays an important role in patient position detection by considering neighboring areas of the manifold by constructing a graph.
Laplacian eigen-maps are used in this context to build a graph and approximate a high-dimensional representation of the data to a low-dimensional space by preserving the local neighborhood information.
Source : https://www.
com/matlabcentral/fileexchange/36141-laplacian-eigenmap-diffusion-map-manifold-learningManifold learning makes it convenient to make observations about the presence of disease or markers of development in populations by allowing easy statistical comparisons between groups through low-dimensional image representations.
Kernel PCA is widely known for dimensionality reduction on heterogeneous data sources when data from different sources are merged and evaluated to interpret the most prominent factors.
its used in reducing data dimensions for “non-linear” variations in metabolic data from living organisms.
The following figure shows an example of how Kernel PCA is used to analyze metabolites strongly associated with diet, exemplifying the fact that increased hippurate is associated with a larger intake of variety of vegetables and fruits.
Source : https://www.
com/articles/s41598-018-20121-wManifold learning facilitates medical research by generating synthetic images.
The following picture illustrates a semi-automatic reference maps generation pipeline that combines the results provided by the pathologists (and other manual segmentation mechanisms) together with automatic segmentation techniques, to yield different versions of synthetic images (generated from transformations like rotations, crops, geometric distortions and scaling).
It aids medical investigation by generating high quality images of implicit manifolds corresponding to different parts of the body.
Source : http://spiral.
pdfClassify image slices according to different regions of the body such as head, abdomen, and lower leg with the help of manifold structures that helps to clearly distinguish the images.
t-SNE is used to classify different tissues of the body, by identifying clusters based on similarity of the type of tissues.
Source : https://r2-tutorials.
htmlLocally Linear Embedding is used for brain function detection.
The following figure depicts that the first three major dimensions of embeddings of the brain functional networks.
Figure A portrays the embedding of the brain functional networks using LLE-SPD ( Symmetric Positive Definite (SPD)) in the LogEuclidean manifold space, while Figure B illustrates the embedding of the brain functional networks using LLE in the Euclidean space.
Source : http://www.
pdfApply Independent Component Analysis (ICA) to evaluate a set of basis images/signal, and represent an input image/signal as linear combination of those.
The below figure illustrates how ICA can be applied to isolate individual components in ECG.
Use of ICA to identify the individual componentsReferences:https://jakevdp.
com/articles/s41598-018-20121-wPlease let me know if there were any mistakes.
Suggestions and feedbacks are welcome.
Feel free to follow me at linkedin.