Deep Learning and Doughnuts

Mattia Ferrini, Mar 19

Manifold learning

Under the manifold assumption, real-world high-dimensional data concentrates close to a non-linear low-dimensional manifold [2].

In other words, data lies approximately on a manifold of much lower dimension than the input space, a manifold that can be retrieved/learned [8].

The manifold assumption is crucial in order to deal with the curse of dimensionality: many machine learning problems seem hopeless if we expect the algorithm to learn functions with interesting variations across a high-dimensional space [6].

Fortunately, it has been shown empirically that ANNs capture the geometric regularities of commonplace data thanks to their hierarchical, layered structure [3].
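The manifold assumption can be made concrete with a tiny numpy sketch. The circle, the ambient dimension and the seed below are illustrative choices of mine, not taken from the references: we embed a 1-D manifold (a circle) in a 50-dimensional ambient space and check, via singular values, that the data occupies only a tiny subspace of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points from a 1-D manifold: a circle in the plane.
t = rng.uniform(0.0, 2.0 * np.pi, size=500)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)      # shape (500, 2)

# Embed the circle in a 50-dimensional ambient space with a random
# linear map: the vectors are 50-dimensional, but the intrinsic
# dimension of the data is still 1.
embedding = rng.standard_normal((2, 50))
data = circle @ embedding                               # shape (500, 50)

# The singular values reveal that the data spans only a 2-D subspace
# of the 50-D ambient space (the circle is a 1-D curve inside it).
singular_values = np.linalg.svd(data, compute_uv=False)
print(int(np.sum(singular_values > 1e-8)))              # prints 2
```

Real data is of course noisy and only concentrates *near* such a manifold, but the gap between ambient and intrinsic dimension is the same phenomenon.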

The experiments in [3] demonstrate that deep networks can efficiently represent data that lies on or near a low-dimensional manifold.

But how do ANN layers identify the mappings (representations) from the original data space to suitable lower-dimensional manifolds?

Homeomorphic Linear Embeddings

According to the definition provided by [10], a homeomorphism, also called a continuous transformation, is an equivalence relation and one-to-one correspondence between points in two geometric figures or topological spaces that is continuous in both directions.

A homeomorphism which also preserves distances is called an isometry.

Affine transformations are another type of common geometric homeomorphism.

A continuous deformation between a coffee mug and a doughnut (torus), illustrating that they are homeomorphic.

But there need not be a continuous deformation for two spaces to be homeomorphic: only a continuous mapping with a continuous inverse function is required [4].

[1] looks at tanh layers.

A tanh layer tanh(Wx + b) consists of:

1. A linear transformation by the “weight” matrix W
2. A translation by the vector b
3. Point-wise application of tanh

While manifold learning methodologies explicitly learn lower-dimensional spaces, neural network layers are non-linear mappings into a space that is not necessarily lower-dimensional. This is the case here: we look at tanh layers with N inputs and N outputs.
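The three steps above can be sketched in a few lines of numpy (the sizes and random values are illustrative):

```python
import numpy as np

def tanh_layer(x, W, b):
    """One tanh layer: a linear map by W, a translation by b,
    then the point-wise nonlinearity tanh."""
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))   # N inputs, N outputs (here N = 3)
b = rng.standard_normal(3)
x = rng.standard_normal(3)

y = tanh_layer(x, W, b)           # each component of y lies in (-1, 1)
```

Note that the output lives in the open cube (-1, 1)^N: tanh compresses space but, as discussed next, it does not tear it.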

In such tanh layers, each layer stretches and squishes space, but it never cuts, breaks or folds it. Intuitively, we can see that it preserves topological properties [...]. Tanh layers with N inputs and N outputs are homeomorphisms if the weight matrix W is non-singular (though one needs to be careful about domain and range) [1].
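We can verify this invertibility numerically: a minimal sketch, assuming a non-singular W, that undoes tanh(Wx + b) step by step with arctanh, a subtraction, and a linear solve. The arctanh step is exactly where the “careful about domain and range” caveat bites, since arctanh is only defined on (-1, 1), the range of tanh.

```python
import numpy as np

def tanh_layer(x, W, b):
    return np.tanh(W @ x + b)

def tanh_layer_inverse(y, W, b):
    # Undo each step in reverse order: arctanh, subtract b, invert W.
    return np.linalg.solve(W, np.arctanh(y) - b)

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))   # non-singular with probability 1
b = rng.standard_normal(4)
x = rng.standard_normal(4)

x_recovered = tanh_layer_inverse(tanh_layer(x, W, b), W, b)
print(np.allclose(x, x_recovered))  # True
```

If W were singular, `np.linalg.solve` would fail and the layer would collapse a direction of space, destroying the one-to-one correspondence.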

A four-hidden-layer tanh ANN discriminates between two slightly entangled spirals by generating a new data representation in which the two classes are linearly separable [1].

The concepts of homeomorphism and invertibility are deeply entwined with interpretability:

“Understanding how transformations in feature space are related to the corresponding input is an important step towards interpretable deep networks. Invertible deep networks may play an important role in such analysis since, for example, one could potentially back-track a property from the feature space to the input space” [11]

An example of problems that arise in mapping manifolds not diffeomorphic to each other: the “holes” in the first manifold prevent a smooth mapping to the second [12].

It is a good idea to characterize the learnability of different neural architectures by computable measures of data complexity, for example persistent homology [13].

Unfortunately, it is not always possible to find a homeomorphic mapping. If data is concentrated near a low-dimensional manifold with non-trivial topology, there is no continuous and invertible mapping to a blob-like manifold (the region where prior mass is concentrated) [12].

Let’s get back to our goal of depicting what happens in an ANN layer.

By constructing a homotopy, we can analyze how an increasing degree of non-linearity in the activation functions changes how ANN layers map the data into different spaces.

Natural Homotopy

Two maps f0 and f1 are homotopic, f0 ≃ f1, if there exists a map, a homotopy, F : X × I → Y such that f0(x) = F(x, 0) and f1(x) = F(x, 1) for all x ∈ X [9].

[6] constructs a homotopy by transforming the node transfer function in a Single Layer Perceptron from a linear into a sigmoidal mapping:

“By using a natural homotopy which deforms linear networks into nonlinear networks, we were able to explore how the geometric representations commonly used in analyzing linear mappings are affected by network nonlinearities. Specifically, the input data subspace is transformed by the network into a curvilinear embedded data manifold” [6]

The data manifold for L = 3, s = 2 and three weights, at homotopy parameter value 1 [6]. An intuition of how curvature relates to the existence of multiple projections of y on Z [6]. An example data manifold Z with boundaries Pa,b = Z ± (1/|k|max)n, where n is the normal to the surface.
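One way to see such a deformation in code: a convex blend between a linear layer (parameter 0) and a tanh layer (parameter 1). This particular blend is an assumption of mine for illustration, not necessarily the exact homotopy constructed in [6].

```python
import numpy as np

def natural_homotopy(x, W, b, lam):
    """Deform a linear layer (lam = 0) into a tanh layer (lam = 1).
    Illustrative blend of the transfer function (an assumption, not
    necessarily the construction of [6]):
        f_lam(u) = (1 - lam) * u + lam * tanh(u),  u = Wx + b."""
    u = W @ x + b
    return (1.0 - lam) * u + lam * np.tanh(u)

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))
b = rng.standard_normal(2)
x = rng.standard_normal(2)

linear_out = natural_homotopy(x, W, b, 0.0)  # purely linear layer
tanh_out = natural_homotopy(x, W, b, 1.0)    # fully nonlinear tanh layer
```

Sweeping the parameter from 0 to 1 lets one watch a flat input subspace bend into a curvilinear embedded data manifold.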

For all desired vectors y in the region between Pa and Pb, there exists only one solution.

It is important to remark that this mapping is not homeomorphic: the mapping is not invertible and Z folds on itself, infinitely.

Conclusion

Learning, under the manifold assumption, is equivalent to discovering a non-linear, lower-dimensional manifold.

In this short blog post, I tried to provide a visual and surely not fully comprehensive intuition of how ANNs might map the original data space to a suitable, lower-dimensional manifold.

A great tool to visualize the mappings (representations) at layer level for different ANN architectures and classification problems is available at [15], and it is awesome.

Disclaimer: the opinions in this blog are mine and so are the possible errors and misconceptions.

References

[1] http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
[2] Cayton, Lawrence. “Algorithms for manifold learning.” Univ. of California at San Diego Tech. Rep 12.1–17 (2005): 1.
[3] Basri, Ronen, and David Jacobs. “Efficient representation of low-dimensional manifolds using deep networks.” arXiv preprint arXiv:1602.04723 (2016).
[4] https://en.wikipedia.org/wiki/Homeomorphism
[5] Coetzee, Frans M., and Virginia L. Stonick. “On a natural homotopy between linear and nonlinear single-layer networks.” IEEE Transactions on Neural Networks 7.2 (1996): 307–317.
[6] Coetzee, Frans Martin, and V. Stonick. “Homotopy approaches for the analysis and solution of neural network and other nonlinear systems of equations.” Doctoral thesis, Carnegie Mellon University, May 1995.
[7] Adhikari, Mahima Ranjan. Basic Algebraic Topology and its Applications. Springer, 2016.
[8] Geurts, Pierre, Gilles Louppe, and Louis Wehenkel. “Transfer learning and related protocols.” Lecture notes, 2018.
[9] Moller, Jesper. “Homotopy theory for beginners.” Lecture notes.
[10] http://mathworld.wolfram.com/Homeomorphism.html
[11] Jacobsen, Jörn-Henrik, Arnold Smeulders, and Edouard Oyallon. “i-RevNet: Deep invertible networks.” arXiv preprint arXiv:1802.07088 (2018).
[12] Falorsi, Luca, et al. “Explorations in homeomorphic variational auto-encoding.” arXiv preprint arXiv:1807.04689 (2018).
[13] Guss, William H., and Ruslan Salakhutdinov. “On characterizing the capacity of neural networks using algebraic topology.” arXiv preprint arXiv:1802.04443 (2018).
[14] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[15] https://cs.stanford.edu/people/karpathy/convnetjs//demo/classify2d.html