Google and OpenAI Help You See What Neural Networks See

Jesus Rodriguez · Mar 7

Interpretability is one of the biggest challenges of deep neural networks, particularly those that deal with unstructured datasets such as images.
Understanding how an image classification model builds its knowledge is nearly impossible.
While image classification models have matched or exceeded human performance on some tasks, their lack of interpretability limits their applicability, as we can't audit or effectively troubleshoot these models.
Recently, researchers from Google and OpenAI published a paper that proposes a method for visualizing the intermediate representations of an image classification model.
Titled “Exploring Neural Networks with Activation Atlases”, the research paper introduces a technique called activation atlases that helps us understand what neural networks “see” when presented with an image dataset.
The activation atlas technique borrows some ideas from neuroscience to help explain the intermediate image representations of a neural network.
When we receive visual sensory signals through our eyes, the information is processed by the neocortex region of the brain.
Different types of visual signals activate different sets of neurons that wire together to activate knowledge of the perceived objects.
The actual knowledge is not built by individual neurons but by groups of interconnected neurons.
The analogy of neurons wired together to build specific knowledge representations applies perfectly to image classification models.
The initial work in image classification interpretability focused on understanding the representations created by individual neurons, which, although helpful, proved limited when trying to understand the representations created by entire layers of the network.
Other techniques, such as pairwise activations, focused on exploring the connections between neurons but very often fall short given the high dimensionality of image classification models.
Activation Atlas

The technique proposed by Google and OpenAI has its origins in a method called feature visualization, which was introduced in the paper “The Building Blocks of Interpretability” last year.
Conceptually, feature visualization is a thread of research that tries to answer the question of what a network detects by letting us “see through the eyes” of the network.
It began with research into visualizing individual neurons and trying to determine what they respond to.
Because neurons don’t work in isolation, this led to applying feature visualization to simple combinations of neurons.
Applying feature visualization to groups of neurons poses the challenge of identifying which neurons the method should be applied to.
The obvious answer seems to be to study the neurons that activate for a given input.
However, this approach has the limitation that it doesn’t provide a complete view of the network, but rather only of the portion that activates for a particular input.
Let’s try to explain this using a basic analogy based on the human brain.
Imagine that we are trying to understand what regions of the neocortex activate when reading different words.
The equivalent of feature visualization in this scenario would be to study the neural activations for the individual letters of the alphabet.
While that information is still relevant, it doesn’t offer a complete picture, as those letters can be combined in many ways to form different words, each of which causes different interconnected neurons to activate.
Activation atlases build on the principles of feature visualization but extend them to provide a global view of the network.
Instead of focusing on the activations triggered by a single input image, activation atlases visualize common combinations of neurons.
In our word-recognition analogy, an activation atlas would show the activations for common words rather than individual letters, providing a more comprehensive view of how knowledge is created in the network.
From a technical standpoint, an activation atlas is built by collecting the internal activations of a network layer across one million images.
These activations, a large set of high-dimensional vectors, are projected into a useful 2D layout via UMAP, a dimensionality-reduction technique that preserves some of the local structure of the original high-dimensional space.
The following figure illustrates the difference between feature visualization for individual neurons and activation atlases.
To test the ideas of activation atlases, Google and OpenAI used a well-known convolutional neural network (CNN) called InceptionV1.
The architecture consists of a number of layers, referred to as “mixed3a”, “mixed3b”, “mixed4a”, and so on, sometimes shortened to just “3a”.
Each layer successively builds on the previous ones.
To apply activation atlases to InceptionV1, the first step is to feed an image into the network and run it through to the layer of interest.
The framework then collects the activations at that layer.
If a neuron is excited by what it is shown, its activation value will be positive.
The results are shown in the following figure.

When using a single image, the benefits of activation atlases are not immediately obvious compared to some of their predecessors.
One of the major contributions of activation atlases is that it can be seamlessly applied to datasets of millions of images.
To validate that, Google and OpenAI tested InceptionV1 with a randomized dataset of one million images.
In this process, the model collects one random spatial activation per image; these are then fed through UMAP to reduce them to two dimensions.
They are then plotted, with similar activations placed near each other.
Finally, a grid is drawn over the layout, the activations that fall within each cell are averaged, and feature visualization is run on each averaged activation.
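The gridding-and-averaging step above can be sketched in plain NumPy. This is a hypothetical illustration with random data, not the paper's implementation; in the real pipeline each averaged vector would then be rendered with feature visualization.

```python
# Sketch of the atlas gridding step: bin the 2D UMAP coordinates into
# grid cells and average the high-dimensional activations per cell.
# All data here is random stand-in data.
import numpy as np

rng = np.random.default_rng(0)
layout = rng.random((1000, 2))         # 2D positions from UMAP, in [0, 1)
activations = rng.random((1000, 512))  # original high-dimensional activations
grid_size = 20

# Map each 2D point to a grid cell index.
cells = np.clip(np.floor(layout * grid_size).astype(int), 0, grid_size - 1)
cell_ids = cells[:, 0] * grid_size + cells[:, 1]

# Average the activations that fall within each occupied cell; feature
# visualization would then be run on each averaged vector.
averaged = {
    cid: activations[cell_ids == cid].mean(axis=0)
    for cid in np.unique(cell_ids)
}
print(len(averaged))  # number of occupied grid cells
```

Averaging within a cell is what turns millions of individual activations into a manageable set of representative vectors, one per atlas tile.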
The whole process is illustrated in the following figure.

To test activation atlases on different image classification models, Google and OpenAI published a very compelling demo.
For each neural network, you can visually explore how the model interprets images.
Additionally, the code can also be used directly in different Jupyter notebooks.
Activation atlases are among the most creative work I’ve seen in neural network interpretability.
By providing visibility across the entire network, they offer a unique window into the evolving knowledge-building process of neural networks and a clean mechanism to “look inside the black box”.