VAEs! Generating images with TensorFlow

We have created a complete network, with its corresponding data input pipeline, in just a few lines of code.

We can now move on to testing our network and checking whether it can properly learn and create amazing new images!

Experiments

1. Are our networks learning?

Let's start with a simple example and check that the network is working as it should.

For this we will use the MNIST and Fashion MNIST datasets and see how the network reconstructs our input images after a few epochs.

I will set the number of latent dimensions to 5.

Below, we can see how our network learns to transform the input images into a 5-dimensional latent space and then recover them back to the original space.
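As a rough sketch of this encode-sample-decode round trip (not the post's actual TensorFlow model: the weights here are random stand-ins and all names are illustrative), the shapes involved look like this:

```python
import numpy as np

latent_dim = 5
image_dim = 28 * 28  # MNIST images, flattened

rng = np.random.default_rng(0)

# Illustrative random weights standing in for the trained encoder/decoder.
W_enc = rng.normal(size=(image_dim, 2 * latent_dim)) * 0.01  # outputs mean and log-variance
W_dec = rng.normal(size=(latent_dim, image_dim)) * 0.01

def encode(x):
    h = x @ W_enc
    mean, log_var = h[:, :latent_dim], h[:, latent_dim:]
    return mean, log_var

def reparameterize(mean, log_var):
    # Sample z = mean + sigma * eps, so gradients could flow through mean/log_var.
    eps = rng.normal(size=mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

def decode(z):
    return 1 / (1 + np.exp(-(z @ W_dec)))  # sigmoid -> pixel intensities in (0, 1)

x = rng.random((4, image_dim))            # a batch of 4 fake images
mean, log_var = encode(x)
x_rec = decode(reparameterize(mean, log_var))
print(mean.shape, x_rec.shape)            # (4, 5) (4, 784)
```

The real network replaces these single matrix multiplies with trained layers, but the flow of shapes is the same: each image is squeezed into 5 numbers and expanded back to 784 pixels.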

Note how, as more epochs pass, we obtain better reconstructions and a lower loss.

Image reconstruction vs. epoch number for the MNIST and Fashion MNIST datasets

We see how our network keeps improving the quality of the recovered images.

It’s interesting to see how, in the initial epochs, the numbers 6, 3 and 4 are converted into something we could consider ‘similar’ to a 9.

But after a few iterations, the original input shape is preserved.

For the fashion dataset we get similar behavior, with the bag pictures initially recovered as a boot! It’s also curious to observe how the images change, to get an intuition of what is happening during learning.

But what is going on with the image encodings?

2. How does our latent space look?

In the previous section we used a 5-dimensional latent space, but we can reduce this to a two-dimensional space, which we can plot.

In this way, each complete image will be encoded as a 2D vector.

Using a scatter plot, we can see how this latent space evolves with the number of epochs.
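A plot like that takes only a few lines with matplotlib; here is a sketch where random clusters stand in for the real encoder outputs (the data is fabricated purely to show the plotting pattern):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so this also runs without a display
import matplotlib.pyplot as plt

# Fake 2D encodings standing in for encoder outputs: one cluster per digit label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=500)
centers = rng.normal(scale=3.0, size=(10, 2))
z = centers[labels] + rng.normal(scale=0.5, size=(500, 2))

plt.figure(figsize=(6, 6))
sc = plt.scatter(z[:, 0], z[:, 1], c=labels, cmap="tab10", s=8)
plt.colorbar(sc, label="digit label")
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.title("2D latent space colored by label")
plt.savefig("latent_space.png")
```

With the real encoder you would replace `z` and `labels` with the encoded test images and their true labels, and redraw the plot every few epochs.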

At first, all images are close to the prior (all points are located around 0).

But during training, the encoder learns to approximate the posterior distribution, so it locates the latent variables in different parts of the space according to their labels (equal label -> close region in the space).

Let’s have a look first and then discuss this in more detail!

MNIST latent space evolution during 20 epochs

Isn’t this cool? Numbers that are similar are placed in a similar region of the space.

For example, we can see that zeros (red) and ones (orange) are easily recognized and located in specific regions of the space.

Nevertheless, it seems the network can’t do the same thing with eights (purple) and threes (green).

Both of them occupy a very similar region.

The same thing happens with clothes.

Similar garments such as shirts, t-shirts and even dresses (which can be seen as an intersection between shirts and trousers) are located in similar regions, and the same happens with boots, sneakers and sandals!

Fashion MNIST latent space evolution during 20 epochs

These two-dimensional plots can also help us understand the KL divergence term.

In my previous post I explained where it comes from and how it acts as a penalization (or regularization) term on the encoder whenever the distributions it outputs deviate from the standard normal prior.
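For a diagonal Gaussian encoder and a standard normal prior, this penalty has a well-known closed form; a small NumPy sketch (the function name is mine, not the post's code):

```python
import numpy as np

def kl_to_standard_normal(mean, log_var):
    # Closed-form KL( N(mean, sigma^2) || N(0, 1) ), summed over latent dimensions.
    return -0.5 * np.sum(1 + log_var - mean**2 - np.exp(log_var), axis=-1)

# A code that matches the prior pays zero penalty...
print(kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2))))           # [0.]
# ...while codes placed far from the origin are penalized heavily.
print(kl_to_standard_normal(np.array([[5.0, -5.0]]), np.zeros((1, 2))))   # [25.]
```

This is exactly why, with the KL term active, the encoder cannot scatter points arbitrarily far from the origin.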

Without this term the encoder can use the whole space and place equal labels in very different regions.

Imagine for example two images of a number 1.

Instead of being close to each other in the space, the encoder could place them far apart.

This would make generating unseen samples a problem, since the space would be large and full of ‘holes’: empty areas that do not correspond to any number and only produce noise.

Left: Latent space without KL regularization — Right: Latent space with KL regularization

Look at the x and y axes.

In the left plot there is no regularization, so the points span a much larger region of the space, while in the right image they are more concentrated, producing a denser space.


3. Generating samples

We can generate random samples that belong to our latent space.

These points have not been used during training (they would correspond to the white space in the previous plots).

Our decoder, though, has learnt to reconstruct valid images related to those points without ever seeing them.

So let’s create a grid of points like the following one:

Two-dimensional grid of points

Each of these points can be passed to the decoder, which will return a valid image.
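Building the grid and tiling the decoded images into one big canvas can be sketched like this (the decoder here is a random stand-in; in the real experiment you would call the trained decoder network):

```python
import numpy as np

n = 15                        # grid resolution per axis
lim = 3.0                     # explore roughly 3 standard deviations of the prior
xs = np.linspace(-lim, lim, n)
ys = np.linspace(-lim, lim, n)
grid = np.array([[x, y] for y in ys for x in xs])    # (225, 2) latent points

# Stand-in decoder: any function mapping a 2D code to a 28x28 image works here.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 28 * 28)) * 0.1

def decode(z):
    return (1 / (1 + np.exp(-(z @ W)))).reshape(-1, 28, 28)

images = decode(grid)
# Arrange the 15x15 decoded images into a single 420x420 canvas for display.
canvas = images.reshape(n, n, 28, 28).transpose(0, 2, 1, 3).reshape(n * 28, n * 28)
print(grid.shape, canvas.shape)   # (225, 2) (420, 420)
```

Plotting `canvas` with `plt.imshow` then gives the kind of image mosaic shown below.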

With a few lines we can check how our decoder is doing throughout training and evaluate the quality of the results.

Ideally, all labels would be represented in our grid.

Comparison between grid-generated images and the latent space distribution

What about the fashion dataset? The results are even more fun! Look how the different garments are positioned by ‘similarity’ in the space.

Also, the grid-generated images look super real! After 50 epochs of training with the grid technique on the Fashion MNIST dataset, we achieve these results:

Fake images generated using mesh grid points

All the images here are fake.

We can finally see how our encoder works and how our latent space has been able to properly encode 2D image representations.

Observe how you can start with a sandal and interpolate points until you get a sneaker or even a boot!

Conclusion

We have learnt about Variational Autoencoders.

We started with the theory and the main assumptions that lie behind them, and finally we implemented this network using Google’s TensorFlow.

We used the Dataset API, reading and transforming the data to train our model using an iterator.
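A minimal sketch of that kind of input pipeline, assuming modern TensorFlow's `tf.data` API and a random array standing in for MNIST (not necessarily the post's exact code):

```python
import numpy as np
import tensorflow as tf

# Fake data standing in for MNIST; in the post this comes from the real dataset.
images = np.random.rand(1000, 28, 28).astype("float32")

dataset = (
    tf.data.Dataset.from_tensor_slices(images)
    .map(lambda x: tf.reshape(x, [28 * 28]))  # flatten each image to a 784-vector
    .shuffle(buffer_size=1000)
    .batch(32)
)

# Iterating over the dataset yields ready-to-train batches.
first_batch = next(iter(dataset))
print(first_batch.shape)  # (32, 784)
```

The training loop then simply iterates over `dataset` once per epoch, feeding each batch to the model.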

We have also implemented our encoder and decoder networks in a few lines of code.

The cost function, made up of two different parts, the log-likelihood and the KL divergence term (which acts as a regularization term), should also be clear now.
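Putting those two parts together, a sketch of such a cost function in plain NumPy (the function name and the Bernoulli/binary cross-entropy likelihood are illustrative choices, not necessarily the post's exact implementation):

```python
import numpy as np

def vae_loss(x, x_rec, mean, log_var):
    # Reconstruction term: Bernoulli negative log-likelihood (binary cross-entropy),
    # summed over pixels; eps guards against log(0).
    eps = 1e-7
    rec = -np.sum(x * np.log(x_rec + eps) + (1 - x) * np.log(1 - x_rec + eps), axis=-1)
    # KL term: closed-form divergence to the standard normal prior, summed over
    # latent dimensions; this is the regularization part.
    kl = -0.5 * np.sum(1 + log_var - mean**2 - np.exp(log_var), axis=-1)
    # Average the combined loss over the batch.
    return np.mean(rec + kl)
```

For a perfectly matched code (`mean = 0`, `log_var = 0`) the loss reduces to the reconstruction term alone, which is the sanity check you expect from the derivation.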

The experimental part was designed to support all the facts mentioned above so we could see with real images what is going on under all the computations.

It could be interesting to adapt our network to work with larger, colored images (3 channels) and observe how it does with such a dataset.

I might implement some of these features, but this is all for now!

mmeendez8/Autoencoder: VAE implementation in TensorFlow, available on GitHub.


VAEs I! Generating images with TensorFlow (medium.com)

Any ideas for future posts, or is there something you would like to comment on? Please feel free to reach out on GitHub or LinkedIn!


