An End to End Introduction to GANs
Easy Peasy Lemon Squeezy
Rahul Agarwal, Jun 15
I bet most of us have seen a lot of AI-generated human faces in recent times, be it in papers or blogs.
We have reached a stage where it is becoming increasingly difficult to distinguish between actual human faces and faces that are generated by Artificial Intelligence.
In this post, I will help you understand how to create and build such applications on your own.
I will try to keep this post as intuitive as possible for starters while not dumbing it down too much.
This post is about understanding how GANs work.
Task Overview
I will work on creating our own anime characters using an anime characters dataset.
The DC-GAN flavor of GANs that I will use here is widely applicable. It can generate faces or new anime characters, and it can also create new fashion styles, support general content creation, and sometimes be used for data augmentation.
In my view, GANs will change the way video games and special effects are generated.
The approach could create realistic textures or characters on demand.
You can find the full code for this chapter in the GitHub repository.
I have also uploaded the code to Google Colab so that you can try it yourself.
[Image: Using DCGAN architecture to generate anime images]
As always, before we get into the coding, it helps to delve a little bit into the theory.
The main idea of DC-GANs stemmed from the paper UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS, written in 2016 by Alec Radford, Luke Metz, and Soumith Chintala.
Although I am going to explain the paper in the next few sections, do take a look at it.
It is an excellent paper.
INTUITION: Brief Intro to GANs for Generating Fake Images
Typically, GANs employ two dueling neural networks to train a computer to learn the nature of a dataset well enough to generate convincing fakes.
We can think of this as two systems. One neural network (the Generator) works to generate fakes, and another neural network (the Discriminator) tries to classify which images are fake.
As both generator and discriminator networks do this repetitively, the networks eventually get better at their respective tasks.
Think of it as simple swordplay.
Two noobs start sparring with each other.
After a while, both become better at swordplay.
Or you could think of this as a robber (Generator) and a policeman (Discriminator).
After a lot of thefts, the robber becomes better at thieving while the policeman gets better at catching the robber.
In an ideal world, at least.
The losses in these neural networks are primarily a function of how the other network performs:
- Discriminator network loss is a function of generator network quality. The loss is high for the discriminator if it gets fooled by the generator’s fake images.
- Generator network loss is a function of discriminator network quality. The loss is high if the generator is not able to fool the discriminator.
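To make the two loss descriptions concrete, here is a minimal sketch in plain Python using binary cross-entropy, the loss the combined GAN in this post is compiled with. The helper names (`bce`, `discriminator_loss`, `generator_loss`) are hypothetical. They work on single probabilities rather than batches, and the probability values are illustrative, not outputs of a trained model.

```python
import math

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    p = min(max(prob, eps), 1 - eps)  # clip so we never take log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(real) -> 1 and D(fake) -> 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # The generator wants the discriminator to output 1 ("real") on its fakes
    return bce(d_fake, 1.0)

# A discriminator fooled into scoring a fake as 0.9 "real" has a high loss...
print(round(discriminator_loss(d_real=0.9, d_fake=0.9), 3))
print(round(discriminator_loss(d_real=0.9, d_fake=0.1), 3))
# ...while the generator that fooled it has a low loss, and vice versa
print(round(generator_loss(0.9), 3), round(generator_loss(0.1), 3))
```

Each network's loss depends on the other network's output. That coupling is what makes the training adversarial.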
In the training phase, we train our Discriminator and Generator networks sequentially, with the aim of improving the performance of both.
The objective is to end up with weights that help the Generator produce realistic-looking images.
In the end, we can use the Generator neural network to generate fake images from random noise.
Generator architecture
One of the main problems we face with GANs is that training is not very stable.
Thus we have to come up with a Generator architecture that solves our problem and also results in stable training.
The preceding diagram is taken from the paper and shows the DC-GAN generator architecture.
It might look a little bit confusing.
Essentially, we can think of the generator neural network as a black box. It takes as input a 100-dimensional vector of numbers sampled from a normal distribution and gives us an image.
How do we get such an architecture?
In the architecture below, we use a dense layer of size 4x4x1024 to create a dense vector out of this 100-d vector.
Then, we reshape this dense vector into the shape of a 4×4 image with 1024 filters, as shown in the following figure.
We don’t have to worry about any weights right now, as the network itself will learn those while training.
Once we have the 1024 4×4 maps, we do upsampling using a series of Transposed convolutions, which after each operation doubles the size of the image and halves the number of maps.
In the last step, though, we don’t halve the number of maps. Instead, we reduce it to 3 maps, one for each RGB channel, since we need three channels for the output image.
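The shape bookkeeping described above can be traced with a few lines of plain Python. This is a sketch that assumes the paper-style layout: one dense projection to 4x4x1024, then four stride-2 transposed convolutions ending in a 64x64x3 RGB image. The function name and its defaults are illustrative, and no weights are involved.

```python
def dcgan_generator_shapes(noise_dim=100, start_size=4, start_maps=1024,
                           num_upsamples=4, out_channels=3):
    """Trace tensor shapes through a DCGAN-style generator (illustrative only)."""
    shapes = [("noise", (noise_dim,))]
    # Dense layer projects the noise to 4*4*1024 units...
    shapes.append(("dense", (start_size * start_size * start_maps,)))
    size, maps = start_size, start_maps
    # ...which are reshaped into 1024 feature maps of size 4x4
    shapes.append(("reshape", (size, size, maps)))
    for i in range(num_upsamples):
        size *= 2  # each stride-2 transposed convolution doubles height and width
        # it also halves the number of maps, except the last layer, which outputs RGB
        maps = out_channels if i == num_upsamples - 1 else maps // 2
        shapes.append((f"tconv{i + 1}", (size, size, maps)))
    return shapes

for name, shape in dcgan_generator_shapes():
    print(f"{name:8s} {shape}")
```

The trace runs 4×4×1024 → 8×8×512 → 16×16×256 → 32×32×128 → 64×64×3, which is the image size used later in this post.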
Now, what are transposed convolutions?
In the simplest terms, transposed convolutions give us a way to upsample images.
In a convolution, we go from a 4×4 image to a 2×2 image. In a transposed convolution, we go from 2×2 to 4×4, as shown in the following figure:
[Image: Upsampling a 2×2 image to a 4×4 image]
Q: We know that un-pooling is popularly used for upsampling input feature maps in convolutional neural networks (CNNs). Why don’t we use un-pooling?
It is because un-pooling does not involve any learning.
Transposed convolution, however, is learnable, and that is why we prefer it to un-pooling.
Its parameters are learned by the generator, as we will see shortly.
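A naive single-channel transposed convolution shows how the upsampling works. Every input pixel scatters a copy of the kernel, scaled by its value, into a larger output, and overlapping contributions are summed. This sketch uses no padding and a fixed kernel; in a real network, the kernel weights are what gets learned. The output size is `(n - 1) * stride + k`, so a 3×3 kernel with stride 1 maps 2×2 to 4×4, and a 2×2 kernel with stride 2 doubles the size.

```python
def conv_transpose2d(x, kernel, stride=1):
    """Naive single-channel 2D transposed convolution (square input and kernel, no padding)."""
    n, k = len(x), len(kernel)
    out_size = (n - 1) * stride + k
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(n):
        for j in range(n):
            # Scatter the kernel, scaled by x[i][j], into the output at (i*stride, j*stride)
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out

x = [[1, 2], [3, 4]]
up = conv_transpose2d(x, [[1] * 3 for _ in range(3)], stride=1)
print(len(up), len(up[0]))  # 4 4
for row in conv_transpose2d(x, [[1, 1], [1, 1]], stride=2):
    print(row)  # with an all-ones 2x2 kernel, each pixel becomes a 2x2 block
```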
Discriminator architecture
Now that we understand the generator architecture, here is the discriminator as a black box.
In practice, it contains a series of convolutional layers and a dense layer at the end to predict whether an image is fake or not, as shown in the following figure:
[Image: Takes an image as input and predicts if it is real/fake. Every image conv net ever.]
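The discriminator's job of turning an image into a single real/fake probability can be sketched the same way. This is an illustration under stated assumptions. It uses four stride-2 convolutions with Keras-style `same` padding and a kernel size of 4; the layer count and kernel size are hypothetical, not the author's exact settings. A sigmoid on the final dense output then gives the probability.

```python
import math

def conv_out_size(n, kernel, stride, padding="same"):
    """Spatial output size of a convolution, using Keras-style padding rules."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # "valid": no padding

def sigmoid(z):
    """Squash a real-valued logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical discriminator: four stride-2 convs shrink 64x64 down to 4x4...
size, trace = 64, [64]
for _ in range(4):
    size = conv_out_size(size, kernel=4, stride=2)
    trace.append(size)
print(trace)  # [64, 32, 16, 8, 4]

# ...then a flatten + dense layer produces one logit, mapped to P(real)
print(sigmoid(0.0))  # 0.5, i.e. "can't tell"
```

The `valid` branch reproduces the 4×4 → 2×2 convolution from the transposed-convolution discussion: `conv_out_size(4, kernel=3, stride=1, padding="valid")` returns 2.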
Data preprocessing and visualization
The first thing we want to do is to look at some of the images in the dataset.
The following are the Python commands to visualize some of the images from the dataset:
The resultant output is as follows:
We get to see the sizes of the images and the images themselves.
We also need functions to preprocess the images to a standard size, 64x64x3 in this particular case, before proceeding further with training.
We will also need to normalize the image pixels before we use them to train our GAN.
You can see the code; it is well commented.
As you will see, we will be using the preceding defined functions in the training part of our code.
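The author's preprocessing code is not reproduced here, so what follows is only a sketch of the two steps described above, written in plain Python on nested lists. It assumes nearest-neighbour resizing and a [-1, 1] pixel range. That range is a common DCGAN convention when the generator ends in a tanh activation, but it is an assumption about this post's code. In practice, a library such as PIL would normally handle the resizing.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an image stored as rows of (R, G, B) tuples."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(img):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1] (assumed range)."""
    return [[tuple(v / 127.5 - 1.0 for v in px) for px in row] for row in img]

def denormalize(img):
    """Inverse of normalize: floats in [-1, 1] back to ints in [0, 255] for display."""
    return [[tuple(round((v + 1.0) * 127.5) for v in px) for px in row] for row in img]

# Toy 96x80 "image" of RGB tuples, resized to the 64x64x3 training size
img = [[(r % 256, c % 256, 0) for c in range(80)] for r in range(96)]
x = normalize(resize_nearest(img, 64, 64))
print(len(x), len(x[0]), len(x[0][0]))  # 64 64 3
print(min(v for row in x for px in row for v in px))  # -1.0
```

`denormalize` is the step you would apply to generator outputs before saving or plotting them.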
Implementation of DCGAN
This is the part where we define our DCGAN.
We will be defining our noise generator function, Generator architecture, and Discriminator architecture.
Generating noise vector for Generator
[Image: Kids: Normal Noise generators]
The following code block is a helper function to create a noise vector of predefined length for the Generator.
It will generate the noise which we want to convert to an image using our generator architecture.
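A noise helper of this kind can be sketched with the standard library alone. The name `sample_noise` and its `seed` parameter are hypothetical; the post's own helper is `gen_noise(batch_size, noise_shape)`, and its body and exact output shape are not shown here. Each vector below has 100 entries drawn from a standard normal distribution, matching the 100-d input described for the generator.

```python
import random

def sample_noise(batch_size, noise_dim=100, seed=None):
    """Return batch_size noise vectors, each with noise_dim draws from N(0, 1)."""
    rng = random.Random(seed)  # a private RNG, so a seed gives reproducible noise
    return [[rng.gauss(0.0, 1.0) for _ in range(noise_dim)] for _ in range(batch_size)]

noise = sample_noise(batch_size=16, noise_dim=100, seed=42)
print(len(noise), len(noise[0]))  # 16 100
```

Passing the same seed returns the same vectors. This is one way to get the kind of fixed noise batch that the training loop uses to compare generator outputs across steps.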
We use a normal distribution to generate the noise vector.
Generator architecture
The Generator is the most crucial part of the GAN.
Here, I create a generator by adding some transposed convolution layers to upsample the noise vector to an image.
As you will notice, this generator architecture is not the same as the one given in the original DC-GAN paper.
I needed to make some architectural changes to fit our data better, so I added a convolution layer in the middle and removed all dense layers from the generator architecture, making it fully convolutional.
I also use a lot of BatchNorm layers with a momentum of 0.5, along with leaky ReLU activations.
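For readers unfamiliar with these two settings, here is a plain-Python sketch of what they do for a single feature. It assumes Keras' BatchNormalization convention for the moving average, `moving = moving * momentum + batch_mean * (1 - momentum)`. It also assumes a leaky ReLU slope of 0.2, which is the value the DCGAN paper used but is only an assumption for this post's code.

```python
def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: pass positives through, scale negatives by a small slope (assumed 0.2)."""
    return x if x > 0 else alpha * x

def update_moving_mean(moving_mean, batch_values, momentum=0.5):
    """Keras-style BatchNormalization moving-average update for one feature."""
    batch_mean = sum(batch_values) / len(batch_values)
    return moving_mean * momentum + batch_mean * (1.0 - momentum)

mm = 0.0
for batch in ([2.0, 2.0], [4.0, 4.0]):
    mm = update_moving_mean(mm, batch, momentum=0.5)
    print(mm)  # 1.0, then 2.5
print([leaky_relu(v) for v in (-1.0, 0.0, 3.0)])
```

With momentum 0.5, the moving mean moves halfway toward each new batch mean. It therefore tracks recent batches much faster than it would with Keras' default momentum of 0.99.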
I use Adam optimizer with β=0.
The following code block is the function I will use to create the generator:
You can plot the final generator model:
```python
plot_model(generator, to_file='gen_plot.png', show_shapes=True, show_layer_names=True)
```
[Image: Generator Architecture]
Discriminator architecture
Here is the discriminator architecture, where I use a series of convolutional layers and a dense layer at the end to predict whether an image is fake or not.
Here is the architecture of the discriminator:
```python
plot_model(discriminator, to_file='dis_plot.png', show_shapes=True, show_layer_names=True)
```
[Image: Discriminator Architecture]
Training
Understanding how training works in a GAN is essential.
And maybe a little interesting too.
I start by creating our discriminator and generator using the functions defined in the previous section:
```python
discriminator = get_disc_normal(image_shape)
generator = get_gen_normal(noise_shape)
```
The generator and discriminator are then combined to create the final GAN:
```python
from keras.layers import Input
from keras.models import Model
from keras.optimizers import Adam
from keras.utils import plot_model

# Freeze the discriminator inside the combined model, so training the GAN
# only updates the generator's weights
discriminator.trainable = False

# Optimizer for the GAN
opt = Adam(lr=0.…5)  # same as generator

# Input to the generator
gen_inp = Input(shape=noise_shape)
GAN_inp = generator(gen_inp)
GAN_opt = discriminator(GAN_inp)

# Final GAN: noise -> generator -> discriminator -> P(real)
gan = Model(inputs=gen_inp, outputs=GAN_opt)
gan.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
plot_model(gan, to_file='gan_plot.png', show_shapes=True, show_layer_names=True)
```
This is the architecture of our whole GAN:
The Training Loop
This is the main part, where we need to understand how the blocks we have created so far fit together and work as one.
Don’t worry, I will break the above code down step by step here.
The main steps in every training iteration are:
Step 1: Sample a batch of normalized images from the dataset directory.
```python
import time
import numpy as np

# Use a fixed noise vector to see how the GAN images transition over time on fixed noise
fixed_noise = gen_noise(16, noise_shape)

# To keep track of losses
avg_disc_fake_loss = []
avg_disc_real_loss = []
avg_GAN_loss = []

# We will run for num_steps iterations
for step in range(num_steps):
    tot_step = step
    print("Begin step: ", tot_step)
    # To keep track of time per step
    step_begin_time = time.time()

    # Sample a batch of normalized images from the dataset
    real_data_X = sample_from_dataset(batch_size, image_shape, data_dir=data_dir)
```
Step 2: Generate noise as input to the generator.
```python
    # Generate noise to send as input to the generator
    noise = gen_noise(batch_size, noise_shape)
```
Step 3: Generate images from the random noise using the generator.
```python
    # Use the generator to create (predict) images
    fake_data_X = generator.predict(noise)

    # Save predicted images from the generator every 100th step
    if (tot_step % 100) == 0:
        step_num = str(tot_step)
        ...  # (image-saving code: writes fake_data_X to a .png file)
```
Step 4: Train the discriminator using generator images (fake images) and real normalized images (real images), together with their noisy labels.
```python
    # Create the labels for real and fake data. We don't give exact ones and zeros
    # but add a small amount of noise. This is an important GAN training trick.
    real_data_Y = np.ones(batch_size) - np.…2  # ones minus a small amount of noise
    fake_data_Y = np.…2                        # zeros plus a small amount of noise

    # Train the discriminator using data and labels
    discriminator.trainable = True
    generator.trainable = False

    # Train the discriminator separately on real data
    dis_metrics_real = discriminator.train_on_batch(real_data_X, real_data_Y)
    # Train the discriminator separately on fake data
    dis_metrics_fake = discriminator.train_on_batch(fake_data_X, fake_data_Y)
    # Assumes the discriminator was compiled with metrics=['accuracy'],
    # so train_on_batch returns [loss, accuracy]
    print("Disc: real loss: %f fake loss: %f" % (dis_metrics_real[0], dis_metrics_fake[0]))

    # Save the losses to plot later
    avg_disc_fake_loss.append(dis_metrics_fake[0])
    avg_disc_real_loss.append(dis_metrics_real[0])
```
Step 5: Train the GAN using noise as X and (noisy) 1’s as Y, while keeping the discriminator untrainable.
```python
    # Train the generator using a random vector of noise and its labels (1's with noise)
    generator.trainable = True
    discriminator.trainable = False

    GAN_X = gen_noise(batch_size, noise_shape)
    GAN_Y = real_data_Y

    # The GAN was compiled with metrics=['accuracy'], so this returns [loss, accuracy]
    gan_metrics = gan.train_on_batch(GAN_X, GAN_Y)
    print("GAN loss: %f" % gan_metrics[0])
    avg_GAN_loss.append(gan_metrics[0])
```
We repeat these steps in the for loop to end up with a good discriminator and generator.
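The noisy-label trick from Step 4 can be sketched in plain Python. The exact expression in the author's code is not shown, so the uniform noise and the `noise_scale=0.2` default below are assumptions, and `noisy_labels` is a hypothetical helper. Real labels land just below 1 and fake labels just above 0. The usual motivation is to keep the discriminator from becoming overconfident.

```python
import random

def noisy_labels(batch_size, real, noise_scale=0.2, seed=None):
    """Soft, noisy labels for one batch (noise_scale is an assumed value)."""
    rng = random.Random(seed)
    if real:
        # rng.random() is in [0, 1), so real labels fall in (1 - noise_scale, 1]
        return [1.0 - rng.random() * noise_scale for _ in range(batch_size)]
    # Fake labels fall in [0, noise_scale)
    return [rng.random() * noise_scale for _ in range(batch_size)]

real_Y = noisy_labels(8, real=True, seed=0)
fake_Y = noisy_labels(8, real=False, seed=0)
print([round(y, 2) for y in real_Y])
print([round(y, 2) for y in fake_Y])
```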
Results
The final output image looks like the following.
As we can see, the GAN can generate pretty good images for our content editor friends to work with.
They might be a little crude for your liking, but still, this project was a starter for our GAN journey.
Loss over the training period
Here is the graph generated for the losses.
We can see that the GAN loss decreases on average, and its variance shrinks too as we do more steps.
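To check a claim like "the loss decreases on average and its variance shrinks" numerically rather than by eye, one option is to summarize the logged losses window by window. The `avg_GAN_loss` list from the training loop is one such log. The helper below is a hypothetical sketch using only the standard library. The toy loss values in the usage are made up purely to illustrate the output format; they are not results from this model.

```python
from statistics import mean, pvariance

def window_stats(losses, window=100):
    """(mean, variance) of each consecutive, non-overlapping window of losses.

    A trailing partial window is dropped.
    """
    stats = []
    for start in range(0, len(losses) - window + 1, window):
        chunk = losses[start:start + window]
        stats.append((mean(chunk), pvariance(chunk)))
    return stats

# Made-up losses, only to show the output format
toy_losses = [2.0, 0.0, 1.5, 0.5, 1.0, 1.0]
for m, v in window_stats(toy_losses, window=2):
    print(f"mean={m:.2f} variance={v:.2f}")
```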
One might want to train for even more iterations to get better results.
[Image: Image generated at every 1500 steps]
You can see the output and running code in Colab.
Given below is the code to generate some images at different training steps.
As we can see, the images get better as the number of steps increases.
Given below is the result of the GAN at different time steps:
Conclusion
[Image: Power in your hands]
In this post, we learned about the basics of GANs.
We also learned about the Generator and Discriminator architecture for DC-GANs, and we built a simple DC-GAN to generate anime images from scratch.
This model is not very good at generating fake images, yet we get to understand the basics of GANs with this project, and we are fired up to build more exciting and complex GANs as we go forward.
The DC-GAN flavor of GANs is widely applicable. It can generate faces or new anime characters, and it can also generate new fashion styles, support general content creation, and sometimes be used for data augmentation.
We can now conjure up realistic textures or characters on demand if we have the training data at hand, and that is no small feat.
If you want to know more about deep learning applications and use cases, take a look at the Sequence Models course in the Deep Learning Specialization by Andrew Ng.
Andrew is a great instructor, and this course is great too.
I am going to be writing more of such posts in the future too.
Let me know what you think about the series.
Follow me on Medium or subscribe to my blog to be informed about them.
As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.