Progressively-Growing GANs
Connor Shorten
Feb 17
The Progressively-Growing GAN architecture from NVIDIA, published at ICLR 2018, has become the best-known showcase of impressive GAN image synthesis.
Classically, GANs have struggled to generate even low- and mid-resolution images such as 32² (CIFAR-10) and 128² (ImageNet), yet this model was able to generate high-resolution facial images at 1024².
1024 x 1024 facial images generated with the Progressively-Growing GAN architecture
This article explains the mechanisms the paper uses to build Progressively-Growing GANs: a multi-scale architecture, linearly fading in new layers, minibatch standard deviation, and equalized learning rate.
A link to the paper is provided below:
Progressive Growing of GANs for Improved Quality, Stability, and Variation
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator…
arxiv.org
Multi-Scale Architecture
Diagram of the Multi-Scale Architecture used in Progressively-Growing GANs
The diagram above shows the concept of the Multi-Scale Architecture.
The ‘real’ images that the discriminator uses to determine if generated outputs are ‘real’ or ‘fake’ are downsampled to resolutions such as 4², 8², and so on up to 1024².
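Training starts with the generator producing 4² images. After a training phase at that resolution, new layers are added to both the generator and the discriminator and the task grows to 8², and so on up to 1024². Rather than waiting for convergence, the paper runs each phase for a fixed budget of 800k real images shown to the discriminator.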
This strategy greatly stabilizes training, and the intuition is straightforward.
Mapping directly from the latent vector z to a 1024² image forces the generator to model an enormous amount of variation all at once.
As previous GAN research has consistently shown, generating low-resolution images such as 28² grayscale MNIST digits is much easier than generating 128² RGB ImageNet images.
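As a rough illustration of the multi-scale setup, the sketch below lists the resolutions visited during growth and downsamples a batch of real images to the current resolution. It is a minimal NumPy sketch assuming square images in (N, H, W, C) layout; the function names are hypothetical, and repeated 2×2 average pooling is an assumed choice for the downsampling, not necessarily what the official implementation does.

```python
import numpy as np

def resolution_schedule(start=4, final=1024):
    """Resolutions visited during progressive growing, doubling at each stage."""
    resolutions = []
    r = start
    while r <= final:
        resolutions.append(r)
        r *= 2
    return resolutions

def downsample_real(images, target):
    """Downsample a batch of square real images (N, H, W, C) to target x target
    with repeated 2x2 average pooling. Assumes H == W == target * 2**k."""
    x = images
    while x.shape[1] > target:
        n, h, w, c = x.shape
        # group pixels into 2x2 blocks and average each block
        x = x.reshape(n, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))
    return x

print(resolution_schedule())            # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
real = np.random.rand(2, 64, 64, 3)     # stand-in for a batch of real images
print(downsample_real(real, 8).shape)   # (2, 8, 8, 3)
```

During the 8² phase, for example, the discriminator would compare the generator's 8×8 outputs against `downsample_real(batch, 8)`.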
The next important detail of the Progressively-Growing GAN model is exactly how it transitions to higher resolutions.
Fading in New Layers
Diagram depicting how new layers are added to progress the target resolution from low to high
If you are familiar with ResNets, this concept will be easy to understand, since the new layers are treated much like a residual block, only simpler.
For this explanation, refer to panel (b) in the diagram above, specifically the G (generator) portion above the dotted line.
When the new 32×32 output layer is added to the network, the 16×16 layer’s output is projected into the 32×32 dimension with a simple nearest neighbor interpolation.
This is a very important detail to understand.
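The nearest-neighbor projection is simple enough to show directly. Below is a minimal NumPy sketch assuming an (N, H, W, C) layout; `upsample_nearest_2x` is a hypothetical helper name. Each pixel is copied into a 2×2 block, which is why this step adds no learnable parameters.

```python
import numpy as np

def upsample_nearest_2x(x):
    """Nearest-neighbor 2x upsampling for a batch in (N, H, W, C) layout:
    each pixel is copied into a 2x2 block, so 16x16 becomes 32x32."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

tiny = np.array([[1, 2],
                 [3, 4]]).reshape(1, 2, 2, 1)
print(upsample_nearest_2x(tiny)[0, :, :, 0])
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```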
The projected output (16×16 → 32×32 via nearest-neighbor interpolation) is multiplied by 1 − α and added to the output of the new 32×32 layer, which is multiplied by α; their sum forms the new 32×32 generated image.
The α parameter increases linearly from 0 to 1 over the course of the transition.
When α reaches 1, the contribution of the nearest-neighbor-upsampled 16×16 output is completely nulled out: 1 − α = 1 − 1 = 0, so those feature maps are multiplied by 0.
This smooth transition mechanism greatly stabilizes the Progressively-Growing GAN architecture.
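The fade-in can be sketched as a weighted sum driven by a linear α schedule. This is a minimal NumPy sketch, not the paper's code: the helper names, the step-based schedule, and the random stand-in arrays (in place of the real 16×16 and 32×32 outputs) are all assumptions for illustration.

```python
import numpy as np

def upsample_nearest_2x(x):
    """Nearest-neighbor 2x upsampling, (N, H, W, C) layout."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def alpha_at(step, transition_steps):
    """Alpha rises linearly from 0 to 1 during the transition, then stays at 1."""
    return min(1.0, step / transition_steps)

def faded_output(old_low_res, new_high_res, alpha):
    """Weighted sum used while a new resolution is faded in:
    (1 - alpha) * upsampled old output + alpha * new high-res output."""
    return (1.0 - alpha) * upsample_nearest_2x(old_low_res) + alpha * new_high_res

old16 = np.random.rand(1, 16, 16, 3)   # stand-in: old path, 16x16 output
new32 = np.random.rand(1, 32, 32, 3)   # stand-in: new 32x32 layer's output
for step in (0, 500, 1000):
    a = alpha_at(step, transition_steps=1000)
    out = faded_output(old16, new32, a)
    print(step, a, out.shape)
```

At α = 0 the output is exactly the upsampled old image, so adding the new layer does not disturb the already-trained network; at α = 1 only the new layer remains.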
The following diagram shows how the final architecture looks once progressive growing has completed.
The two ideas presented so far, Multi-Scale Architecture and Fading in New Layers, are the foundational ideas of this paper. The following topics are slightly more advanced, but they are still very important for achieving the paper's final results.
Please leave a comment if you find an issue in the description of these concepts or can expand on them.
Minibatch Standard Deviation
This idea addresses the lack of variation evident in many GAN models.
This problem stems from the same root as “mode collapse”.
In their well-known paper Improved Techniques for Training GANs, Salimans et al. introduced mini-batch discrimination.
Mini-batch discrimination computes statistics comparing each sample's features with those of every other image in the batch and concatenates them as extra features inside the discriminator.
This pushes each batch of generated samples to have feature statistics similar to those of a batch of real samples; otherwise, the mini-batch features easily expose the generated samples as fake.
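For comparison with the simpler approach that follows, here is a minimal NumPy sketch of mini-batch discrimination following the formulation in Salimans et al.: each sample's features are projected by a learned tensor T into a matrix, rows are compared across the batch with an L1 distance, turned into similarities with exp(−distance), and summed over the batch. In this sketch T is random rather than learned, and all sizes are arbitrary assumptions.

```python
import numpy as np

def minibatch_discrimination(features, T):
    """features: (N, A) intermediate discriminator features f(x_i).
    T: (A, B, C) learned tensor (random here, for illustration).
    Returns (N, A + B): each sample's features plus B cross-sample statistics."""
    n = features.shape[0]
    A, B, C = T.shape
    # M_i = f(x_i) T, giving a (B, C) matrix per sample
    M = (features @ T.reshape(A, B * C)).reshape(n, B, C)
    # L1 distance between rows M_{i,b} and M_{j,b} for every pair of samples
    l1 = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)   # (N, N, B)
    # similarity c_b(x_i, x_j) = exp(-L1), summed over all samples j in the batch
    o = np.exp(-l1).sum(axis=1)                                     # (N, B)
    return np.concatenate([features, o], axis=1)

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 3, 2))
diverse = rng.standard_normal((4, 5))
collapsed = np.tile(diverse[:1], (4, 1))   # every sample identical, as in mode collapse
print(minibatch_discrimination(diverse, T)[:, 5:])
print(minibatch_discrimination(collapsed, T)[:, 5:])   # every entry is 4.0
```

When every sample in the batch is identical, each similarity is 1, so the appended statistics equal the batch size; a discriminator can learn to flag such low-variation batches as fake.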
Progressively-Growing GANs simplify this idea by adding a single constant feature map that has no learnable parameters.
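This map is computed by taking the standard deviation of each feature at each spatial location across the batch, then averaging these values over all features and spatial locations into a single number, which is replicated across every location to form one extra, constant feature map.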
As with mini-batch discrimination, this constant map is inserted toward the end of the discriminator.
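The minibatch standard deviation step can be sketched in a few lines of NumPy. This sketch assumes an (N, H, W, C) layout and treats the whole batch as a single group; the official implementation may differ in such details (for example, by computing the statistic over smaller groups of samples). The function name and the small `eps` added for numerical stability are assumptions.

```python
import numpy as np

def minibatch_stddev(x, eps=1e-8):
    """x: (N, H, W, C) activations near the end of the discriminator.
    Returns (N, H, W, C + 1): x with one constant feature map appended."""
    # standard deviation of every feature at every spatial location, over the batch
    std = np.sqrt(x.var(axis=0) + eps)          # (H, W, C)
    # average over all features and locations -> a single scalar
    s = std.mean()
    n, h, w, _ = x.shape
    # replicate the scalar over all samples and locations as one extra map
    extra = np.full((n, h, w, 1), s, dtype=x.dtype)
    return np.concatenate([x, extra], axis=3)

acts = np.random.default_rng(0).standard_normal((8, 4, 4, 16))
out = minibatch_stddev(acts)
print(out.shape)   # (8, 4, 4, 17)
```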
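Equalized Learning Rate
The idea behind equalized learning rate is to scale the weights of each layer by a per-layer constant, so that the weight actually used is w’ = w / c, where c is the normalization constant from He's initializer for that layer.
The stored weights are initialized from N(0, 1), and the scaling is applied dynamically on every forward pass rather than once at initialization, which keeps all weights in the network at a similar scale throughout training.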
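A minimal NumPy sketch of a fully connected layer with equalized learning rate, assuming c is chosen so that the effective weights match He initialization (standard deviation sqrt(2 / fan_in)). The class name `EqualizedDense` is hypothetical; a convolutional layer would apply the same idea with fan_in = kernel height × kernel width × input channels.

```python
import numpy as np

class EqualizedDense:
    """Fully connected layer with equalized learning rate (illustrative sketch).

    The trainable weights w are stored at unit scale, drawn from N(0, 1).
    On every forward pass they are divided by c = sqrt(fan_in / 2),
    i.e. multiplied by the He constant sqrt(2 / fan_in)."""

    def __init__(self, fan_in, fan_out, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.w = rng.standard_normal((fan_in, fan_out))   # stored parameters
        self.b = np.zeros(fan_out)
        self.c = np.sqrt(fan_in / 2.0)                    # per-layer constant

    def __call__(self, x):
        # runtime scaling: the effective weight is w / c
        return x @ (self.w / self.c) + self.b

layer = EqualizedDense(512, 256)
y = layer(np.ones((3, 512)))
print(y.shape)                     # (3, 256)
print(np.std(layer.w / layer.c))   # close to sqrt(2 / 512) = 0.0625
```

The optimizer only ever sees the unit-scale `w`, while the forward pass uses the He-scaled weights.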
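Equalized learning rate is needed because modern adaptive optimizers such as RMSProp and Adam normalize each gradient update by an estimate of its standard deviation, which makes the size of the update independent of the parameter's scale.
As a result, parameters with a large dynamic range take much longer to adjust than parameters with a small one; giving all weights the same scale ensures that they learn at the same speed.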
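The effect on adaptive optimizers can be seen with a toy calculation. The sketch below is a simplified illustration, not the paper's code: it uses only Adam's first, bias-corrected update, and the weights, gradients, and scales are made-up numbers chosen to represent two parameters with very different dynamic ranges.

```python
import numpy as np

def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update Adam makes on its very first step (t = 1), with bias correction.
    m_hat = grad and v_hat = grad**2, so the step is about -lr * sign(grad)."""
    m_hat = ((1 - beta1) * grad) / (1 - beta1)
    v_hat = ((1 - beta2) * grad ** 2) / (1 - beta2)
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

# Two raw parameters with very different scales (hypothetical values)
w = np.array([0.01, 10.0])
g = np.array([0.5, 5e-4])
step = adam_first_step(g)
print(step)                 # both entries are about -0.001
print(np.abs(step / w))     # relative change: about 0.1 vs 0.0001

# Equalized version: store both at unit scale (w = stored * scale);
# by the chain rule the gradient w.r.t. the stored value is scale * g
scale = np.array([0.01, 10.0])
stored = w / scale                       # [1.0, 1.0]
stored_step = adam_first_step(g * scale)
print(np.abs(stored_step / stored))      # about 0.001 for both
```

Adam moves both raw parameters by roughly the same absolute amount, so the large parameter changes about a thousand times more slowly in relative terms. When both are stored at unit scale, their relative rates match.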
Applying Progressive Growing, Fading in New Layers, Minibatch Standard Deviation, Equalized Learning Rate, and one more concept not discussed in this article, pixelwise normalization, enabled very impressive high-resolution GAN results:
LSUN images generated at 256×256 with the Progressively-Growing GAN architecture, amazing detail!
Comparison of Progressively-Growing GANs (far right) with Mao et al. (Least Squares GANs, far left) and Gulrajani et al. (Improved Wasserstein GAN, middle) on the LSUN interior bedroom images
Thank you for reading this article! Hopefully it helped you gain some understanding of the Progressively-Growing GAN model.
Check out the paper and leave a comment if you have any thoughts on this!