Generative Adversarial Networks: Revitalizing old video game textures

Edward Barnett · Feb 28

A pixel art gif I generated with the model.
Not too shabby.
If there is one thing that people love to talk about in video games, it is the fidelity of their graphics.
A measure of gaming progress throughout the years has been the ability of modern day hardware to bring about the next wave of ‘new-gen graphics’.
But that does not mean that we do not still have a fondness for the games of yesteryear.
An overwhelming nostalgia, a deep love for the games of the past (and their graphics) resides within many of us.
How exciting is it, then, to be alive in a time where machine learning can be directly applied, in a method called Single Image Super Resolution (SISR), to enhance the textures of those past relics and bring them somewhat more into the modern world?
The SRGAN model I’ve used here is based on this paper: https://arxiv.org/abs/1609.04802, as well as an expansion of the original model presented at ECCV 2018: https://github.com/xinntao/ESRGAN

Before and after comparisons of Ocarina of Time textures

Before we dig into the inner workings of the model, it’s important to understand what SISR is and how it works.
It has many applications outside of the realm of pictures and game textures such as microscopy and radar, but for the sake of clarity, we will keep the context solely to the application of digital media.
A SISR model attempts to reconstruct a high resolution image from a low resolution image while keeping the fidelity of the image intact: no image noise, no artifacts.
Until recent years, it was extremely difficult to scale images very much without having a great deal of the texture detail missing.
Commonly, super resolution algorithms would attempt to create a cost function to minimize the MSE (mean squared error) between the enhanced high resolution image and the base image. This reduction of MSE directly raises the peak signal-to-noise ratio (PSNR). Unfortunately, this measurement is a pixel-by-pixel comparison and often results in highly blurred images, much like bicubic image upscaling or nearest neighbor interpolation.
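The MSE and PSNR measures above can be sketched in a few lines. This is a minimal illustration, not part of any model here: the 4×4 "image" and the noise offsets are made up purely to show how a smaller pixel-wise error yields a higher PSNR.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two images (same shape, floats in [0, 1])."""
    return float(np.mean((a - b) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means the pixels match more closely."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10 * np.log10(max_val ** 2 / err)

# A toy 4x4 grayscale "ground truth" and a slightly perturbed reconstruction.
rng = np.random.default_rng(0)
truth = rng.random((4, 4))
recon = np.clip(truth + 0.05, 0, 1)
print(f"MSE: {mse(truth, recon):.4f}, PSNR: {psnr(truth, recon):.1f} dB")
```

Note the pixel-by-pixel nature of the comparison: PSNR says nothing about whether textures *look* sharp, which is exactly the gap the perceptual approaches below try to close.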
SISR models usually achieve a higher fidelity than those resampling methods, although as the resolution increases, you begin to see a similar pattern of behaviour, or in some cases pixels begin to melt together.
Compare the SRGAN to its enhanced future version below.
As you can see, the eyes begin to collapse a bit as we continue to scale upwards.
But this is certainly an improvement over simpler sampling methods.
So let’s talk about what exactly a GAN is.
Break it down.
The generative part of the term comes from the fact that GANs seek to create content as the output of the algorithm.
The adversarial term alludes to the idea that the generative part of the algorithm needs something to compete with, an adversary.
The network refers to how the generative and adversarial models are tied together to ultimately cooperate towards the end goal of the model.
For images, the generative model attempts to produce new images resembling its training set, and the adversarial model tries to determine whether a given image is real or generated. Like SISR, image-based GANs seek to minimize a cost function, although there are more widespread measures of error here, such as the Euclidean distance between feature mappings generated by the networks.
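As a rough sketch of that adversarial loop (not the texture model itself: the toy one-dimensional data, layer sizes, and learning rates are all illustrative), one training step in PyTorch might look like this:

```python
import torch
import torch.nn as nn

# Toy setup: the generator maps 4-dim noise to 8-dim "images", and the
# discriminator scores how real an 8-dim vector looks. Sizes are arbitrary.
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))
D = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

real = torch.randn(32, 8) + 2.0   # stand-in "real" data
noise = torch.randn(32, 4)

# Discriminator step: push real scores toward 1 and fake scores toward 0.
fake = G(noise).detach()          # detach so this step doesn't update G
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: try to fool the discriminator into scoring fakes as real.
loss_g = bce(D(G(noise)), torch.ones(32, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```

Repeating these two alternating steps is the "competition" in the name: each model's loss is the other's gain.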
SRGAN

So now you might be able to guess that this is simply an acronym for super resolution generative adversarial network. A GAN-based approach to SISR.
SRGAN replaces the MSE cost minimization (remember, this measures on a pixel by pixel basis, and can leave us with overly smooth representations) with a new loss function calculated from feature maps of the VGG (visual geometry group) network.
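The idea of comparing feature maps rather than raw pixels can be sketched as below. One loud caveat: SRGAN uses a pretrained VGG network as the feature extractor, while this sketch substitutes a tiny random conv stack purely to stay self-contained and avoid a weights download; the shapes and layer counts are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in "feature extractor". In SRGAN this is a pretrained VGG network;
# a small fixed conv stack keeps the sketch self-contained.
features = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),
)
for p in features.parameters():
    p.requires_grad_(False)  # the extractor is frozen; only the generator trains

def perceptual_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """MSE in feature space rather than pixel space (the VGG-loss idea)."""
    return nn.functional.mse_loss(features(sr), features(hr))

sr = torch.rand(1, 3, 16, 16)  # generated "super-resolved" image
hr = torch.rand(1, 3, 16, 16)  # ground-truth high-resolution image
loss = perceptual_loss(sr, hr)
```

Because the comparison happens after several convolutions, the loss rewards matching textures and edges rather than matching every pixel, which is what avoids the overly smooth results of pure MSE.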
In 2017, SRGAN was state of the art, best in class.
But the world stands still for no one.
Enter a new competitor…

ESRGAN: An introduction

And some prerequisite knowledge to understand why it matters

ESRGAN introduces Residual-in-Residual Dense Blocks (RRDB) to SRGAN.
Normally, when you stack many complex layers in a model, more layers eventually make it less effective: the gains from each minimization step shrink as you approach complete accuracy, and at some point the layers begin competing with each other. With traditional methods, you actually start to lose accuracy beyond about 25 layers.
How is minimization calculated?

Deep networks are notoriously difficult to train thanks to the vanishing gradient problem.
During backpropagation, the gradient of the cost function is used to update the weights of the earlier layers of the model. Each layer multiplies in another local derivative, and this repeated multiplication often makes the gradient vanishingly small; as you might imagine, the deeper the network, the smaller the gradient that reaches the early layers. Eventually the cost function will actually start to increase if we fail to introduce a method to handle the problem.
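A back-of-the-envelope illustration of the vanishing part: the chain rule multiplies one local derivative per layer, so a sigmoid-like slope of about 0.25 (an illustrative value, the maximum slope of the sigmoid) shrinks geometrically with depth.

```python
# Each layer of backprop multiplies in another local derivative (chain rule).
# If every sigmoid-like layer contributes a slope around 0.25, the gradient
# reaching the early layers shrinks geometrically with depth.
local_derivative = 0.25  # illustrative value: the sigmoid's maximum slope

for depth in (5, 25, 50):
    grad = local_derivative ** depth
    print(f"depth {depth:>2}: gradient magnitude ~ {grad:.2e}")
```

By 50 layers the gradient is far below floating-point noise, which is why the early layers of very deep plain networks effectively stop learning.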
The introduction of residual connections

In 2015, the ResNet authors would change the world with their implementation of residual networks. They theorized that “identity shortcut connections” would allow the connected layers to fit a residual mapping.
Let H(x) be the desired underlying mapping. We try to make the stacked non-linear layers fit another mapping F(x) := H(x) − x. The original mapping can then be recast as F(x) + x.
They theorized that it is easier to optimise the residual mapping than to optimise the original mapping.
And they were right.
The identity shortcut connections make the identity function trivial to learn: if the weights and biases of the layers inside the block (layers l + 1 and l + 2) are driven to zero, the residual branch F(x) outputs zero, and the activation of layer l + 2 is simply the activation of layer l passed through unchanged.
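A minimal residual block makes the mapping concrete, and zeroing the residual branch shows how easily the block represents the identity. The layer sizes here are arbitrary; this is a sketch of the idea, not ResNet's actual block.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The stacked layers learn the residual F(x); the shortcut adds x back."""
    def __init__(self, channels: int = 8):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(),
            nn.Linear(channels, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.f(x) + x  # H(x) = F(x) + x

block = ResidualBlock()
# Drive the weights and biases of the inner layers to zero: F(x) becomes 0,
# and the block's output is exactly its input (the identity mapping).
for p in block.f.parameters():
    nn.init.zeros_(p)
x = torch.randn(2, 8)
y = block(x)
```

This is why optimizing the residual is easier: "do nothing" is the zero function, a far easier target for gradient descent than reproducing the identity through a stack of non-linear layers.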
Andrew Ng explains this much better than I do, with some drawn walk-throughs of the math; it is definitely worth taking a look at.
The result was a network that stopped the degradation of the cost function and increased performance compared to standard networks.
Amazing!

Improving residual networks

Not long after this, the community exploded with an influx of researchers seeking to optimize this new approach.
A pre-activation arrangement of the residual layers was proposed that allows the gradient to use the shortcut connections to skip to any layer unimpeded. This improved the residual approach even more, bringing into the world a 1001-layer network that performed better than its shallower cousins.
ESRGAN’s customization of SRGAN and its founding father, ResNet

In addition to its RRDB approach, ESRGAN drops the batch normalization layers and substitutes residual scaling and a smaller weight initialization.
Then, instead of simply judging whether an image is real or fake, it utilizes RaGAN, the Relativistic average GAN, whose discriminator estimates whether one image is relatively more realistic than another, i.e., is this image more realistic or less realistic than the average of the opposite class.
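The relativistic average idea can be sketched as follows. The critic scores here are made up for illustration, and only the discriminator's side of the loss is shown; the real training also feeds these same relativistic logits into the generator update.

```python
import torch

def ra_logits(real_scores: torch.Tensor, fake_scores: torch.Tensor):
    """Relativistic average logits: how much more realistic a real image is
    than the average fake, and vice versa (the RaGAN formulation)."""
    d_real = real_scores - fake_scores.mean()
    d_fake = fake_scores - real_scores.mean()
    return d_real, d_fake

real = torch.tensor([2.0, 1.5, 2.5])   # raw critic scores, illustrative
fake = torch.tensor([-1.0, 0.0, -0.5])
d_real, d_fake = ra_logits(real, fake)

# Discriminator loss: real images should be "more real than the average fake"
# (target 1) and fakes "less real than the average real" (target 0).
bce = torch.nn.functional.binary_cross_entropy_with_logits
loss_d = bce(d_real, torch.ones(3)) + bce(d_fake, torch.zeros(3))
```

Because every score is measured against the average of the opposite class, the discriminator never saturates at "everything real gets 1": it is always ranking, which gives the generator a useful gradient for longer.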
SRGAN’s perceptual loss function uses the VGG features after activation in its calculation; ESRGAN moves this utilization to before activation.
All this leads to sharper and more clear upscaling of images than ever before.
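The residual-scaling tweak from the list above can be sketched like this. Caveats: the full RRDB nests dense blocks inside residual blocks, which is omitted here; the channel counts are arbitrary, and 0.2 is the scaling value the ESRGAN paper reports.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual scaling as in ESRGAN: the residual branch is multiplied by a
    small constant before being added back, which steadies the training of
    very deep stacks. Note there are no batch normalization layers."""
    def __init__(self, channels: int = 8, beta: float = 0.2):
        super().__init__()
        self.beta = beta
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.beta * self.f(x)  # shrink the residual, keep the shortcut

x = torch.randn(1, 8, 4, 4)
y = ScaledResidualBlock()(x)
```

Keeping the shortcut at full strength while damping the residual branch means each block starts out close to the identity, so dozens of them can be stacked without the instabilities that batch normalization was papering over.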
I’ve used this model and trained it on a collection of modern day textures and applied it to old N64 and PlayStation 1 textures, with marvelous results.
I hope to be able to edit this article and share them with the community. I think it could make for some really cool modifications to the games and give us old-timers some new reasons to replay our favorites, but I’m still looking into whether it’s legally “okay” to do so.
Till then, thanks for reading!