New Deep Learning research from MIT suggests so! Their recently released paper, “How to make a pizza: Learning a compositional layer-based GAN model”, explores how a GAN model can be trained to recognise the steps involved in making a pizza.
Their PizzaGAN comes in 2 parts:
(1) Given an input image of a pizza, PizzaGAN is trained to predict what toppings the pizza has on it
(2) Given an input image of a pizza, PizzaGAN can apply an ordered set of models to the image, where each model adds or removes a topping from the pizza

What makes up a pizza?

Before trying to train a deep neural network to make a pizza, we’ll first need to figure out how to make a pizza ourselves.
Like any great recipe, the process of making a pizza consists of a set of ordered steps.
You always start with the dough, sauce, and cheese, and then move on to adding other more adventurous toppings.
This sequential process is reflected in how the pizza looks at each step of the way — its visual appearance changes with each added topping.
How PizzaGAN defines a pizza — as a set of ordered steps.
Once our target process is well-defined, we can begin to train an actual model that can approximate each of these steps.
For example, let’s say that we start out with a good ol’ pepperoni pizza.
Our friend then comes up to us and says “hey, let’s add olives!” We can model the process of going from our original pizza to our new one as a series of steps:
(1) Recognise our current state: pepperoni pizza
(2) Apply a change that gets us to our target state: add olives

After adding the olives, another friend might say: “I don’t like pepperoni, let’s use ham!” This time we have 3 steps:
(1) Recognise our current state: pepperoni and olives pizza
(2) Apply the first change that gets us closer to our target state: remove pepperoni
(3) Apply the second change that gets us to our target state: add ham

To learn how to build pizzas, the PizzaGAN neural network attempts to model all of these steps.
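To make the friend-with-olives example concrete, here is a minimal sketch of how the list of changes could be worked out from the current and target toppings. The function name `plan_edits` and the "remove first, then add" ordering are my own assumptions for illustration; they are not taken from the paper.

```python
def plan_edits(current, target):
    """Return ordered (action, topping) steps that turn `current` into `target`.

    Hypothetical helper: removals are scheduled before additions,
    mirroring the pepperoni-to-ham example in the text.
    """
    current_set, target_set = set(current), set(target)
    removals = [("remove", t) for t in current if t not in target_set]
    additions = [("add", t) for t in target if t not in current_set]
    return removals + additions


steps = plan_edits(["pepperoni", "olives"], ["olives", "ham"])
print(steps)  # [('remove', 'pepperoni'), ('add', 'ham')]
```

Each step in the returned list corresponds to one single-topping change, which is the granularity the rest of the article works with.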
How a GAN can make a pizza

Dataset

The pizza dataset used to train PizzaGAN is composed of 9,213 images, each showing a single pizza.
Each image has a set of corresponding labels which describe the toppings that the pizza has on it, excluding the dough, sauce, and base cheese.
For example, if the pizza image has ham and mushrooms on it, the labels of that image are:
["ham", "mushrooms"]
During training, the labels are encoded as a binary vector with one element per topping (a multi-hot encoding, since several toppings can be present at once).
Thus, with a ham and mushrooms pizza, the ham and mushrooms elements of the output vector are set to 1.0 while the rest of the elements are set to 0.
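The encoding described above can be sketched in a few lines of plain Python. The topping vocabulary `TOPPINGS` and its ordering are made up for illustration; the real dataset has its own topping list.

```python
# Hypothetical topping vocabulary; the order fixes each topping's vector index.
TOPPINGS = ["pepperoni", "mushrooms", "olives", "ham"]


def encode_labels(labels, vocabulary=TOPPINGS):
    """Turn a list of topping names into a binary label vector."""
    present = set(labels)
    return [1.0 if topping in present else 0.0 for topping in vocabulary]


vector = encode_labels(["ham", "mushrooms"])
print(vector)  # [0.0, 1.0, 0.0, 1.0]
```

A plain cheese pizza, which has no labelled toppings, encodes to a vector of all zeros.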
Generator network: adding and removing toppings

Recall that we want to be able to model the building of our pizza as a set of sequential steps.
Thus, whatever network is trained must be able to perform a single step at a time — add one topping, remove one topping, cook the pizza, etc.
To that end, a generator network is trained to model the adding or removing of each topping.
Given an input image of a pizza, the generator predicts an output image of a pizza as if we added or removed one topping.
Since each generator handles only one topping and only one direction (adding or removing), multiple generator networks are trained: two per topping, one that adds it and one that removes it.
An example of a pair of PizzaGAN generators, one to add pepperoni and one to remove it, is shown below.
An example of a pair of PizzaGAN generators: one to add pepperoni and one to remove it
The cheese pizza has 0 for its entire classification vector, while the pepperoni pizza has all 0s except for the pepperoni index, which is 1.
Since the input and output images of a PizzaGAN generator always differ by exactly one topping, the sums of the elements of their label vectors also differ by exactly 1.
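Here is a small sketch of that bookkeeping on the label side only (no images involved). The helper `apply_operator` and the vocabulary `TOPPINGS` are hypothetical names used purely to illustrate how a single add or remove step changes the label vector.

```python
# Hypothetical topping vocabulary, for illustration only.
TOPPINGS = ["pepperoni", "mushrooms", "olives", "ham"]


def apply_operator(labels, topping, action):
    """Return the label vector expected after one add/remove step."""
    index = TOPPINGS.index(topping)
    updated = list(labels)  # copy so the input vector is left untouched
    if action == "add":
        if updated[index] == 1:
            raise ValueError(f"{topping} is already on the pizza")
        updated[index] = 1
    elif action == "remove":
        if updated[index] == 0:
            raise ValueError(f"{topping} is not on the pizza")
        updated[index] = 0
    else:
        raise ValueError(f"unknown action: {action}")
    return updated


cheese = [0, 0, 0, 0]
pepperoni = apply_operator(cheese, "pepperoni", "add")
print(pepperoni)  # [1, 0, 0, 0]

# One step changes exactly one element, so the sums differ by exactly 1.
assert abs(sum(pepperoni) - sum(cheese)) == 1
```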
Discriminator: recognising pizzas

The PizzaGAN generators cover all of the adding and removing of toppings on the pizza.
The discriminator will take care of recognising what toppings are actually on the pizza currently.
Given an input image of a pizza, the discriminator network predicts a set of multi-label classifications.
Each element of the output vector corresponds to a particular topping.
For example, in the figure below a PizzaGAN discriminator predicted that the image of the pizza had pepperoni, mushrooms, and olives.
The elements of the output vector corresponding to those toppings were predicted as 1.0 at inference (or some value above the user-set threshold).
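Turning the discriminator's output vector back into topping names is just a thresholding step. The sketch below assumes a threshold of 0.5 and uses made-up scores; neither value comes from the paper.

```python
# Hypothetical topping vocabulary, for illustration only.
TOPPINGS = ["pepperoni", "mushrooms", "olives", "ham"]


def predicted_toppings(scores, vocabulary=TOPPINGS, threshold=0.5):
    """Keep every topping whose score is above the user-set threshold."""
    return [topping for topping, score in zip(vocabulary, scores) if score > threshold]


scores = [0.93, 0.81, 0.77, 0.04]  # made-up discriminator outputs
print(predicted_toppings(scores))  # ['pepperoni', 'mushrooms', 'olives']
```

Raising the threshold makes the prediction more conservative, so fewer toppings are reported.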
An example of how the discriminator of PizzaGAN works, predicting both the classification and the topping ordering of the pizza

GAN models are usually trained by training the generator and discriminator together.
The discriminator model is trained with some of the outputs of the generator model, and the discriminator’s loss on its predictions is used in training the generator model.
PizzaGAN also follows this training scheme.
In addition to predicting the labels of the pizza image, the discriminator also predicts whether the image is real or comes from a generator.
This pushes the generators to create images that both look like real pizza photos and show all of the correct toppings.
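To give a feel for how the two discriminator outputs could be combined into one training signal, here is a generic sketch: a real/fake binary cross-entropy term plus an averaged per-topping binary cross-entropy term. This is a common pattern for classifier-augmented GANs and is an assumption on my part; the paper's actual objective may differ, and `discriminator_loss` and `label_weight` are hypothetical names.

```python
import math


def binary_cross_entropy(prediction, target, eps=1e-7):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(real_fake_score, is_real, topping_scores, topping_labels,
                       label_weight=1.0):
    """Assumed combined loss: real/fake term plus mean per-topping term."""
    adversarial = binary_cross_entropy(real_fake_score, 1.0 if is_real else 0.0)
    per_topping = [binary_cross_entropy(s, y)
                   for s, y in zip(topping_scores, topping_labels)]
    classification = sum(per_topping) / len(per_topping)
    return adversarial + label_weight * classification


# Confident, correct predictions should cost less than confident, wrong ones.
good = discriminator_loss(0.9, True, [0.9, 0.1], [1.0, 0.0])
bad = discriminator_loss(0.1, True, [0.1, 0.9], [1.0, 0.0])
assert good < bad
```

In a real implementation these terms would be computed on tensors by a deep learning framework so that gradients can flow back to the generators.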
Resulting Pizzas

With the discriminator predicting the toppings on the pizza and the generators having the ability to add and remove toppings, PizzaGAN is able to build and decompose images of pizzas with pretty strong accuracy.
PizzaGAN adding and removing toppings
PizzaGAN cooking and uncooking toppings

If you’d like to read some more details about how PizzaGAN works, I’d recommend checking out the original paper, published at CVPR 2019!

Beyond that, I leave you with this wonderful quote from the paper:
“Pizza is the most photographed food on Instagram with over 38 million posts using the hashtag #pizza.”