The data in the latent space describes a distribution over the input data, which the decoder network then samples from to generate a similar version of the input data.

A VAE consists of an encoder, a decoder, and a loss function.

[Figure: Structure of a VAE]

The encoder is a neural network that takes the input data of 5000 genes and encodes it into just 100 features.

[Figure: Snippet of input data consisting of 5000 genes and their gene expression levels (this table goes on for a long time)]

[Figure: Compressing the input data]

The encoded features are probability distributions representing only the relevant features of the input data. Since the VAE is a generative model, its goal is to generate variations similar to the input data.

To compress the data into a smaller space and represent it in probabilistic terms, the encoder outputs the compressed data as two vectors: the mean vector and the standard deviation vector. Intuitively, the mean vector controls where the encoding of the input data should be centred, while the standard deviation vector controls the “area”: how far from the mean the encoding can vary.

Now the decoder network samples from the distribution defined by the mean and standard deviation vectors to get a vector to use as its input. The sampled vector is called the hidden layer (it is also commonly known as the latent vector). From it, the decoder is able to reconstruct the original input.

But how do we make sure the output from the decoder network matches the original input data fed into the encoder network? This is where the loss function comes to the rescue!

The loss function is composed of 2 parts: a generative loss and a latent loss.

The generative loss helps the decoder generate data similar to the input, improving its accuracy. It does this by measuring the error between the output of the decoder and the input of the encoder network. The error is then backpropagated through both networks, updating their weights and parameters to improve the accuracy of the decoder network.

The latent loss measures how closely the distributions described by the mean and standard deviation vectors match a chosen prior, which in a standard VAE is a unit Gaussian. This is an extremely important function in VAEs, as ultimately the encoded features are what the decoder samples from to learn and to generate data similar to the input.

[Figure: Output generated data from decoder network]

Part 2: Identifying Biological Signals

I wanted to see if the encoded features in the VAE were able to recapitulate and preserve biological variance present in the gene data, such as the sex of a patient.

To do this I extracted the weights in the first layer of the decoder network. These weights decode the hidden layer, which consists of sampled information from the compressed input data.

The weights used to decode the features in the hidden layer were actually able to capture important and consistent biological patterns in the gene expression data.
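Returning to the sampling step and the two-part loss from the first part, the following is a minimal sketch in plain Python of how those pieces could fit together. It is not the code used in this project. The function names (`sample_latent`, `generative_loss`, `latent_loss`) and the toy numbers are hypothetical. The generative loss is assumed to be a mean squared error, and the latent loss is assumed to be the KL divergence to a unit Gaussian, as in a standard VAE. The two terms are also assumed to be added with equal weight.

```python
import math
import random


def sample_latent(mean, std, rng):
    """Draw one vector: z_i = mean_i + std_i * eps_i, with eps_i ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def generative_loss(original, reconstructed):
    """Mean squared error between encoder input and decoder output
    (an assumed choice of error measure)."""
    squared = [(x - y) ** 2 for x, y in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def latent_loss(mean, std):
    """KL divergence from N(mean, std^2) to a unit Gaussian N(0, 1),
    summed over the encoded features."""
    return 0.5 * sum(
        m * m + s * s - 1.0 - 2.0 * math.log(s) for m, s in zip(mean, std)
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy stand-ins (assumptions): 3 encoded features instead of 100.
mean = [0.5, -1.0, 0.0]
std = [1.0, 0.5, 2.0]

# The sampled vector that would be fed to the decoder.
z = sample_latent(mean, std, rng)

# Toy input and a made-up reconstruction (assumptions): 4 "genes" instead of 5000.
x = [1.0, 2.0, 3.0, 4.0]
x_hat = [1.5, 2.0, 2.5, 4.0]

# Equal weighting of the two terms is an assumption of this sketch.
total_loss = generative_loss(x, x_hat) + latent_loss(mean, std)
```

The latent loss is zero only when every mean is 0 and every standard deviation is 1, so it pulls the encodings toward the unit Gaussian while the generative loss pulls them toward whatever reconstructs the input best. In a real model both terms would be computed by a deep learning framework so that the error can be backpropagated through the encoder and decoder.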