This results in the following equation for the convolutional layer activations b:

bj = Ai ∗ μi + ϵj ⊙ √(Ai² ∗ (αi ⊙ μi²))

where ϵj ∼ N(0, 1), Ai is the receptive field, ∗ denotes the convolutional operation, and ⊙ the component-wise multiplication.

Applying two convolutional operations for mean and variance

The crux of equipping a CNN with probability distributions over weights instead of single point-estimates, and of being able to update the variational posterior probability distribution q by backpropagation, lies in applying two convolutional operations, whereas filters with single point-estimates apply one. Since the output b is a function of the mean μ and the variance αμ², among others, we can compute these two variables, which determine a Gaussian probability distribution, separately.

We do this in two convolutional operations: in the first, we treat the output b as the output of a CNN updated by frequentist inference. We optimise with Adam towards a single point-estimate, which increases the validation accuracy of the classifications. We interpret this single point-estimate as the mean μ of the variational posterior probability distributions q. In the second convolutional operation, we learn the variance αμ². As this formulation of the variance includes the mean μ, only α needs to be learned here. In this way, we ensure that only one parameter is updated per convolutional operation, exactly as it would be in a CNN updated by frequentist inference.

In other words, while the first convolutional operation learns the maximum a posteriori (MAP) estimate of the variational posterior probability distribution q, the second observes how much the values of the weights w deviate from this MAP. This procedure is repeated in the fully-connected layers.

Experiments with Bayesian CNNs

This time, let's look not only at theoretical explanations but also at some examples. As mentioned earlier, Gal & Ghahramani (2015) used Dropout to approximate the intractable posterior probability distribution q
and then spoke of a Bayesian CNN. Despite these methodological deficiencies, their results are comparable to ours, and for CIFAR-10 even better. For the results in the table below, we used LeNet-5 and AlexNet and compared results achieved by frequentist and by Bayesian inference.

Next, we show how Bayesian CNNs naturally incorporate a regularisation effect. This phenomenon might be called model averaging in other literature. More details
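The two convolutional operations described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes a single-channel input, one 3×3 filter, and hypothetical names (`conv2d`, `mu`, `alpha`). The first convolution applies the mean μ; the second convolves the squared input with the variance αμ²; the activation b is then sampled as mean plus noise times the standard deviation.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2-D cross-correlation (illustrative helper)."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # input (the receptive fields Ai)
mu = 0.1 * rng.standard_normal((3, 3))   # learned mean of the filter weights
alpha = 0.5                              # learned variance scale (alpha)

# First operation: convolve with the mean mu (the frequentist point-estimate).
mean_act = conv2d(x, mu)

# Second operation: convolve the squared input with the variance alpha * mu^2.
var_act = conv2d(x ** 2, alpha * mu ** 2)

# Sample the activations b = mean + eps * sqrt(variance), eps ~ N(0, 1).
eps = rng.standard_normal(mean_act.shape)
b = mean_act + eps * np.sqrt(var_act)
```

Note that only one parameter is trained per operation (μ in the first, α in the second), mirroring how a frequentist CNN updates one filter per convolution.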
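The model-averaging effect can be sketched as follows, under illustrative assumptions: a tiny linear "network" with variational parameters `mu` and `sigma` (hypothetical names), whose softmax predictions are averaged over T Monte Carlo samples of the weights, approximating the predictive distribution.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
x = rng.standard_normal(4)           # one input example
mu = rng.standard_normal((3, 4))     # variational mean of the weights
sigma = 0.1 * np.ones_like(mu)       # variational standard deviation

T = 100
preds = np.zeros(3)
for _ in range(T):
    w = mu + sigma * rng.standard_normal(mu.shape)  # sample weights w ~ q
    preds += softmax(w @ x)                         # prediction of one sample
preds /= T                           # averaged (ensemble) prediction
```

Each sampled weight configuration acts as one member of an implicit ensemble, which is where the regularising, model-averaging behaviour comes from.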