By default in Keras, convolutional layers are initialized following a Glorot Uniform distribution. So what happens if we now change the initialization to Kaiming Uniform?

Using Kaiming Initialization

Let's recreate our VGG16 model, but this time change the initialization to he_uniform.
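As a minimal sketch of the change, here is how a convolutional layer can be given the He (Kaiming) Uniform initializer in Keras; the layer shape and filter count are illustrative, not the exact VGG16 configuration:

```python
# Sketch: swapping the default glorot_uniform initializer for he_uniform
# on a convolutional layer. Filter count and kernel size are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

conv = layers.Conv2D(
    64, (3, 3),
    activation="relu",
    padding="same",
    # Kaiming Uniform instead of the Keras default (glorot_uniform)
    kernel_initializer="he_uniform",
)

# Build a tiny model so the layer weights are actually created.
model = keras.Sequential([layers.Input((32, 32, 3)), conv])
```

The same `kernel_initializer="he_uniform"` argument would be passed to every `Conv2D` (and `Dense`) layer when recreating the full VGG16 model.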
Let’s now check the activations and gradients before training our model.
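One way to inspect gradients before any training step is to run a single batch through the model under a `tf.GradientTape`; this is a sketch with random placeholder data, not the exact code from the post:

```python
# Sketch: inspecting per-layer gradient statistics on one batch,
# before training. The model and the random data are illustrative.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input((32, 32, 3)),
    layers.Conv2D(16, 3, activation="relu", kernel_initializer="he_uniform"),
    layers.Flatten(),
    layers.Dense(10),  # logits for 10 classes
])

x = np.random.rand(8, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(8,))

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
with tf.GradientTape() as tape:
    logits = model(x, training=True)
    loss = loss_fn(y, logits)

# One gradient tensor per trainable variable (kernels and biases).
grads = tape.gradient(loss, model.trainable_variables)
for var, g in zip(model.trainable_variables, grads):
    print(var.name, "mean:", float(tf.reduce_mean(g)),
          "std:", float(tf.math.reduce_std(g)))
```

Plotting these per-layer means and standard deviations (and doing the same for the activations) gives the histograms discussed below.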
So now, with Kaiming initialization, our activations have a mean around 0.5 and a standard deviation around 0.8. We can see that we now have some gradients, which is a good thing if we want our network to learn something.
Now, if we train our new model, we get these curves. We probably need to add some regularization now, but hey, that's still better than before, right?

Conclusion

In this post, we showed that initialization can be a VERY important part of your model that is often overlooked.
It also showed that the defaults you get in libraries, even excellent ones like Keras, are not to be taken for granted.
I hope this blog post helped you! You probably won't forget to correctly initialize your networks anymore. Feel free to give me feedback or ask me questions if something is not clear enough.
References and further readings:
- The Kaiming He initialization paper
- The Xavier Glorot initialization paper
- Andrew Ng's lesson on initialization