Doodling with Deep Learning!

Below is the custom CNN model that we created, with the number of convolutional layers, dense layers, dropout, and size as the parameters while building the model. As we were still deciding on the best model to proceed with for the analysis, we used limited data at this initial step to reduce execution time.

Selecting an Optimizer

A key step before proceeding with training is deciding which optimizer to use. After referring to the literature and taking advice from experts on various Kaggle forums, we decided to compare the performance of the Adam and SGD optimizers on our data. After multiple iterations, we chose the Adam optimizer because it showed slightly better results and converged faster than SGD.

After running the model on 25,000 images per class for about 3 hours, we obtained a MAP@3 score of 0.76 on Kaggle’s Public Leaderboard, which is not a bad result for merely 25,000 images per class! To put this into perspective, the average class in the dataset contains 150,000 images. However, when we increased the model complexity, the accuracy slightly degraded, which led us to our next step: ResNet.

SE-ResNet-34, SE-ResNet-50

When increasing the depth of a model, it is likely to face issues like vanishing gradients and degradation, where deeper models perform worse than simpler ones. A Residual Network, or ResNet, is a neural network architecture that addresses the vanishing gradient and degradation problems in the simplest way possible: by using deep residual learning. In simple words, during backpropagation, when the signal is sent backwards, the gradient always has to pass through f(x) (where f(x) is our convolution, matrix multiplication, batch normalization, etc.), which can cause trouble due to the non-linearities involved. The “+ x” at the end is the shortcut. It allows the gradient to pass backwards directly. By stacking these layers, the gradient could theoretically “skip” over all the intermediate layers and reach the bottom without being diminished. You can refer to the original paper to further understand the comparisons between a 34-layer plain network and a 34-layer residual network.

In this step of the process, we trained SE-ResNet-34 and SE-ResNet-50 as a step up from the simple CNN. The term SE refers to Squeeze-and-Excitation: an additional block assigns weights to the different channels. SE blocks were shown to provide additional accuracy through this channel weighting while adding less than 10% to the total number of parameters. More information on Squeeze & Excitation Nets can be found here.

While training SE-ResNet-50, we tried different combinations of the following parameters for 50 to 60 epochs. Out of all the combinations, a batch size of 512 and an image size of 128×128 gave the best improvement to the score, boosting it to 0.9093. It is important to note that the choice of batch size and image size depends on the GPU being used; these were the maximum possible values on a Tesla K80 for our data.

MobileNet

After multiple iterations with SE-ResNet, and with the competition deadline quickly approaching, we decided to explore MobileNet, which gave comparable accuracy yet executed much more quickly. MobileNet was introduced by Google to enable the delivery of the latest technologies, such as object, logo, and text recognition, to customers anytime, anywhere, irrespective of Internet connection. More details
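To make the ideas from the SE-ResNet section concrete, here is a minimal Keras sketch of a residual block combined with a squeeze-and-excitation module. This is not our actual training code; the layer sizes, reduction ratio, and function names are illustrative assumptions.

```python
# Sketch only: a residual block whose output is re-weighted per channel by an
# SE (squeeze-and-excitation) module. Sizes and the reduction ratio are assumptions.
from tensorflow.keras import layers

def se_block(x, reduction=16):
    """Squeeze-and-Excitation: learn per-channel weights and rescale x."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                # "squeeze": one value per channel
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)   # "excitation": weights in (0, 1)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                      # rescale feature maps channel-wise

def se_residual_block(x, filters):
    """Residual block with an SE module; assumes x already has `filters` channels."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = se_block(y)
    y = layers.Add()([shortcut, y])                       # the "+ x" shortcut described above
    return layers.Activation("relu")(y)
```

The Add layer is the shortcut discussed earlier: during backpropagation the gradient can flow through it directly, bypassing the convolutions and batch normalization inside the block.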
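As a rough illustration (not our exact setup), the sketch below shows how a MobileNet like the one described above might be instantiated in Keras. The grayscale 128×128 input shape, the class count of 340, and the compile settings are assumptions made for this sketch; MobileNet’s speed comes from replacing standard convolutions with depthwise separable convolutions, which use far fewer parameters.

```python
# Sketch only: training MobileNet from scratch on doodle images.
import tensorflow as tf
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.optimizers import Adam

num_classes = 340            # assumed number of doodle categories
input_shape = (128, 128, 1)  # grayscale doodles at the 128x128 size mentioned above

# weights=None trains from scratch, so a single-channel input is allowed.
model = MobileNet(input_shape=input_shape, weights=None, classes=num_classes)
model.compile(
    optimizer=Adam(),
    loss="categorical_crossentropy",
    # Top-3 accuracy as a rough proxy for the MAP@3 leaderboard metric.
    metrics=["accuracy", tf.keras.metrics.TopKCategoricalAccuracy(k=3, name="top3_acc")],
)
```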
