1st Place Solution for Intel Scene Classification Challenge
Hosted by Analytics Vidhya
Afzal Sayed · Jun 2

Introduction

Problem
You are provided with a dataset of ~25k images covering a wide range of natural scenes from around the world. Your task is to identify which kind of scene each image can be categorized into.
Data Classes

Approach
Build and train a Convolutional Neural Network that can correctly classify the above-mentioned categories of images.
Language And Frameworks
To be able to experiment quickly with various models, I chose Python as the language, with FastAI and PyTorch as the DL frameworks.
Transfer Learning with Progressive Image Resizing
Having taken the FastAI online course about two months ago, I had learnt a few important tips and tricks that help in training models with limited data and reaching high accuracy quickly. Transfer Learning and Progressive Image Resizing were two of these very useful techniques. A great example is this cifar-10 notebook by FastAI: https://github.…ipynb

I used the built-in resnet50 architecture from the FastAI library with pretrained ImageNet weights to train a model on progressively increasing image sizes of 32×32, 64×64, 128×128 and 224×224.
The default image transformations, normalization and learning-rate optimizer were applied.
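The progressive-resizing schedule can be sketched as a simple loop. The `Learner` and `fit` below are hypothetical stand-ins for fastai's `cnn_learner`/`fit_one_cycle`, so this is a schematic of the technique, not the competition code:

```python
# Schematic sketch of progressive image resizing (NOT the author's exact code).
# `Learner` is a hypothetical stand-in for a fastai learner; only the
# small-to-large training schedule is the point here.

class Learner:
    """Minimal stand-in that records what it was trained on."""
    def __init__(self):
        self.history = []          # (image_size, epochs) pairs

    def fit(self, image_size, epochs):
        # In fastai this would be fit_one_cycle(epochs) after swapping in
        # a DataBunch rebuilt at the new image size; weights are kept.
        self.history.append((image_size, epochs))

def progressive_resize_train(learner, sizes=(32, 64, 128, 224), epochs=3):
    # Train on small images first, then fine-tune the same weights at
    # progressively larger resolutions.
    for size in sizes:
        learner.fit(size, epochs)
    return learner

learn = progressive_resize_train(Learner())
print(learn.history)   # [(32, 3), (64, 3), (128, 3), (224, 3)]
```

The payoff of this schedule is that early epochs on tiny images are cheap, while the final epochs at full resolution start from already-useful weights.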
Many intermediate submissions were generated while running multiple epochs, out of which the best test accuracy I could get with this technique was 0.946575342465753.

First Ensemble
At this point I had 20-odd submissions, which I ensembled using a simple voting mechanism, a commonly employed technique in Data Science competitions, and my test accuracy increased to 0.958904109589041.

Mixup Augmentation
FastAI has a built-in callback for a new technique called Mixup Augmentation: https://docs.…html

This technique didn't show any significant improvement by itself, but the submission generated with it was included in the ensembling of submissions.
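Under the hood, mixup blends pairs of inputs and their one-hot targets with a Beta-distributed coefficient. Here is a minimal sketch of that blending step (not fastai's `MixUpCallback` itself, whose batching and loss handling differ):

```python
# Minimal sketch of mixup on a single pair of training samples.
# Inputs and one-hot targets are blended with a Beta(alpha, alpha) weight.

import random

def mixup_pair(x1, y1, x2, y2, alpha=0.4, lam=None):
    """Blend two (input, one-hot target) pairs; x*, y* are lists of floats."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)   # mixing coefficient in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# With lam fixed at 0.7 the result is a 70/30 blend of the two samples:
x, y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.7)
print(x)   # ≈ [0.7, 0.3]
print(y)   # ≈ [0.7, 0.3]
```

Because the blended targets are soft, the model is discouraged from making over-confident predictions on any single training image.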
LR Tuning
With the lr_find function, FastAI provides a way to find an optimum learning rate for training a model. This helped in setting the learning rate for the next epochs before each Progressive Resizing step.
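The idea behind lr_find can be sketched as an LR range test: sweep the learning rate upward, record the loss at each step, and suggest the rate where the loss falls fastest. This is a simplified stand-in for the library's implementation, using a synthetic loss curve:

```python
# Sketch of the LR range-test heuristic behind lr_find (simplified; NOT the
# fastai implementation): pick the LR with the steepest loss decrease.

def suggest_lr(lrs, losses):
    """Return the LR at which the loss dropped fastest (most negative step)."""
    best_i = min(range(1, len(losses)),
                 key=lambda i: losses[i] - losses[i - 1])
    return lrs[best_i]

# Synthetic sweep: loss is flat at tiny LRs, drops sharply around 1e-2,
# then diverges once the LR is too large.
lrs    = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.30, 2.29, 2.10, 1.20, 1.50, 4.00]
print(suggest_lr(lrs, losses))   # 0.01 — steepest drop is from 1e-3 to 1e-2
```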
(Docs: …html#lr_find)

Places365 Dataset Weights
For my next experiment, instead of using ImageNet pretrained weights, I used the freely available Places365 weights for the resnet50 architecture for transfer learning: http://places2.…tar

Applying the same technique as before, i.e. progressive resizing and default transformations, I trained a few epochs on this model and tried submitting all intermediate models, with and without mixup augmentation.
Test Accuracy: 0.953424657534247

Second Ensemble
Similar to before, ensembling all previous submissions gave a test accuracy of 0.958904109589041.

kNN With Embeddings
Searching for inspiration, I stumbled upon the solution of the Google Landmark Recognition Challenge winners (4th place), which had an interesting idea: predict the class using kNN over embeddings, i.e. the feature-vector representations of images output by the last layer of the model before softmax. They called it few-shot learning: https://www.kaggle.com/c/landmark-recognition-challenge/discussion/57896

I experimented with k=5, k=10, k=50, k=100 and k=500, and used voting as well as average weighted distance to identify which class an image belongs to.
This technique didn't give a direct improvement in accuracy, but it contributed to the ensembling of submissions later on.
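The kNN-over-embeddings step can be sketched as follows. The toy 2-D "embeddings" and class names below are illustrative assumptions; the real solution uses the penultimate-layer feature vectors of the trained CNN:

```python
# Sketch of kNN classification over image embeddings, with both plain
# majority voting and distance-weighted voting (as described above).

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, embeddings, labels, k=5, weighted=False):
    """Classify `query` by its k nearest training embeddings."""
    neighbours = sorted(zip(embeddings, labels),
                        key=lambda pair: euclidean(query, pair[0]))[:k]
    if not weighted:
        # plain majority vote among the k nearest labels
        return Counter(label for _, label in neighbours).most_common(1)[0][0]
    # distance-weighted vote: closer neighbours count for more
    scores = Counter()
    for emb, label in neighbours:
        scores[label] += 1.0 / (euclidean(query, emb) + 1e-8)
    return scores.most_common(1)[0][0]

# Toy 2-D "embeddings" for two classes:
embs   = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]]
labels = ["forest", "forest", "forest", "glacier", "glacier"]
print(knn_predict([0.05, 0.05], embs, labels, k=3))   # forest
```

In practice the embedding dimension is in the hundreds or thousands, and k is swept (5, 10, 50, 100, 500 above) to find the best setting.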
Final Ensemble
My final submission was a simple ensemble of all submissions that scored above 0.95 in test accuracy, giving me a final score of 0.964840182648402 on the public leaderboard.
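The voting mechanism used for these ensembles can be sketched as a per-image majority vote across submissions. File I/O is omitted and the dict-based submission format here is an assumption:

```python
# Sketch of majority-vote ensembling across submission files.
# Each "submission" is modeled as a dict: image_id -> predicted class.

from collections import Counter

def vote_ensemble(submissions):
    """Combine several submissions by per-image majority vote."""
    image_ids = submissions[0].keys()
    return {img: Counter(sub[img] for sub in submissions).most_common(1)[0][0]
            for img in image_ids}

subs = [
    {"img1": "sea",    "img2": "street"},
    {"img1": "sea",    "img2": "buildings"},
    {"img1": "forest", "img2": "street"},
]
print(vote_ensemble(subs))   # {'img1': 'sea', 'img2': 'street'}
```

Voting works best when the ensembled submissions make different kinds of mistakes, which is why including the mixup, kNN and TTA variants helped even though none of them improved accuracy on their own.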
Data Preprocessing

Image Transformations
I used the basic transformations that come as defaults with FastAI's get_transforms function. More on this here: https://docs.…html

Normalization
Images were normalized during training with the built-in normalize function: https://docs.…html#Data-normalization

Train Validation Split
I used the default train/validation split in fastai, which is 0.2: …html#ImageDataBunch

Removing Confusing Images
One of the largest boosts in accuracy was achieved by removing images which might be mislabeled or confusing to the model.
For this I loaded a previously trained resnet50 model and ran a prediction on the entire training dataset.
I chose to remove images with wrong predictions having confidence less than 0.55, as well as more than 0.999999. The ones with lower confidence represent images having two or more classes present at the same time (like buildings and street, or mountains and glaciers).
Hence, the confidence of the highest probable class is relatively lower.
Removing images having a wrong prediction with confidence greater than 0.999999 covers the case of blatant misclassification with high confidence, which might be due to mislabeling of the training samples.
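The filtering rule described above can be sketched as a single predicate over (true label, predicted label, confidence) records; the record format here is a hypothetical simplification:

```python
# Sketch of the confusing-image filter: drop training images whose
# prediction is WRONG and whose top-class confidence is either very low
# (< 0.55, likely multi-class scenes) or suspiciously high (> 0.999999,
# likely mislabeled samples).

def keep_image(true_label, pred_label, confidence,
               low=0.55, high=0.999999):
    if pred_label == true_label:
        return True                      # correct predictions always stay
    return low <= confidence <= high     # wrong but "reasonably" confident

records = [
    ("street",  "street",    0.98),       # correct            -> keep
    ("street",  "buildings", 0.40),       # wrong, low conf    -> remove
    ("glacier", "mountain",  0.9999999),  # wrong, ~1.0 conf   -> remove
    ("forest",  "glacier",   0.70),       # wrong, mid conf    -> keep
]
print([keep_image(t, p, c) for t, p, c in records])
# [True, False, False, True]
```

Note that wrong predictions with middling confidence are kept: those are ordinary hard examples rather than ambiguous or mislabeled ones.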
This yielded a test accuracy of 0.963013698630137 on the public leaderboard.
Test Time Augmentation
Using the TTA function in FastAI (https://docs.…html#Test-time-augmentation) not only improved the test accuracy of the above model to 0.963470319634703, but also contributed significantly to the ensemble of submissions.
Hence it turned out useful to generate submissions both with and without TTA in the final ensemble.
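What TTA does under the hood can be sketched as follows (simplified relative to fastai's TTA, which also weights the original image): average the class probabilities predicted for several augmented versions of the same image, then take the argmax. The toy model and class names below are assumptions:

```python
# Sketch of test-time augmentation: average per-class probabilities over
# augmented views of one image, then pick the highest-scoring class.

def tta_predict(predict, augmented_inputs, classes):
    """predict(x) -> list of class probabilities; average across augments."""
    n = len(augmented_inputs)
    summed = [0.0] * len(classes)
    for x in augmented_inputs:
        for i, p in enumerate(predict(x)):
            summed[i] += p
    avg = [s / n for s in summed]
    return classes[max(range(len(avg)), key=avg.__getitem__)], avg

# Toy "model" whose output varies with the augmented view it is shown:
fake_probs = {"orig": [0.4, 0.6], "flip": [0.7, 0.3], "crop": [0.8, 0.2]}
label, avg = tta_predict(lambda x: fake_probs[x],
                         ["orig", "flip", "crop"], ["sea", "street"])
print(label)   # sea — the flipped/cropped views overturn the single-view call
```

This also illustrates why TTA submissions diversify an ensemble: averaging over views can flip predictions on exactly the borderline images where single-view models disagree.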
Results
Public Leaderboard Rank: 14, Test Accuracy: 0.9648401826
Private Leaderboard Rank: 1, Test Accuracy: 0.…

Code: https://github.com/afzalsayed96/intel_scene_classification

Key Takeaways
- A great library like FastAI with sanely optimized defaults helps.
- Try to find pretrained weights from a dataset similar and relevant to the problem.
- Trust your solution and don't try to overfit to climb the public leaderboard.
- Ensembling your best submissions helps, especially if you have variation such as mixup, kNN, TTA, etc.
- Follow basic hygiene and best practices, like setting seeds and journaling models, to get reproducible results.
Credits
Thanks to Soumendra P for providing his valuable mentorship and guidance during the contest.