A different kind of (deep) learning: part 1

An intro to self supervised learning

Gidi Shperber, Dec 3

Deep learning has truly reshuffled things in machine learning, and specifically in image recognition tasks. This series may serve readers in a few ways:

You may learn about works you didn’t know of.

You may get new ideas for your own work.

You may learn of relations between logical parts and tasks in deep learning that you were not aware of.

The first part of this series is about self supervised learning, which was one of the main drivers for me to write the series.

Self supervised learning

Imagine an agent that scours the web and seamlessly learns from every image it encounters. In text, predicting the next or previous word is a prominent example of such learning, as done in word embeddings and language model tasks. In vision, doing such tricks is a bit more complex, since vision data (images and videos) are not explicitly human created (well, a photographer may put a certain amount of thought into a photograph), and not every video, and definitely not every image, has some kind of logical structure from which a signal can be extracted.

Isn’t it just another form of unsupervised learning? I can’t promise that this field will bring the best achievements to deep learning, but it has definitely already brought some great creative ideas. Such tasks are called self supervised learning. Unlike “weak annotations”, which mean images with different tags, headers, or captions, a self-supervised task assumes no annotations but the image itself.

A natural example is colorization. In the Lab colorspace, L stands for lightness (B&W intensity) and is used to predict the ab channels (a: green to red, b: blue to yellow).

Colorization with Lab encoding, from [1]

As we will see in all the tasks we discuss, self supervised learning is not as straightforward as what we got used to in deep learning. Colorization is inherently ambiguous: many plausible colorings can fit the same grayscale image. They’ve tried to overcome the ambiguity issue by predicting a color histogram and sampling from it. This work,
apart from using the Lab space, also tries to predict Hue/Chroma attributes, which relate to the “HSV” color space.

Context

Besides color prediction, the next most evident (but also creative) task is learning things about image structure. One pitfall here is that in some cameras the distribution of color varies in different parts of the image, so a network can exploit this low-level signal and learn trivial features instead of semantics.

Another paper took the creative approach of predicting image rotation. Rotation prediction, apart from being creative, is relatively fast, and doesn’t require any kind of preprocessing to overcome the learning of trivial features, as other tasks we’ve seen before do. The paper also explores some “attention maps”, which show that their network focuses on the important parts of images: heads, eyes, etc. Although it reports state-of-the-art results on transfer learning to ImageNet classification (most other works report on Pascal), some flaws were found in the paper by reviewers, so it has to be taken with a grain of salt.

Generalization

So after all this work, what do we get from it? Will self supervision produce the next breakthrough? Maybe, or maybe not, but I believe that exploring such different approaches significantly improves the deep learning field, and may indirectly, positively influence the real breakthroughs.
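The rotation pretext task is simple enough to sketch in a few lines. Below is a minimal version, assuming images are NumPy arrays in H×W×C layout; the helper name is my own, not from the paper:

```python
# Sketch of the rotation pretext task: rotate each image by
# 0/90/180/270 degrees, and ask the network to predict which
# rotation was applied. No human annotation is needed; the
# labels come for free from the transformation itself.
import numpy as np

ROTATION_CLASSES = [0, 1, 2, 3]  # multiples of 90 degrees

def make_rotation_batch(image):
    """Return (inputs, labels) for one image.

    inputs: the image rotated by each multiple of 90 degrees.
    labels: the rotation class the network must predict.
    """
    inputs = [np.rot90(image, k) for k in ROTATION_CLASSES]
    labels = np.array(ROTATION_CLASSES)
    return inputs, labels
```

A classifier trained on these (input, label) pairs is then reused as a feature extractor for the downstream task, which is the transfer-learning setup the paper evaluates.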
