Deep Transfer Learning for Natural Language Processing — Text Classification with Universal Embeddings

We’ve had some recent successes with word embeddings, including methods like Word2Vec, GloVe and FastText, all of which I have covered in my article series on feature engineering for text data. In this article, we will showcase several state-of-the-art generic sentence embedding encoders, which tend to give surprisingly good performance on transfer learning tasks, especially with small amounts of data, as compared to word embedding models.

Let’s clear up the basics first and cut through the hype. An embedding is a fixed-length vector typically used to encode and represent an entity (a document, a sentence, a word, even a graph!). I’ve talked about the need for embeddings in the context of text data and NLP in one of my previous articles. The fact of the matter is that machine learning and deep learning models run on numbers, and embeddings are the key to encoding text data so these models can use it.

Text Embeddings

A big trend here has been finding so-called ‘universal embeddings’, which are basically pre-trained embeddings obtained by training deep learning models on a huge corpus. The following figure showcases some recent trends in universal word and sentence embeddings, thanks to an amazing article from the folks over at HuggingFace.

[Figure: Recent Trends in Universal Word & Sentence Embeddings (Source: HuggingFace)]

There are some interesting trends in the above figure, including Google’s Universal Sentence Encoder, which we will be exploring in detail in this article. I definitely recommend readers check out the HuggingFace article on universal embedding trends.

Now, let’s take a brief look at trends and developments in word and sentence
embedding models before diving deeper into the Universal Sentence Encoder.

Trends in Word Embedding Models

Word embedding models are perhaps some of the older and more mature models, having been developed starting with Word2Vec in 2013. The three most common models leveraging deep learning (unsupervised approaches that embed words in a continuous vector space based on semantic and contextual similarity) are:

- Word2Vec
- GloVe
- FastText

These models are based on the distributional hypothesis from the field of distributional semantics, which tells us that words that occur and are used in the same contexts are semantically similar to one another (‘a word is characterized by the company it keeps’).

A more recent development in this area is ELMo. The name is a nod to the famous Muppet character from the show ‘Sesame Street’, but it is actually an acronym for ‘Embeddings from Language Models’. ELMo gives us word embeddings learnt from a deep bidirectional language model (biLM), typically pre-trained on a large text corpus, enabling transfer learning so these embeddings can be used across different NLP tasks. Allen AI tells us that ELMo representations are contextual, deep and character-based, using morphological clues to form representations even for OOV (out-of-vocabulary) tokens.

Trends in Universal Sentence Embedding Models

The concept of sentence embeddings is not very new: back when word embeddings were first built, one of the easiest ways to build a baseline sentence embedding model was averaging. A baseline sentence embedding can be built by simply averaging the individual word embeddings for every sentence or document (somewhat like bag of words, where we lose the inherent context and sequence of words in the sentence). Viewing generation as choosing a sentence from all possible sentences can be seen as a discriminative approximation to the
generation problem.

InferSent is, interestingly, a supervised learning approach to learning universal sentence embeddings using natural language inference data. This is hardcore supervised transfer learning: just as we get pre-trained models trained on the ImageNet dataset for computer vision, InferSent provides universal sentence representations trained using supervised data from the Stanford Natural Language Inference (SNLI) dataset. Once the sentence vectors u and v are generated, three matching methods are applied to extract relations between them:

- Concatenation (u, v)
- Element-wise product u ∗ v
- Absolute element-wise difference |u − v|

The resulting vector is then fed into a 3-class classifier consisting of multiple fully connected layers culminating in a softmax layer.

The Universal Sentence Encoder from Google is one of the latest and best universal sentence embedding models, published in early 2018. It encodes any body of text into 512-dimensional embeddings that can be used for a wide variety of NLP tasks, including text classification, semantic similarity and clustering. It is trained on a variety of data sources and a variety of tasks, with the aim of dynamically accommodating a wide range of natural language understanding tasks that require modeling the meaning of sequences of words rather than just individual words. The key finding is that transfer learning using sentence embeddings tends to outperform word-embedding-level transfer.
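To make these ideas concrete, here is a minimal sketch using hypothetical toy word vectors: it builds the averaging baseline described earlier to obtain two sentence vectors u and v, then combines them with InferSent’s three matching methods. Note that InferSent itself learns its sentence vectors with a trained neural encoder rather than by averaging; the averaging step here is only the simple baseline, and the vocabulary and dimensions are made up for illustration.

```python
import numpy as np

# Hypothetical toy word vectors (3-dimensional for readability; real
# models like GloVe typically use 100-300 dimensions, and InferSent
# produces much higher-dimensional sentence vectors).
word_vectors = {
    "the": np.array([0.1, 0.2, -0.1]),
    "cat": np.array([0.7, -0.3, 0.5]),
    "sat": np.array([0.0, 0.4, 0.6]),
    "dog": np.array([0.6, -0.2, 0.4]),
    "ran": np.array([0.1, 0.5, 0.3]),
}

def average_embedding(sentence, vectors, dim=3):
    """Baseline sentence embedding: average the word vectors of all
    in-vocabulary tokens (word order is lost, as with bag of words)."""
    tokens = [t for t in sentence.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(dim)
    return np.mean([vectors[t] for t in tokens], axis=0)

def matching_features(u, v):
    """InferSent-style combination of two sentence vectors:
    concatenation, element-wise product, absolute difference."""
    return np.concatenate([u, v, u * v, np.abs(u - v)])

u = average_embedding("The cat sat", word_vectors)
v = average_embedding("The dog ran", word_vectors)
features = matching_features(u, v)
print(features.shape)  # 4x the sentence-vector dimension
```

The combined feature vector is four times the sentence-embedding dimension; it is this vector that gets passed into the fully connected layers of the 3-class classifier.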
