Learning generic sentence representations from various NLP tasks

Introduction to general purpose sentence representations

Edward Ma, Feb 16

Many papers have introduced different ways to compute a better text representation.
From character embeddings and word embeddings to sentence embeddings, I have introduced different ways to deal with it. However, most of those models are trained on a single dataset or problem domain.
This time, Subramanian et al. explored a simple, effective multi-task learning approach for sentence embeddings. The hypothesis is that good sentence representations can be learned from a vast amount of weakly related tasks.
After reading this article, you will understand:
- General Purpose Sentence Representation Design
- Architecture
- Implementation
- Take Away

General Purpose Sentence Representation Design

GenSen was introduced in 2018 by Subramanian et al.
They explore a multi-task approach that switches among 5 NLP tasks during training time to learn a generic sentence representation for downstream tasks. The selected tasks include:
- Neural Machine Translation (NMT): translate English to French and German
- Skip-thought vectors: predict the previous sentence and the next sentence
- Constituency parsing: identify sentence structure
- Natural Language Inference (NLI): classify entailment

[Figure: approximate number of sentence pairs for each task (Subramanian et al., 2018)]

In the table below, we notice that the transfer performance of "Our Model" (the approach proposed by Subramanian et al.) improves when more tasks are added.
Besides sentence representations, we also notice from the table below that the resulting word embeddings are better than the others.

[Table: evaluation of word embeddings (Subramanian et al., 2018)]

Architecture

All 5 tasks are sequence-to-sequence problems, but they solve different NLP problems.
The sequence-to-sequence architecture is an encoder-decoder model in which both input and output are sequential. The encoder transforms a sentence into a fixed-length vector, while the decoder produces the output's conditional probability via the chain rule: p(y|x) = ∏_t p(y_t | y_1, ..., y_{t-1}, x).
A bidirectional Gated Recurrent Unit (GRU) is used in the encoder, while the decoder uses a unidirectional GRU. GRUs help diminish the effect of vanishing gradients and offer better computational speed.
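To make the encoder concrete, here is a minimal numpy sketch of a bidirectional GRU encoder. This is not the authors' implementation; the dimensions, initialization, and absence of bias terms are simplifications for illustration. The key point is that the sequence is read in both directions and the two final hidden states are concatenated into one fixed-length vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: h' = (1 - z) * h + z * h_tilde."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # One weight pair per gate: update (z), reset (r), candidate (h~)
        self.W = rng.uniform(-s, s, (3, hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))
        self.hidden_size = hidden_size

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h)          # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h)          # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h))
        return (1.0 - z) * h + z * h_tilde

def encode(xs, fwd, bwd):
    """Bidirectional encoding: run the sequence in both directions and
    concatenate the final hidden states into one fixed-length vector."""
    h_f = np.zeros(fwd.hidden_size)
    h_b = np.zeros(bwd.hidden_size)
    for x in xs:
        h_f = fwd.step(x, h_f)
    for x in reversed(xs):
        h_b = bwd.step(x, h_b)
    return np.concatenate([h_f, h_b])

fwd, bwd = GRUCell(4, 8), GRUCell(4, 8, seed=1)
sentence = [np.random.default_rng(2).standard_normal(4) for _ in range(5)]
rep = encode(sentence, fwd, bwd)
print(rep.shape)  # (16,) -- fixed length regardless of sentence length
```

Notice that sentences of any length map to the same 16-dimensional vector, which is what lets a single representation be passed to every task's decoder.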
Since we want to learn a universal sentence representation, the encoder has to be shared among all tasks so that it can learn something from each of them. The encoded sentence representation is then sent to a different decoder (one delegated decoder per task) that decodes the vector and classifies or predicts the result. After training the model, those decoders are thrown away, and only the shared encoder is needed for your specific NLP problem.
As the authors train the encoder on several tasks, they have to address when a task should be switched. They use a simple approach: after every parameter update, the next task is sampled uniformly at random.
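The training scheme can be sketched as a plain loop with uniform task sampling. This is not the authors' code: the task names, `shared_encoder`, and the per-task decoders below are stand-ins (in practice they would be the GRU networks described above), but the control flow is the point.

```python
import random

# Five training tasks, one decoder each; the encoder is shared by all.
TASKS = ["nmt_en_fr", "nmt_en_de", "skip_thought", "parsing", "nli"]

def shared_encoder(batch):
    # Stand-in: pretend each sentence becomes a fixed-length representation.
    return [hash(s) % 100 for s in batch]

# One hypothetical decoder per task (here just a labeled placeholder).
decoders = {task: (lambda reps, t=task: f"{t}: loss over {len(reps)} examples")
            for task in TASKS}

def train(num_updates, get_batch, seed=42):
    """After every parameter update, sample the next task uniformly."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_updates):
        task = rng.choice(TASKS)                 # uniform task switching
        reps = shared_encoder(get_batch(task))   # shared across all tasks
        _loss = decoders[task](reps)             # task-specific decoder
        # ...a real loop would backprop through the decoder AND the encoder...
        history.append(task)
    return history

history = train(1000, get_batch=lambda task: ["a sentence", "another one"])
```

Because the encoder receives gradients from whichever task was sampled, over many updates it sees all five objectives, which is what pushes it toward a task-agnostic representation.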
Take Away

Multi-task learning helps generalize sentence and word vectors so that they can transfer to other problem domains more easily and with better results.
About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. You can reach me via my Medium Blog, LinkedIn or GitHub.
Reference

Sandeep S., Adam T., Yoshua B., Christopher J P., Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning. https://github.com/Maluuba/gensen