HMTL – Multi-task Learning for solving NLP Tasks

While the different NLP tasks are often trained and evaluated separately, there is a potential advantage in combining them into one model: learning one task might help in learning another task and improve its results. The Hierarchical Multi-Task Learning model (HMTL) provides an approach to learning different NLP tasks by training on the "simple" tasks first and using that knowledge to train on more complicated tasks. The model achieves state-of-the-art performance on several tasks, and the paper includes an in-depth analysis of the importance of each part of the model, from different aspects of the word embeddings to the order of the tasks.

Background

Several papers from recent years showed that combining multiple NLP tasks can generate a better and deeper representation of text. In HMTL, each task is built from three components: word embeddings, an encoder, and a task-specific layer.

The base of the model is the word representation, which embeds each word of the input sentence into a vector using three models:

- GloVe: pre-trained word embeddings.
- ELMo: contextualized word embeddings computed from the whole sentence.
- Character-level word embeddings: this kind of representation is more sensitive to morphological features (prefix, suffix, etc.), which are important in understanding relations between entities.

In addition, each task is trained with a dedicated encoder, a multi-layer recurrent neural network that generates word embeddings tailored for the task.

According to the paper, using the GM configuration in training improves the F1-score of the CR (Coreference Resolution) task by 6 points, while it improves the EMD (Entity Mention Detection) and RE (Relation Extraction) tasks by 1–2 points. The paper also claims to achieve state-of-the-art results in Named-Entity Recognition, although it seems that the recent BERT model reached slightly better results.

It appears that the contribution of multi-task training is inconclusive and depends on the task:

- Different tasks achieved their best results with different task combinations, meaning there is no single dominant combination.
- In the low-level tasks, the benefit of the hierarchical model is small (less than 0.5 F1 points).
- The biggest improvement was achieved in the RE task, with over 5 F1 points. A possible explanation is that the EMD task is trained before the RE task and learns to identify almost the same entities as the RE task.

[Figure: Task combinations comparison]

Word representation

As mentioned previously, the base of the model is the word representation, which consists of three models: GloVe, ELMo and character-level word embeddings.
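To make this word-representation layer concrete, here is a minimal PyTorch sketch of how the three embeddings could be concatenated into a single vector per token. The dimensions, the character-level BiLSTM, and the assumption that ELMo vectors are precomputed are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Illustrative sketch: concatenate GloVe, ELMo and character-level
    embeddings into a single vector per token (all dimensions are assumed)."""

    def __init__(self, glove_weights, char_vocab_size=262,
                 char_emb_dim=16, char_hidden=64):
        super().__init__()
        # Pre-trained GloVe vectors, kept frozen.
        self.glove = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        # Character-level encoder: embeds characters and runs a BiLSTM, which
        # makes the representation sensitive to prefixes, suffixes, etc.
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_emb_dim, char_hidden,
                                 batch_first=True, bidirectional=True)

    def forward(self, word_ids, char_ids, elmo_vectors):
        # word_ids:     (batch, seq_len)
        # char_ids:     (batch, seq_len, max_word_len)
        # elmo_vectors: (batch, seq_len, elmo_dim), precomputed by a pre-trained ELMo
        glove_vecs = self.glove(word_ids)                    # (B, T, glove_dim)

        b, t, w = char_ids.shape
        chars = self.char_emb(char_ids.reshape(b * t, w))    # (B*T, W, char_emb_dim)
        _, (h_n, _) = self.char_lstm(chars)                  # h_n: (2, B*T, char_hidden)
        char_vecs = torch.cat([h_n[0], h_n[1]], dim=-1).reshape(b, t, -1)

        # Final per-token representation: [GloVe ; ELMo ; character-level]
        return torch.cat([glove_vecs, elmo_vectors, char_vecs], dim=-1)
```

The key point is simply that the final token vector is the concatenation of the three embeddings, which serves as the shared base for the task encoders.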
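The hierarchical part of the model, a dedicated recurrent encoder per task with lower-level tasks feeding the higher-level ones, could then be sketched roughly as follows. The specific task ordering, the skip connections, and the simple linear heads are assumptions for illustration; the actual HMTL architecture and its training schedule are more involved.

```python
import torch
import torch.nn as nn

class HierarchicalMultiTask(nn.Module):
    """Simplified sketch: lower-level tasks (here NER, then EMD) feed their
    encoder outputs upward to a higher-level task (here RE)."""

    def __init__(self, input_dim, hidden=128, ner_tags=9, emd_tags=3, re_labels=7):
        super().__init__()
        # One dedicated BiLSTM encoder per task, stacked hierarchically.
        self.ner_enc = nn.LSTM(input_dim, hidden,
                               batch_first=True, bidirectional=True)
        self.emd_enc = nn.LSTM(input_dim + 2 * hidden, hidden,
                               batch_first=True, bidirectional=True)
        self.re_enc = nn.LSTM(input_dim + 4 * hidden, hidden,
                              batch_first=True, bidirectional=True)
        # Illustrative per-task heads (placeholders for the real task-specific layers).
        self.ner_head = nn.Linear(2 * hidden, ner_tags)
        self.emd_head = nn.Linear(2 * hidden, emd_tags)
        self.re_head = nn.Linear(2 * hidden, re_labels)

    def forward(self, embeddings, task):
        # embeddings: (batch, seq_len, input_dim) from the shared word representation.
        ner_out, _ = self.ner_enc(embeddings)
        if task == "ner":
            return self.ner_head(ner_out)

        # Higher tasks see both the original embeddings and the lower encoders' outputs.
        emd_out, _ = self.emd_enc(torch.cat([embeddings, ner_out], dim=-1))
        if task == "emd":
            return self.emd_head(emd_out)

        re_out, _ = self.re_enc(torch.cat([embeddings, ner_out, emd_out], dim=-1))
        return self.re_head(re_out)  # token-level scores; real RE scores entity pairs
```

A training loop would then introduce supervision from the "simpler" tasks first, back-propagating only the active task's loss through its encoder and the shared word representation.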
