10 Exciting Ideas of 2018 in NLP

By Sebastian Ruder, Aylien.

This post gathers 10 ideas that I found exciting and impactful this year, and that we'll likely see more of in the future.

For each idea, I will highlight 1-2 papers that execute it well.

I tried to keep the list succinct, so apologies if I did not cover all relevant work.

The list is necessarily subjective and covers ideas mainly related to transfer learning and generalization.

Most of these (with some exceptions) are not trends (but I suspect that some might become more trendy in 2019).

Finally, I would love to read about your highlights in the comments or see highlights posts about other areas.

There were two unsupervised MT papers at ICLR 2018.

They were surprising in that they worked at all, but results were still low compared to supervised systems.

At EMNLP 2018, unsupervised MT hit its stride with two papers from the same two groups that significantly improve upon their previous methods.

My highlight: Toy illustration of the three principles of unsupervised MT. A) Two monolingual datasets. B) Initialization. C) Language modelling. D) Back-translation (Lample et al., 2018).
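To make the back-translation principle (D) concrete, here is a deliberately tiny, runnable sketch. It is not the authors' system: real models are neural sequence-to-sequence models that also rely on denoising language modelling (principle C), which this toy omits, and the two "translators" here are just word-substitution tables seeded with small lexicons standing in for the initialization step (B).

```python
# Toy sketch of the back-translation loop behind unsupervised MT.
# Made-up data and "models": word-substitution tables, not neural networks.
french_mono = ["le chat dort", "le chien mange"]        # A) monolingual data only
english_mono = ["the cat sleeps", "the dog eats"]

fr2en = {"le": "the", "chat": "cat", "dort": "sleeps"}  # B) seed lexicons, e.g. from
en2fr = {"the": "le", "dog": "chien", "eats": "mange"}  #    cross-lingual embeddings

def translate(sentence, table):
    """Word-by-word translation; unknown words are copied through."""
    return " ".join(table.get(w, w) for w in sentence.split())

def train_step(pairs, table):
    """'Train' a direction by memorising word alignments from (noisy) pairs."""
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)

# D) Back-translation: translate monolingual sentences with the current
# reverse model to build synthetic parallel data, train the forward model
# on it, and alternate directions for a few rounds.
for _ in range(3):
    synthetic_en_fr = [(translate(f, fr2en), f) for f in french_mono]
    train_step(synthetic_en_fr, en2fr)
    synthetic_fr_en = [(translate(e, en2fr), e) for e in english_mono]
    train_step(synthetic_fr_en, fr2en)

print(translate("le chien dort", fr2en))  # -> "the dog sleeps"
```

The point of the loop is that each direction produces (noisy) synthetic parallel data for the other, so knowledge that only one direction starts with eventually ends up in both.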

Using pretrained language models is probably the most significant NLP trend this year, so I won't spend much time on it here.

There have been a slew of memorable approaches: ELMo, ULMFiT, OpenAI Transformer, and BERT.

My highlight: Word sense disambiguation (left) and POS tagging (right) results of the first and second layers of a bidirectional language model compared to baselines (Peters et al., 2018).
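As a reminder of what using these models looks like in practice, here is a minimal PyTorch sketch of the pretrain-then-fine-tune recipe: a language-model encoder (a randomly initialised stand-in here; in reality you would load pretrained weights) is reused under a new task head and fine-tuned with a smaller learning rate than the head, loosely in the spirit of ULMFiT's discriminative fine-tuning. The sizes, learning rates, and architecture are illustrative assumptions, not taken from any of the papers above.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, CLASSES = 1000, 64, 128, 2

class LMEncoder(nn.Module):
    """Stand-in for a pretrained language-model encoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return out[:, -1]                      # last hidden state as sentence vector

encoder = LMEncoder()                          # in practice: load pretrained weights here
head = nn.Linear(HID, CLASSES)                 # freshly initialised task-specific head

# Discriminative fine-tuning: a smaller learning rate for the pretrained
# encoder than for the new head.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-3},
])
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, VOCAB, (8, 20))      # fake batch of token ids
labels = torch.randint(0, CLASSES, (8,))

loss = loss_fn(head(encoder(tokens)), labels)
loss.backward()
optimizer.step()
```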

Incorporating common sense into our models is one of the most important directions moving forward.

However, creating good datasets is not easy and even popular ones show large biases.

This year, there have been some well-executed datasets that seek to teach models some common sense such as Event2Mind and SWAG, both from the University of Washington.

SWAG was solved unexpectedly quickly.

My highlight: VCR: Given an image, a list of regions, and a question, a model must answer the question and provide a rationale explaining why its answer is right (Zellers et al., 2018).

Meta-learning has seen much use in few-shot learning, reinforcement learning, and robotics—the most prominent example: model-agnostic meta-learning (MAML)—but successful applications in NLP have been rare.

Meta-learning is most useful for problems with a limited number of training examples.
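For readers who have not met MAML, here is a toy sketch of its inner/outer-loop mechanics on one-dimensional regression tasks (fit y = a*x + b for task-specific a and b). This is only the generic algorithm on made-up data; it has nothing to do with how Gu et al. apply meta-learning to low-resource translation.

```python
import torch

torch.manual_seed(0)
w = torch.zeros(2, requires_grad=True)            # meta-parameters [slope, bias]
meta_opt = torch.optim.SGD([w], lr=1e-2)
inner_lr = 0.05

def model(x, params):
    return params[0] * x + params[1]

def sample_task():
    """A task is a random line; return (support, query) sets drawn from it."""
    a, b = torch.randn(()), torch.randn(())
    xs, xq = torch.randn(10), torch.randn(10)
    return (xs, a * xs + b), (xq, a * xq + b)

for step in range(500):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                            # a small batch of tasks
        (xs, ys), (xq, yq) = sample_task()
        inner_loss = ((model(xs, w) - ys) ** 2).mean()
        # One inner-loop adaptation step; create_graph=True lets the outer
        # update differentiate through it (the second-order part of MAML).
        grad, = torch.autograd.grad(inner_loss, w, create_graph=True)
        w_adapted = w - inner_lr * grad
        meta_loss = meta_loss + ((model(xq, w_adapted) - yq) ** 2).mean()
    meta_loss.backward()                          # outer (meta) update
    meta_opt.step()
```

The outer loop learns an initialization from which a single gradient step adapts well to a new task, which is exactly the limited-training-data regime mentioned above.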

My highlight: The difference between transfer learning, multilingual transfer learning, and meta-learning. Solid lines: learning of the initialization. Dashed lines: path of fine-tuning (Gu et al., 2018).

This year, we and others have observed that unsupervised cross-lingual word embedding methods break down when languages are dissimilar.

This is a common phenomenon in transfer learning where a discrepancy between source and target settings (e.g. domains in domain adaptation, tasks in continual learning and multi-task learning) leads to deterioration or failure of the model.

Making models more robust to such changes is thus important.

My highlight: The similarity distributions of three words. Equivalent translations (two and due) have more similar distributions than unrelated words (two and cane, meaning dog; Artetxe et al., 2018).
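The intuition can be shown in a few lines of numpy: a word's sorted vector of similarities to all words of its own language is roughly preserved for true translations, so these profiles can be matched without any parallel data. In the toy below the "second language" is simply an orthogonal rotation of the first, an idealised assumption that makes the matching exact; real embedding spaces are only approximately isometric, which is where robustness to dissimilar languages becomes the hard part.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 16
X = rng.normal(size=(n, d))                    # "source language" embeddings
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
Y = X @ Q                                      # "target language" embeddings

def sorted_sim_profile(E):
    """Each row: a word's sorted similarities to all words of its own space."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return np.sort(E @ E.T, axis=1)

Px, Py = sorted_sim_profile(X), sorted_sim_profile(Y)

# Match every source word to the target word with the closest profile.
matches = np.argmin(((Px[:, None, :] - Py[None, :, :]) ** 2).sum(-1), axis=1)
print("correctly matched:", (matches == np.arange(n)).mean())   # 1.0 in this toy
```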

There have been a lot of efforts to better understand representations.

In particular, diagnostic classifiers (tasks that aim to measure whether learned representations can predict certain attributes) have become quite common.

My highlight: Per-layer performance of BiLSTM and Transformer pretrained representations on (from left to right) POS tagging, constituency parsing, and unsupervised coreference resolution (Peters et al., 2018).
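Because the probing recipe is so common, a minimal sketch may help: freeze the representations and train a simple linear classifier to predict the attribute of interest. Here random vectors stand in for per-token activations from some layer of a pretrained encoder and the POS tags are fake, so the probe stays near chance; with real frozen representations, the same code measures how much of the property they encode.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_tokens, dim, n_tags = 2000, 128, 5

# Stand-ins: in practice, run the pretrained model over an annotated corpus
# and collect layer-k activations together with the gold POS tags.
reps = rng.normal(size=(n_tokens, dim))
tags = rng.integers(0, n_tags, size=n_tokens)

split = int(0.8 * n_tokens)
probe = LogisticRegression(max_iter=1000).fit(reps[:split], tags[:split])
print("probe accuracy:", probe.score(reps[split:], tags[split:]))
```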

In many settings, we have seen an increasing usage of multi-task learning with carefully chosen auxiliary tasks.

For a good auxiliary task, data must be easily accessible.

One of the most prominent examples is BERT, which uses next-sentence prediction (an objective that has been used in Skip-thoughts and more recently in Quick-thoughts) to great effect.
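The general pattern is simple to write down: a shared encoder feeds both a main-task head and an auxiliary head, and the two losses are combined with a weighting factor tuned on development data. The PyTorch sketch below uses made-up shapes, labels, and an arbitrary aux_weight; BERT's next-sentence prediction is a specific instance of this pattern, not what is implemented here.

```python
import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(100, 64), nn.ReLU())   # shared encoder
main_head = nn.Linear(64, 3)                            # main task (e.g. 3 classes)
aux_head = nn.Linear(64, 2)                             # auxiliary task
params = (list(shared.parameters()) + list(main_head.parameters())
          + list(aux_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
xent = nn.CrossEntropyLoss()
aux_weight = 0.3                                        # tuned on dev data

x = torch.randn(16, 100)                 # fake batch of input features
y_main = torch.randint(0, 3, (16,))      # main-task labels (expensive)
y_aux = torch.randint(0, 2, (16,))       # auxiliary labels (easily accessible)

h = shared(x)
loss = xent(main_head(h), y_main) + aux_weight * xent(aux_head(h), y_aux)
opt.zero_grad()
loss.backward()
opt.step()
```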

My highlights: Syntactic, PropBank, and coreference annotations from OntoNotes. PropBank SRL arguments and coreference mentions are annotated on top of syntactic constituents. Almost every argument is related to a syntactic constituent (Swayamdipta et al., 2018).

With the recent advances in transfer learning, we should not forget more explicit ways of using target task-specific data.

In fact, pretrained representations are complementary to many forms of semi-supervised learning.

We have explored self-labelling approaches, a particular category of semi-supervised learning.

My highlight: Inputs seen by auxiliary prediction modules. Auxiliary 1: They traveled to __________________. Auxiliary 2: They traveled to Washington _______. Auxiliary 3: _____________ Washington by plane. Auxiliary 4: ________________________ by plane (Clark et al., 2018).
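To make the self-labelling idea concrete, here is a minimal self-training sketch on toy two-class data: a model trained on a handful of labelled points pseudo-labels the unlabelled points it is confident about and is retrained on both. Cross-view training (Clark et al., 2018) is a more sophisticated relative, in which auxiliary modules that see only restricted views of the input (as in the caption above) are trained to match the full model's predictions; the code below shows the basic family, not their method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs, with only 10 labelled points in total.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
labelled = np.concatenate([rng.choice(200, 5, replace=False),
                           200 + rng.choice(200, 5, replace=False)])
unlabelled = np.setdiff1d(np.arange(len(X)), labelled)

X_lab, y_lab = X[labelled], y[labelled]
model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

for _ in range(3):                                  # a few self-labelling rounds
    probs = model.predict_proba(X[unlabelled])
    confident = probs.max(axis=1) > 0.95            # keep only confident guesses
    pseudo_X = X[unlabelled][confident]
    pseudo_y = probs[confident].argmax(axis=1)
    model = LogisticRegression(max_iter=1000).fit(
        np.vstack([X_lab, pseudo_X]), np.concatenate([y_lab, pseudo_y]))

print("accuracy on all data:", model.score(X, y))
```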

There have been a lot of developments in question answering (QA), with an array of new QA datasets.

Besides conversational QA and performing multi-step reasoning, the most challenging aspect of QA is to synthesize narratives and large bodies of information.

My highlight: Comparison of QA datasets (Kočiský et al., 2018).

Inductive biases such as convolutions in a CNN, regularization, dropout, and other mechanisms are core parts of neural network models that act as regularizers and make models more sample-efficient.

However, coming up with a broadly useful inductive bias and incorporating it into a model is challenging.

My highlights: 10 years of PropBank semantic role labeling. Comparison of Linguistically-Informed Self-Attention (LISA) with other methods on out-of-domain data (Strubell et al., 2018).

Original.

Reposted with permission.

Bio: Sebastian Ruder is an NLP, Deep Learning PhD student @insight_centre and a research scientist @_aylien.

