Contextual Embeddings for NLP Sequence Labeling

Edward Ma · Feb 2

Text representation (a.k.a. text embeddings) has been a breakthrough for solving NLP tasks.

Originally, a single word vector represented a word, even though the word can carry different meanings in different contexts.

For example, “Washington” can be a location, a person's name, or a state.

“University of Washington”

Zalando released an amazing NLP library, flair, which makes our lives easier.

It already implements their contextual string embeddings algorithm, as well as other classic and state-of-the-art text representation algorithms.

In this story, you will learn about the architecture and design of contextual string embeddings for sequence labeling, with some sample code.

Architecture and Design

The overall design: a sentence is passed to a character language model to retrieve contextual embeddings, which a sequence labeling model then uses to classify entities.

Architecture and Design (Akbik et al., 2018)

Contextual Embeddings

Unlike classical word embeddings, Akbik et al. describe theirs as contextualized word embeddings. In other words, these word embeddings capture word semantics in context, so the same word can be represented differently in different contexts. You may refer to Contextualized Word Vectors (CoVe) and Embeddings from Language Models (ELMo) for more detail. Akbik et al. named their embeddings Contextual String Embeddings.

Character Language Model

Unlike other models, it is based on character-level tokenization rather than word-level tokenization. In other words, it converts a sentence into a sequence of characters and passes it through a language model to learn word representations.

Contextual Embeddings of “Washington” (Akbik et al., 2018)

Taking “Washington” as an example, the bidirectional LSTM model allows “Washington” to retrieve information from the previous word (i.e. “George”) and the following words (i.e. “was born”), so that it can compute the word's vector under a sentential context. The vector is the concatenation of the outputs of the forward network and the backward network. From the forward network, the hidden state after the last character of the word (i.e. “n”) is extracted. From the backward network, the hidden state before the first character of the word (i.e. “W”) is extracted.
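As a toy illustration (this is not flair's actual implementation, just a sketch of the idea), the positions of the two extracted hidden states can be computed from character offsets:

```python
# Toy sketch: for a word in a sentence treated as a character sequence,
# the paper extracts the forward-LM hidden state just after the word's
# last character and the backward-LM hidden state at its first character.
def extraction_positions(sentence: str, word: str):
    start = sentence.index(word)      # offset of the first character ("W")
    end = start + len(word) - 1       # offset of the last character ("n")
    return end, start                 # (forward position, backward position)

fwd, bwd = extraction_positions("George Washington was born", "Washington")
print(fwd, bwd)  # 16 7
```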

Stacking Embeddings

As in other studies, Akbik et al. utilized stacked embeddings to achieve better results. Stacking embeddings means combining multiple embeddings to represent a word. For example, Akbik et al. concatenate contextual embeddings and GloVe embeddings to represent a word for sequence tagging.
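Mechanically, stacking is just vector concatenation. Here is a dependency-free sketch of the idea (the vectors and sizes below are made up for illustration):

```python
# Sketch of stacking: the stacked representation of a word is simply the
# concatenation of its vectors from each embedding type.
def stack(*vectors):
    stacked = []
    for vec in vectors:
        stacked.extend(vec)
    return stacked

contextual = [0.2, -0.1, 0.7, 0.4]  # toy contextual string embedding
glove = [0.5, 0.3]                  # toy GloVe embedding
word_vector = stack(contextual, glove)
print(len(word_vector))  # 6 -- dimensionality is the sum of the parts
```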

Sequence Tagging

Contextual embeddings from the character language model and GloVe embeddings are passed to a bidirectional LSTM-CRF architecture to solve the named-entity recognition (NER) problem.

Experiment

Experiment results compared with the previous best results (Akbik et al., 2018)

Implementation

Named Entity Recognition (NER)

You only need to execute the following commands to load the pre-trained NER tagger.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load('ner')
```

After that, you can simply wrap the text in a Sentence object and then run the prediction.

```python
sample_texts = [
    "I studied in University of Washington.",
]

for text in sample_texts:
    print('-' * 50)
    print('Original Text')
    print(text)
    print('NER Result')
    sentence = Sentence(text)
    tagger.predict(sentence)
    for entity in sentence.get_spans('ner'):
        print(entity)
```

And the result is:

Original Text:
I studied in University of Washington.
NER Result:
ORG-span [4,5,6]: "University of Washington."

Sentiment Classification

It is as easy as NER.

```python
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')
```

Pass the sentence to the pre-trained classifier:

```python
sample_texts = [
    "Medium is a good platform for sharing idea",
]

for text in sample_texts:
    print('-' * 50)
    print('Original Text')
    print(text)
    print('Classification Result')
    sentence = Sentence(text)
    classifier.predict(sentence)
    print(sentence.labels)
```

And the result is:

Original Text:
Medium is a good platform for sharing idea
Classification Result:
[POSITIVE (0.7012046575546265)]

Take Away

To access all the code, you can visit this Colab Notebook.

Besides the pre-trained flair contextual embeddings, we can apply not only classic embedding methods such as GloVe and word2vec, but also state-of-the-art embedding methods such as ELMo and BERT.

You can visit this guideline for reference.

We can also implement stacked embeddings very easily, with just a few lines of code.

You can visit this guideline for reference.

If you want to train custom model on your data, this guideline will be useful for you.

Reference

Akbik A., Blythe D., Vollgraf R. Contextual String Embeddings for Sequence Labeling. 2018.

flair in PyTorch.