Every so often, we see stories in the news about facial recognition technologies failing on minority populations, or Twitter bots spouting racist remarks.
But the truth of the matter is that we keep hearing about bias and AI without learning much about how exactly these biases get encoded in the technologies we use.
In this post, I’ll explain some of the shortcomings of a tool known as word embeddings: they’re used for a wide variety of tasks involving computers and human language, or natural language processing (NLP), and it’s relatively easy to explore how these tools can be problematic without using lots of complicated technical jargon.
First, let’s learn more about NLP and how word embeddings fit in, and then we’ll learn about how embeddings themselves contribute to the creation of biased results.
The field of NLP relies on one key paradigm: treating text as data.
This text can come from any source (movie reviews, ancient poems, even spoken words) and can be used for any task, whether it’s detecting if an essay has a positive or negative tone, translating a phrase to another language, or even conducting a search online.
However, all tasks an NLP system is used for involve creating mathematical models of the text.
In order to do this, it’s critical to have a numerical representation of each word in the input text so that a model can generate an output, such as a list of relevant websites or an accurate translation, based on the text provided.
Word embeddings essentially increase the information about each word that gets captured in these numerical representations.
The name of one of the most famous algorithms, “Word2Vec,” presents this idea quite well: a word gets represented as a vector, or collection of numbers, itself generated by machine learning tools.
Although NLP methods had been around for years before the introduction of word embeddings in the early 2010s, these techniques truly revolutionized the field and enabled some critical discoveries later in the decade.
The reason word embeddings are so effective is that they encode the relationship between each word and every other word in the text, something that earlier representations of words could not do.
Specifically, this is done through the idea that a word is defined by the words around it.
If two words get mentioned in a similar context (“good” and “great” for instance) in the training corpus (the body of text from which the embeddings get “learned”), then their corresponding vectors will also be similar.
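This similarity between vectors is usually measured with cosine similarity. Here’s a minimal sketch using made-up three-dimensional vectors (real embeddings are learned from a corpus and have hundreds of dimensions; these numbers are purely illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: values near 1.0 mean
    # the vectors point in nearly the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (hand-picked for illustration, not learned).
vectors = {
    "good":  np.array([0.9, 0.8, 0.1]),
    "great": np.array([0.8, 0.9, 0.2]),
    "table": np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(vectors["good"], vectors["great"]))  # close to 1
print(cosine_similarity(vectors["good"], vectors["table"]))  # much smaller
```

Because “good” and “great” would appear in similar contexts in a real corpus, their learned vectors end up close together, which is exactly what a high cosine similarity captures.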
In order to find out why word embeddings can become problematic, we need to look at the way the models they’re based on are evaluated.
Having a robust way to assess a machine learning model is as critical as the model itself.
The most common way to see if word embeddings are accurate is by using them to evaluate analogies.
This is because the task is quite simple; mathematically, it’s just adding and subtracting the vectors.
Let’s take the example man is to woman as king is to ______.
Given a set of inputs like this, a quick transformation of the data would lead to the embedding for queen.
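As a sketch of how this arithmetic works, here are toy vectors with hand-set dimensions (a made-up femaleness/royalty/personhood scheme; real embeddings are learned from text, not constructed like this):

```python
import numpy as np

# Toy vectors: dimensions are (femaleness, royalty, personhood).
vocab = {
    "man":   np.array([0.0, 0.0, 1.0]),
    "woman": np.array([1.0, 0.0, 1.0]),
    "king":  np.array([0.0, 1.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
}

def analogy(a, b, c):
    # Solve "a is to b as c is to ?" by computing b - a + c and
    # returning the nearest remaining vocabulary word.
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return min(candidates, key=lambda w: np.linalg.norm(vocab[w] - target))

print(analogy("man", "woman", "king"))  # queen
```

In this toy setup, “woman” minus “man” isolates the femaleness dimension, and adding it to “king” lands exactly on “queen,” which is the intuition behind the real evaluation task.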
I plotted the vectors for the four words below.
The vectors for “king” and “queen” have a similar relationship to the vectors for “man” and “woman.” In addition to these analogies, the simple steps of adding and subtracting vectors can also capture grammatical relationships, like whether words are singular or plural, and even facts about the world, like countries and their capitals.
However, if we give our program the query “Man is to woman as doctor is to ______?” it ends up outputting “nurse.” The bias isn’t limited to gender: the system also recognizes “Police is to white as criminal is to Black” and “Lawful is to Christianity as terrorist is to Islamic.” Because word embeddings get fed into other algorithms, their inherent bias can lead to particularly problematic situations: an HR professional searches for “engineers” on a site like LinkedIn and sees male engineers ranked higher than their equally talented female peers, or, more dangerously, a police department is directed to heavily patrol a primarily Black neighborhood based on written crime reports.
We might expect the text the embeddings were “learned” from, Wikipedia and news articles, to be relatively unbiased, but bias can still creep in through which words appear in similar contexts.
For instance, female gender pronouns like “her” might be used more frequently around the word “nurse,” just because our texts might talk about female nurses more than male nurses.
However, allowing these associations to govern large-scale software systems is quite risky.
At this point, we’ve come to a grave conclusion.
If we cannot effectively evaluate the building blocks of modern NLP, how can we trust the algorithms that use them?

Let’s now look at the dataset in more detail.
I’ve done these analyses in a couple of Python notebooks written as a companion to this post, so feel free to follow along there if you want! I’ve also written a more technical treatment of the previous discussion.
If you’re familiar with the concept already, check out the reading list in the Github repo.
First, we compare the vector for a single word with the vectors for a pair of words.
When we compare “engineer” to both “man” and “woman,” we find that the vector for “engineer” is more similar to the vector for “man” than to the vector for “woman.” The difference isn’t large, but it’s quite noticeable. However, when we compare “engineer” to the pair “Asian” and “African-American,” we see that “Asian” is much more similar to “engineer” than “African-American” is.
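This pairwise comparison is just two cosine similarities placed side by side. Here’s a minimal sketch, with toy vectors constructed to mirror the pattern described above (a real analysis would use learned embeddings like word2vec or GloVe):

```python
import numpy as np

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (illustrative only): the first dimension loosely plays
# the role of a "gendered" component.
man      = np.array([ 1.0, 0.2, 0.1])
woman    = np.array([-1.0, 0.2, 0.1])
engineer = np.array([ 0.4, 0.9, 0.3])

print(cosine(engineer, man))    # higher
print(cosine(engineer, woman))  # lower
```

The comparison itself is neutral; the bias shows up when learned vectors place an occupation systematically closer to one group than another.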
Next, we take a pair of vectors, such as “man” and “woman,” and look at the vectors representing people that are closest to them.
The vector most similar to “woman” that also represented a person corresponded to the word “victim,” and other vectors close to “woman” represented occupations like “teacher” and “prostitute,” while the vectors most similar to “man” referred to words like “soldier” and “hero.” When we look at the vectors for “citizen” and “immigrant,” the vectors most similar to “citizen” are usually professional occupations like “lawyer” and “businessman,” but “peasant” and “laborer” are closer to “immigrant.” Lastly, when comparing the vectors for “Christianity” and “Islam,” we find that both are close to many religious terms, but the vector for “Islam” is far closer to the vectors for “radical,” “fundamentalist,” and “extremist.”
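Under the hood, these nearest-neighbor lookups just rank the vocabulary by cosine similarity to a query vector. A minimal sketch with a made-up four-word vocabulary (real lookups scan a full pretrained embedding matrix):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tiny illustrative vocabulary; vectors are hand-picked, not learned.
vocab = {
    "woman":   np.array([ 1.0, 0.1]),
    "teacher": np.array([ 0.9, 0.3]),
    "soldier": np.array([-0.8, 0.4]),
    "man":     np.array([-1.0, 0.1]),
}

def nearest(word, k=2):
    # Rank every other vocabulary word by similarity to the query word.
    others = [w for w in vocab if w != word]
    return sorted(others, key=lambda w: cosine(vocab[word], vocab[w]),
                  reverse=True)[:k]

print(nearest("woman"))
```

In the real dataset, running this kind of lookup over hundreds of thousands of words is what surfaces neighbors like “victim” and “prostitute” for “woman” and “hero” for “man.”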
The final piece of analysis we perform is probably the most interesting.
Vectors aren’t just a collection of numbers; they’re also a way to represent these numbers in a space.
Thus, this analysis relies on the idea that human biases correspond to transformations of the vector space our dataset lives in.
We look at the vector that quantifies a particular type of bias.
For instance, the vector corresponding to the difference between “he” and “she” can represent a “gendered component,” because its numerical value captures how “male” and “female” vectors differ.
If we transform the dataset such that we assign a score to each vector based on the difference between “he” and “she,” we find that words that have to do with sports and the military have higher scores, and words that describe the performing arts as well as female family members have lower scores.
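This scoring step is a projection onto the he-she direction. Here’s a minimal sketch with toy vectors (illustrative only; the real analysis projects every learned embedding onto this axis):

```python
import numpy as np

# Toy embeddings: the first dimension loosely encodes gender association.
vocab = {
    "he":       np.array([ 1.0, 0.0, 0.2]),
    "she":      np.array([-1.0, 0.0, 0.2]),
    "football": np.array([ 0.6, 0.5, 0.1]),
    "ballet":   np.array([-0.7, 0.4, 0.2]),
}

# The "gender direction" is the normalized difference between
# the vectors for "he" and "she".
gender_direction = vocab["he"] - vocab["she"]
gender_direction = gender_direction / np.linalg.norm(gender_direction)

def gender_score(word):
    # Project the word's vector onto the he-she axis: positive scores
    # lean "male", negative scores lean "female".
    return float(np.dot(vocab[word], gender_direction))

print(gender_score("football"))  # positive
print(gender_score("ballet"))    # negative
```

Scoring every word this way is what produces the left-right layout in the plot below, with “male”-leaning words on one side and “female”-leaning words on the other.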
I plotted all the vectors after doing the transformation to get the result below.
Points to the right represent more “male” vectors and points to the left represent more “female” vectors.
Let’s look at a couple of the points on this graph.
“engineer,” “quarterback,” and “drummer” are in the “male” region, while “cook,” “violist,” and “housewife” are in the “female” region.
In fact, trying to “de-bias” word embeddings so they don’t contain problematic relationships is a major area of research in NLP, and translating bias to operations on a vector space is a key paradigm in work like this.
To recap, we’ve gone over word embeddings and how they’ve enabled many discoveries in the field of natural language processing because they encode important information about a word in a numerical form.
We’ve also talked about how the way they’re evaluated is fundamentally flawed because they infer relationships between words that perpetuate the biases in the language we use.
Finally, we’ve analyzed the dataset and have a basic understanding of what causes these inferences.
When we look at systems that use NLP technologies, from Web searching to virtual assistants, it’s crucial for us to understand that these systems have been trained on data generated by humans.
This awareness matters because technology is often painted as an objective ideal that solves the world’s problems with calculations performed on a blank slate. However, as soon as our technical systems use data generated by humans, in products meant to be used by humans, we need to be aware of the biases these systems reinforce and how those biases can be fixed.