Quantifying Chatroom Toxicity

Using Machine Learning to Identify Hate in Online Chatrooms

Jeremy Chow | Jun 25

Photo courtesy of Caspar Camille Rubin on Unsplash

Note: vulgar language examples below. Code used in this project can be found here.
The other night I was at an SF data science meetup hosted by Twitch (beautiful office and great food!) and I struck up a conversation with some software engineers there.
Turns out, they were on a ‘Safety’ team dedicated entirely to keeping chats clean and protecting streamers and viewers in the chatroom, which makes sense: chat interactivity is arguably the core experience of Twitch.
With this in mind, I set out to see if I could build my own version of a chat toxicity classifier (with the help of my good friend Randy Macaraeg).
Today I’d like to walk you through that process!

Twitch chat is a core driver of its user experience

The first step is to find a dataset.
Luckily, our friends at Kaggle hosted a competition for identifying toxic comments on Wikipedia, which offers a dataset of over 300,000 comments manually tagged with binary labels for six categories: toxic, severe toxic, obscene, threat, insult, and identity hate.
Natural Language Preprocessing in a Nutshell

Once we have our data, we need to go through the cleaning process.
For text data sets in general, this often includes:

- Removal of punctuation
- Removal of stop words (things like “the”, “this”, “what”)
- Stemming/lemmatization (reducing words down to their base form by removing suffixes like “-ed”, “-ing”)
In addition, dealing with online text and comments requires the further removal of items like hyperlinks, usernames, and automated messages (the most common comment in this dataset was a Wikipedia welcome message to new users).
What preprocessing you do varies from project to project; for this one, we got the best accuracy doing all of the above except stemming and lemmatization, which dropped our ROC AUC by 10%.
We also considered n-grams, where you look at pairs or trios (or more) of words in addition to single words, but that also dropped our ROC AUC metric by 5%, so we ultimately moved forward without it.
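As a sketch, the cleaning steps described above might look something like the following. The regexes, stop-word list, and token handling here are illustrative assumptions, not the project's exact code:

```python
import re

# Illustrative stop-word list; the real project used a full library list.
STOP_WORDS = {"the", "this", "what", "a", "an", "is", "are", "to", "of"}

def clean_comment(text: str) -> str:
    """Lowercase, strip links/usernames/punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)              # remove hyperlinks
    text = re.sub(r"@\w+|\[\[user:[^\]]+\]\]", " ", text)  # remove usernames
    text = re.sub(r"[^a-z\s]", " ", text)                  # remove punctuation/digits
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `clean_comment("This is the WORST page! http://spam.com")` reduces the comment to `"worst page"`, which is roughly the shape of input the vectorizer sees.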
Vectorization

To turn words into something a machine learning algorithm can understand and process, we need to do something called vectorization.
Put simply, this is the process of turning words into multi-dimensional vectors in such a way that the meaning or context of the word is correlated with where that vector points.
In a sense, vectorization allows computers to quantify the meaning of words by mapping similar word meanings to similar vector spaces.
Term Frequency - Inverse Document Frequency Vectorization

For this application, we want rapid, context-based vectorization.
Some methods, such as Word2Vec and spaCy’s pretrained vectors, involve multi-gigabyte models trained on hundreds of thousands, if not millions, of documents, reducing the meaning of each word to a set of a couple hundred numbers.
These models are excellent at preserving context and meaning of words in general, but they are slow and large.
This is where term frequency-inverse document frequency (TF-IDF) vectorization comes in.
This vectorization method looks at the number of times a word appears in a comment relative to the number of times it appears in other comments.
Two things result in a higher TF-IDF score: higher frequency within the specific comment being scored, and lower frequency across all other comments.
A glance at the highest TF-IDF scores in comments with the toxic label. We can see cursing is highly correlated with the toxicity labels!

Modeling!

Once we convert all our comments into sets of TF-IDF word vectors, the next step is modeling.
The end goal is to implement a kind of auto-moderation algorithm in a chatroom, which requires a fast algorithm capable of handling hundreds of thousands of concurrent viewers chatting with each other.
To do this, we use basic logistic regression for classification.
In essence, logistic regression builds on your middle school slope formula, y = mx + b, where y is the log-odds that something will occur (squashed between 0 and 1 with a sigmoid function to give a probability), m is the unit change in y due to a change in the independent variable x, and b is the bias, or the y-intercept.
This article is great for a more in depth explanation of logistic regression.
So now we have all the pieces to build our models.
At this point, we train our models by feeding each comment into our TF-IDF vectorizer, which turns that comment into a vector of 20,000 features (the maximum number of words we want to track as our vocabulary).
Then we pass those 20,000 TF-IDF score features into our model.
For simplicity, we train six separate models independently, one for each label.
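That one-model-per-label setup can be sketched with scikit-learn as follows. The tiny corpus and labels are made up, and only two of the six label columns are shown:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stand-in training data; in the real project the comments and the six
# binary label columns come from the Kaggle dataset.
comments = [
    "you are awful and stupid",
    "thanks for the helpful edit",
    "i will hurt you",
    "nice work on this article",
]
labels = {
    "toxic":  [1, 0, 1, 0],
    "threat": [0, 0, 1, 0],
}  # two of the six labels, for brevity

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)

# One independent binary classifier per label, as described above.
models = {name: LogisticRegression().fit(X, y) for name, y in labels.items()}

# Probability of the positive class for each comment, per label.
probs = {name: m.predict_proba(X)[:, 1] for name, m in models.items()}
```

Training the labels independently keeps each model simple and fast, at the cost of ignoring correlations between labels (a severely toxic comment is almost always also toxic, for instance).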
This gives us the following ROC scores!

Receiver Operating Characteristic (ROC) Area Under the Curve

Receiver Operating Characteristic (ROC) curves for logistic regression models.
The area under the ROC curve tells us how many additional false positives we take on when we lower our probability threshold to capture more true positives.
In the context of this project, a false positive is saying a comment is toxic when it really is not, and a true positive is correctly identifying a toxic comment.
Basically, more area under the ROC curve means we pay fewer additional false positives for each additional true positive, so the area is what we want to maximize.
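Computing the AUC is a one-liner with scikit-learn; the labels and scores below are made-up illustration data, not the project's results:

```python
from sklearn.metrics import roc_auc_score

# y_true: hand-tagged binary labels; y_score: the model's predicted
# probabilities for the positive (toxic) class.
y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.8, 0.9, 0.2, 0.7]

auc = roc_auc_score(y_true, y_score)  # 1.0 here: every toxic comment
                                      # outranks every clean one
```

An AUC of 1.0 means a randomly chosen toxic comment always gets a higher score than a randomly chosen clean one; 0.5 would mean the ranking is no better than chance.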
As you can see, we get AUC scores of around 0.97 to 0.98, which is very close to the maximum of 1.
This means we can celebrate, because our models are great at identifying true positives!

Conclusions and Implementation

So we’ve created a model that takes in an internet comment and spits out six probabilities, one for whether the comment falls into each of the six toxic categories.
On a 2019 MacBook Pro, the classification speed is about 1,150 messages per second! Not bad.
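A rough sketch of how one might measure that throughput on a toy pipeline (the corpus and labels are made up, and the rate you see depends entirely on hardware and vocabulary size):

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Fit a tiny stand-in pipeline so the timing code is self-contained.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(
    ["you are awful", "nice edit", "terrible idiot", "thank you"]
)
model = LogisticRegression().fit(X, [1, 0, 1, 0])

# Time vectorization + scoring over a batch and convert to messages/sec.
messages = ["some chat message"] * 1000
start = time.perf_counter()
model.predict_proba(vectorizer.transform(messages))
rate = len(messages) / (time.perf_counter() - start)
```

Batching the `transform` and `predict_proba` calls, as above, is much faster than scoring messages one at a time, which matters when a channel has thousands of concurrent chatters.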
While there are some improvements to be made, this makes a great moderation tool that can be used to alert a moderator if a hostile individual is posting in their channel.
Keep an eye out for my next post to learn how we deployed this model online and in a Twitch chatbot!