How Did Twitter React To Gillette’s ‘Toxic Masculinity’ Ad: A Sentiment Analysis using R and Twitter’s API
Nathan Rodrigues · Jan 20

[Picture taken from Bloomberg]

Gillette’s new ad campaign invoking the #MeToo movement is the latest test of how big consumer brands can navigate social movements to appeal to millennials.
The ad has sparked a debate online and the response to the viral video has been split.
To learn more about the ad, visit here.
In this post, I will guide you through my data analytical process for Opinion Mining and Sentiment Analysis using Gillette’s Twitter data.
Before diving in, I’d like to outline my process:
· Define the Problem
· Gather the Data
· Clean the Data
· Sentiment Analysis I: Using positive/negative sentiment scores
· Sentiment Analysis II: Using word clouds and emotion scores

Define the Problem
Recently, I’ve been interested in text mining using social data, and when I stumbled upon Gillette’s ad controversy I thought it would be an opportunity to analyze users’ sentiments.
Twitter seemed to be the best option as it is geared towards sharing ideas and engaging in news topics.
Gather the Data
My data set consists of 10,000 tweets taken directly from Twitter’s API from January 13th, 2019 onward (the day the ad was launched) and imported into R.
Other files include lexicons of positive and negative opinion words.
These lexicon files were taken from Minqing Hu and Bing Liu’s paper, Mining and Summarizing Customer Reviews, published in the Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004).
Clean and Explore the Data
The tweets were unstructured, so efforts were made to clean them.
Using Edureka’s video and Bharatendra Rai’s video as a guide, I cleaned the tweets by: removing punctuation, control words, digits, URLs, twitter usernames, alphanumeric characters, stop words, white-spaces, and certain anomaly words.
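The cleaning steps above can be sketched in base R. This is a minimal sketch, not my exact script: it assumes the raw tweet text sits in a character vector, and the regex patterns cover the listed steps rather than every edge case. Order matters here, since URLs and usernames must be stripped before punctuation removal erases their markers.

```r
# Minimal sketch of the cleaning pipeline (assumes `tweets` is a
# character vector of raw tweet text).
clean_tweets <- function(tweets) {
  tweets <- gsub("http\\S+", "", tweets, perl = TRUE)  # remove URLs
  tweets <- gsub("@\\w+", "", tweets, perl = TRUE)     # remove Twitter usernames
  tweets <- gsub("[[:punct:]]", "", tweets)            # remove punctuation
  tweets <- gsub("[[:cntrl:]]", "", tweets)            # remove control characters
  tweets <- gsub("[[:digit:]]", "", tweets)            # remove digits
  tweets <- tolower(tweets)
  tweets <- gsub("\\s+", " ", tweets, perl = TRUE)     # collapse white-space
  trimws(tweets)                                       # trim leading/trailing space
}

clean_tweets("Very well said! @Gillette https://t.co/abc123")
# "very well said"
```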
In the barplot below, we can see the words that are most frequently used in all 10,000 tweets:

[Barplot for word frequency among 10,000 tweets]

Sentiment Analysis I: Using positive/negative sentiment scores
The goal here is to match each tweet to a sentiment.
In this case, there are three sentiments: positive, negative, and neutral.
We start this by scanning each word in a tweet.
Let’s look at a few examples to understand how sentiment scores are determined.
In the tweet, “Very well said”, we have one positive word, “well”, that is found in the positive lexicon and no negative words.
Thus, the total score for this tweet is 1.
In another tweet, “The best a man can get or give… Hypocrites”, “best” is a positive word and “hypocrites” is a negative word.
Thus, the total score for this tweet is 1 - 1 = 0.
We then apply this to all 10,000 tweets and end up with scores ranging from -5 to +6.
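The scoring rule can be sketched as a small R function. The short word lists below are stand-ins for Hu and Liu’s full lexicons, which the actual analysis reads from file; the two example tweets are the ones discussed above.

```r
# Sketch of the scoring rule: (positive matches) - (negative matches).
# These tiny word lists stand in for the full opinion lexicons.
pos_words <- c("well", "best", "good")
neg_words <- c("hypocrites", "bad", "worst")

score_tweet <- function(tweet) {
  # lower-case, strip punctuation, then split into words
  words <- strsplit(gsub("[[:punct:]]", "", tolower(tweet)), "\\s+")[[1]]
  sum(words %in% pos_words) - sum(words %in% neg_words)
}

score_tweet("Very well said")                            # 1
score_tweet("The best a man can get or give Hypocrites") # 1 - 1 = 0
```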
The barplot below shows the distribution of all 10,000 tweets.
We observe that a majority of tweets fall between -1 and 1.
Specifically, the number of tweets that have scores of -1, 0, and 1 equals 2476, 4622, and 1609, respectively.
Neutral tweets have the highest frequency in this scenario.
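As a hypothetical sketch, the distribution plot can be rebuilt from the three counts reported above; the less frequent score levels between -5 and +6 are omitted for brevity.

```r
# Rebuild the score distribution from the reported counts for -1, 0, 1.
scores <- c(rep(-1, 2476), rep(0, 4622), rep(1, 1609))
counts <- table(scores)
barplot(counts,
        xlab = "Sentiment score", ylab = "Number of tweets",
        main = "Distribution of sentiment scores")
```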
As an exploratory exercise, this is useful in observing a general view of the data.
However, it doesn’t reveal much on its own.
Hence, we can dive deeper by matching tweets with a range of emotions, instead of only relying on positive, negative and neutral sentiment scores.
Sentiment Analysis II: Using word clouds and emotion scores
Instead of using a lexicon of positive and negative words, we can use a lexicon of emotions.
With the help of the Syuzhet package in R, tweets are scanned to match emotions in the lexicon.
The package manual states, “The NRC emotion lexicon is a list of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive)”.
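This step can be sketched with Syuzhet’s get_nrc_sentiment(), which returns one column per emotion plus the two sentiments, with a row per input text; the two sample tweets below are the ones used earlier.

```r
library(syuzhet)

# Score a couple of example tweets against the NRC emotion lexicon.
sample_tweets <- c("Very well said",
                   "The best a man can get or give Hypocrites")
emotions <- get_nrc_sentiment(sample_tweets)

colSums(emotions)  # total matched words per emotion across all tweets
```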
The word cloud below highlights the most frequently used keywords in the sample of tweets:

[Word cloud of the most frequent keywords]

Finally, the percentage of each emotion in the text can be plotted as a bar graph:

[Barplot of emotion percentages]

If anything, this barplot is an indication of how split the response to the viral video has been.
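The percentage barplot can be sketched as follows. The emotion counts here are made-up placeholders standing in for the column sums returned by get_nrc_sentiment() on the real tweet data.

```r
# Hypothetical emotion totals (placeholders, not the real results).
emotion_totals <- c(anger = 120, anticipation = 90, disgust = 110,
                    fear = 80, joy = 70, sadness = 95,
                    surprise = 60, trust = 100)

# Convert counts to percentages and plot, largest emotion first.
emotion_pct <- 100 * emotion_totals / sum(emotion_totals)
barplot(sort(emotion_pct, decreasing = TRUE),
        ylab = "Percentage of matched emotion words", las = 2)
```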
Recommendations
A few recommendations I would make include:
· Improving the methodology for calculating sentiment scores. For example, the current methodology cannot detect sarcastic comments and will assign them positive values instead of negative values.
· Adding a time component to determine whether the controversy is fading away or sticking around.

Check out my code over here.
Please leave any comments for improving my analysis.
Thanks for reading.