Game of Thrones Twitter Sentiment with Google Cloud Platform and Keras
An end-to-end pipeline with AI Platform, Apache Beam / Dataflow, BigQuery and Pub/Sub
Thomas Dehaene, May 22

The final season of Game of Thrones apparently raised a lot of eyebrows, so I wanted to dig deeper into how people felt before, during and after the final episode by turning to the ever non-soft-spoken Twitter community.
In this blogpost, we’ll look at how an end-to-end solution can be built to tackle this problem, using the technology stack available on Google Cloud Platform.
Let’s go!

The focus is on realising a fully working solution rather than on perfecting a single component of the pipeline, so any of the individual blocks can certainly be improved. To keep it readable, I haven’t included all of the code, but everything can be found in this Github repo, fully commented.
The basic idea
The rough outline for the entire pipeline looks something like this: [figure]

Basically, what can be done is:
- Have a script running on a VM, scraping tweets on Game of Thrones
- Have a Pub/Sub topic to publish messages to
- Have a served ML model to classify tweet sentiment
- Have an Apache Beam streaming pipeline pick up the tweets and classify them
- Output the classified tweets to BigQuery, to do analyses on

In the rest of the post, we’ll go over the components one by one, and finish with a big orchestra of harmonious pipelining bonanza!

We will be relying heavily on Google Cloud Platform, with the following components:
- Compute Engine: to run the tweepy script on
- Cloud Pub/Sub: to buffer the tweets
- Cloud Dataflow: managed Apache Beam runner
- AI Platform: to serve our ML model via an API
- BigQuery: to store our tweets in

1. Script on GCE to capture tweets
Capturing tweets related to several search terms can easily be done using the tweepy API, and sending them to Google Cloud Pub/Sub only takes the client library (both snippets are in the repo). With this done, it’s just as simple as:
- Setting up a VM on Google Compute Engine (I’ve used a simple n1-standard-1)
- Copying the script to a bucket on Google Cloud Storage
- SSH-ing into the VM
- Copying the script from the bucket to the environment
- Installing python3 on the VM
- Running the python script

2. Cloud PubSub topic as message broker
Pub/Sub is a great piece of messaging middleware, which serves as the event ingestion and delivery system in your entire pipeline.
Especially in this case, where the tweets will potentially flow in much faster than the streaming pipeline can pick them up, it’s a great tool, given that ingestion and delivery are decoupled asynchronously.
Pub/Sub can also store the received messages for a number of days, so no worries if your downstream tasks struggle to keep up.
Creating a topic is extremely easy: just navigate to your GCP Console and go to the Pub/Sub menu. From there, just click the CREATE TOPIC button and fill in a name for your topic.
For future reference, I’ve named mine ‘got_tweets’.
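To make the hand-off to the topic a bit more concrete, here is a minimal sketch of how a scraped tweet could be filtered and turned into a Pub/Sub payload. Pub/Sub message data has to be bytes, so the tweet is serialised to UTF-8 encoded JSON. The search terms, the tweet field names and the `publish` callable are assumptions made for illustration, not the code from the repo.

```python
import json

# Hypothetical search terms; the real script may use a different list.
KEYWORDS = ("game of thrones", "#got")


def is_relevant(text, keywords=KEYWORDS):
    """Return True if the tweet text mentions any of the search terms."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def encode_tweet(tweet):
    """Serialise the fields we care about to UTF-8 JSON bytes."""
    payload = {
        "id": tweet["id"],
        "text": tweet["text"],
        "created_at": tweet["created_at"],
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def publish_tweets(tweets, publish):
    """Send every relevant tweet through `publish` and count what was sent.

    `publish` is any callable that accepts bytes, so the Pub/Sub client
    can be swapped for a plain list while testing.
    """
    sent = 0
    for tweet in tweets:
        if is_relevant(tweet["text"]):
            publish(encode_tweet(tweet))
            sent += 1
    return sent
```

With the google-cloud-pubsub client library, `publish` could be something like `lambda data: publisher.publish(topic_path, data)`, where `publisher = pubsub_v1.PublisherClient()` and `topic_path = publisher.topic_path(project_id, "got_tweets")`; `project_id` stands for your own project.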
3. Served ML model on AI Platform
For each tweet coming in, we want to determine whether the sentiment expressed (presumably towards the episode) is positive or negative. This means we will have to:
- look for a suitable dataset
- train a machine learning model
- serve this machine learning model

Dataset
When thinking about sentiment analysis, we quickly think of the ‘IMDB Movie Review’ dataset.
For this specific purpose though, this classic seemed less suited, since we are dealing with tweets here.
Luckily, the Sentiment140 dataset, which contains 1.6 million labeled (positive and negative) tweets, seems to be perfectly suited for this case.
More info, and the dataset, on this Kaggle page.
Some examples: [Figure: sample from the Sentiment140 dataset]

Preprocessing the text is done in a separate class, so that it can later be reused when calling the model.

Model
For the classification model itself, I based myself upon the famous 2014 Yoon Kim paper on multichannel CNNs for text classification (source).
For ease of development (and later deployment), I used Keras as the high-level API.
A CNN-based model has the additional benefit that training was still feasible in a decent time on my little local workstation (NVidia GTX 1050Ti with 4GB memory), whereas an RNN-based model (often used for sentiment classification) would take much longer to train.
We can try to give the model some extra zing by loading some pretrained Word Embeddings.
In this case, the GloVe Twitter 27B embeddings seemed like a good option! The full code can be found in this notebook.
The model was set to train for 25 epochs, with two Keras Callback mechanisms in place:
- a callback to reduce the LR when the validation loss plateaus
- a callback to stop early when the validation loss hasn’t improved in a while, which caused training to stop after 10 epochs

The training and testing curves can be seen here: [figure]

So we obtain an accuracy of about 82%.
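To show how the two callbacks interact, here is a small pure-Python simulation of their logic. It is a simplified stand-in for Keras' `ReduceLROnPlateau` and `EarlyStopping` (no `min_delta`, no cooldown), and the learning rate, factor and patience values are assumptions rather than the settings used in the notebook.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5,
                       lr_patience=2, stop_patience=4):
    """Replay a list of validation losses and return [(epoch, lr), ...].

    The learning rate is multiplied by `factor` after `lr_patience`
    epochs without improvement; training stops after `stop_patience`
    epochs without improvement.
    """
    best = float("inf")
    since_best = 0   # epochs since the best loss (early stopping)
    since_cut = 0    # epochs since the best loss or the last LR cut
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_cut = 0
        else:
            since_best += 1
            since_cut += 1
        if since_cut >= lr_patience:
            lr *= factor          # plateau detected: lower the LR
            since_cut = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break                 # no improvement for too long: stop
    return history
```

Feeding it a loss curve that bottoms out early shows the pattern described above: the learning rate is cut once or twice, and training ends well before the configured number of epochs.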
Serving the model
AI Platform provides a managed, scalable serving platform for machine learning models, with some nice benefits like versioning built in.
Now for hosting, there’s one special aspect of our model which makes it a bit less trivial to serve it in AI Platform: the fact that we need to normalize, tokenize and index our text in the same way we did while training.
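As an illustration of what "normalize, tokenize and index" means in practice, here is a minimal sketch of a preprocessing class that is fitted once on the training texts and then reused unchanged at serving time. The class name, the cleaning rules and the padding scheme are assumptions; the actual class in the repo may differ.

```python
import re


class TextPreprocessor:
    """Normalise, tokenise and index text with a vocabulary fitted once."""

    PAD, UNK = 0, 1  # reserved ids for padding and unknown words

    def __init__(self, max_len=8, vocab_size=10000):
        self.max_len = max_len
        self.vocab_size = vocab_size
        self.word_index = {}

    @staticmethod
    def normalize(text):
        text = text.lower()
        text = re.sub(r"https?://\S+", " ", text)   # drop URLs
        text = re.sub(r"@\w+", " ", text)           # drop user mentions
        text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop punctuation
        return re.sub(r"\s+", " ", text).strip()

    def tokenize(self, text):
        return self.normalize(text).split()

    def fit(self, texts):
        """Build the word -> id mapping, most frequent words first."""
        counts = {}
        for text in texts:
            for token in self.tokenize(text):
                counts[token] = counts.get(token, 0) + 1
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        for idx, word in enumerate(ranked[: self.vocab_size - 2], start=2):
            self.word_index[word] = idx
        return self

    def transform(self, texts):
        """Map texts to fixed-length lists of ids, padded with PAD."""
        rows = []
        for text in texts:
            ids = [self.word_index.get(t, self.UNK) for t in self.tokenize(text)]
            ids = ids[: self.max_len]
            rows.append(ids + [self.PAD] * (self.max_len - len(ids)))
        return rows
```

The important property is that the fitted `word_index` travels with the model: if serving used a different vocabulary or different cleaning rules than training, the ids fed to the network would no longer mean the same thing.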
Still, there are some options to choose from:
- Wrap the tf.keras model in a TF model, and add a Hashtable layer to keep the state of the tokenization dict. More info here.
- Go full-blown and implement a tf.transform preprocessing pipeline for your data. Great blog post about this here.
- Implement the preprocessing later on, in the streaming pipeline itself.
- Use the AI Platform beta functionality of having a custom ModelPrediction class.
Given that there was neither time nor resources to go full-blown tf.transform, and that potentially overloading the streaming pipeline with additional preprocessing seemed like a bad choice, the last one looked like the way to go.
The outline looks like this: [figure]

Custom ModelPrediction classes are easy enough; there’s a great blog post by the peeps from Google on it here.
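For readers who have not seen one, the general shape of such a class is roughly the following: a constructor, a `predict` method that receives the raw instances, and a `from_path` class method that AI Platform calls to load the artifacts from the model directory. This is a sketch under assumptions: the file names are hypothetical, both artifacts are loaded with `pickle` to keep the example self-contained (a Keras model would normally be loaded with Keras' own loader), and the model is assumed to return one score per row.

```python
import os
import pickle


class SentimentPredictor:
    """Sketch of a custom prediction class wrapping model + preprocessing."""

    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        """Turn raw tweet texts into JSON-serialisable sentiment scores."""
        features = self._preprocessor.transform(instances)
        scores = self._model.predict(features)
        return [float(score) for score in scores]

    @classmethod
    def from_path(cls, model_dir):
        """Load the persisted artifacts (hypothetical file names)."""
        with open(os.path.join(model_dir, "preprocessor.pkl"), "rb") as f:
            preprocessor = pickle.load(f)
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        return cls(model, preprocessor)
```

Because the preprocessing lives inside `predict`, callers can send plain tweet text and the streaming pipeline does not need to know anything about tokenization.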
Mine can be found in the repo. To create a served AI Platform model from this, we just need to:
- package up the custom prediction and preprocessing .py files
- upload this package, together with a persisted model and preprocessing class instance, to a bucket
- from there on, create a model, named whatever you want
- in this model, create a new version based on the uploaded items, with some beta magic (the command is in the repo)

4. An Apache Beam streaming pipeline
Tweets come in in a streaming fashion; it is literally an unbounded dataset.
A streaming pipeline therefore seems like the perfect tool to capture tweets from a Pub/Sub topic and process them.
We will use Apache Beam as the programming model, and run the pipeline on a Dataflow runner (managed environment on Google Cloud for running Beam pipelines).
Those of you who want to read more on Apache Beam and its paradigm can do so on the website.
Firstly, when streaming, we have to consider a Windowing strategy.
Here, we just use a fixed window of 10 seconds.
[Figure: fixed windowing strategy (source)]

Other strategies are possible as well, such as a moving window strategy, but this would probably incur extra calls to the hosted ML model.
So the fixed windowing seemed the easiest to get started with.
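The difference between the two strategies can be sketched in plain Python. With fixed windows every tweet belongs to exactly one window; with moving (sliding) windows a tweet belongs to several overlapping ones, which is one way the extra model calls could arise if classification happened per window. The function names and the 5-second period are assumptions; in Beam itself the fixed case is expressed with `beam.WindowInto(window.FixedWindows(10))`.

```python
def fixed_window(ts, size=10):
    """Return the single [start, end) window a timestamp falls into."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size=10, period=5):
    """Return every [start, end) window of length `size`, started every
    `period` seconds, that contains the timestamp."""
    start = ts - ts % period      # latest window start at or before ts
    windows = []
    while start > ts - size:      # window still covers ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)
```

For a tweet arriving at second 23, the fixed strategy yields one window while the sliding strategy with these settings yields two, so any per-window work is done twice.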
The main steps in our pipeline are:
- Pull in Pub/Sub messages in 10-second intervals
- Batch them up in batches of 50 messages (not too big, or the body of the request will be too large)
- Classify them by making calls to the hosted ML model
- Write them to a BigQuery table
- In parallel, compute the mean sentiment over each 10-second window, and write this to a second BigQuery table

When running on Cloud Dataflow, it looks as follows: [figure]

The full code is a little long to paste here, but it can be found in full in my Github repo.
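Stripped of the Beam machinery, the batching and aggregation steps above boil down to something like the sketch below. The `predict` callable stands in for the call to the hosted model and is an assumption; the `{"instances": [...]}` body is the JSON shape that AI Platform online prediction expects.

```python
import json
from statistics import mean


def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def request_body(texts):
    """Build the JSON body for one online prediction request."""
    return json.dumps({"instances": list(texts)})


def classify_window(texts, predict, batch_size=50):
    """Classify all tweets of one window, one request per batch.

    `predict` takes a request body (str) and returns a list of scores.
    Returns the (text, score) pairs and the mean score of the window.
    """
    scored = []
    for batch in batches(texts, batch_size):
        scores = predict(request_body(batch))
        scored.extend(zip(batch, scores))
    window_mean = mean(score for _, score in scored) if scored else None
    return scored, window_mean
```

Keeping the batch size at 50 means a window with 120 tweets results in three requests, each with a modest body, rather than one oversized request or 120 tiny ones.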
5. Have a BigQuery table to stream results into
As stated before, we have two BigQuery tables to stream the results into:
- One for the individual posts, with the sentiment label, to perhaps relabel them in the future and fine-tune our classifier
- One for the mean predicted sentiment per 10-second window

You can just create these from the UI, and specify a schema (which of course has to match the schema specified in your Beam pipeline job).
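To illustrate the point about the schemas having to match, here is a sketch that keeps both schemas in one place and derives the comma-separated `name:TYPE` string that Beam's BigQuery sink accepts. All field names are hypothetical; the actual tables in the repo may use different columns.

```python
# Hypothetical columns for the two tables described above.
TWEETS_SCHEMA = [
    ("tweet_id", "STRING"),
    ("text", "STRING"),
    ("created_at", "TIMESTAMP"),
    ("sentiment", "FLOAT"),
]
WINDOW_SCHEMA = [
    ("window_start", "TIMESTAMP"),
    ("mean_sentiment", "FLOAT"),
    ("tweet_count", "INTEGER"),
]


def beam_schema(fields):
    """Render a field list as a 'name:TYPE,name:TYPE' schema string."""
    return ",".join(f"{name}:{type_}" for name, type_ in fields)


def conforms(row, fields):
    """Check that a row dict has exactly the columns of the schema."""
    return set(row) == {name for name, _ in fields}
```

Defining the fields once and using them both when creating the table and when configuring the pipeline is a cheap way to avoid streaming inserts failing on a mismatched column.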
The run
I ran the entire pipeline for a few hours, to capture the sentiment leading up to, during and after the episode. Given that the number of tweets could quickly become fairly large, it was also good to observe the scaling capabilities of all of the components:
- AI Platform: a real MVP in this story. It scales really well in the backend when the load increases, to try and keep response times stable. [figure]
- Cloud Dataflow: in hindsight, Java streaming feels a bit more solid than Python streaming. Autoscaling does not currently work when streaming Python pipelines; this caused the system delay to grow throughout the run. [Figure: system delay (in seconds, right-hand axis)]
- BigQuery: not a problem at all. BQ operates with a streaming buffer, and offloads data periodically to the table itself. Also for post-analysis, BQ is never an issue.
In total, about 500,000 tweets were collected in a 7-hour period.
Here are some examples, with their predicted sentiment (warning: spoiler alert!): [figure]

The results
Now as for the main question, we could try to frame it as: what is the average sentiment expressed in the tweets, per minute, in the hour before, during, and in the hour after the episode?
Simple enough with some SQL query magic (see the notebook in the repo), with some notes:
- The scores were standardized to mean 0 and stddev 1
- Both the moving average and raw mean sentiment are shown
- Some key scenes from the show are mentioned

- So apparently, the community was very hostile towards GoT before the show, gradually putting down their pitchforks and torches towards the beginning of the episode.
- It could be stated that Bran being named king was well received; I too thought this was a very nice plot twist!
- Another positive scene was when Brienne of Tarth was writing about Jaime in the book of knights.
- After the episode, the community seemed to be rather negative towards the final episode, changing their mind a little after about 45 minutes, before becoming negative once again…

They ended up being rather negative about the episode, which seems to be reflected in the IMDB score of only 4.
One could argue that the episode didn’t stand a chance, as the community was already rather negative before the episode, so that the sentiment somewhat started with a disadvantage bias.
Is this the ground truth though? Nobody knows for sure, but I’m quite happy with the results.
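For those who prefer Python over SQL, the per-minute aggregation behind the results (mean per minute, standardisation to mean 0 and stddev 1, and a moving average) can be sketched as follows. This is not the query from the notebook: the input format, the choice to standardise the per-minute means, and the trailing window of 3 minutes are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_minute_mean(rows):
    """rows: iterable of (epoch_seconds, score) -> {minute_start: mean}."""
    buckets = defaultdict(list)
    for ts, score in rows:
        buckets[int(ts // 60) * 60].append(score)
    return {minute: mean(scores) for minute, scores in sorted(buckets.items())}


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def moving_average(values, window=3):
    """Trailing moving average; early points use the values available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(mean(chunk))
    return smoothed
```

Plotting both the standardised per-minute means and their moving average gives the kind of raw-versus-smoothed view mentioned in the notes above.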
So there we have it! An answer to our question, using the toolbox Google Cloud provides us.
FYI: the total cost of the operation ended up being around $5, which I would say is fairly reasonable!