Original background image: http://clipart-library.
htm

Want to do fancy Data Sciencing with Twitter data?

I created an interactive Google Colab notebook that walks you through an end-to-end analysis of what’s trending on a selected Topic.
Colab is an awesome (and very under-used) tool, and its Form feature lets you create an app-like user experience while still giving you the option to dig deeper into the underlying code.
I use Seattle as my example in the notebook, but it is set up to be used for any topic.
You can scrape Twitter data, parse out tags (NER), do sentiment analysis, and visualize all that good stuff with plots and Word Clouds.
To jump straight into the Colab, go here: https://lnkd.in/ghKzQV4

Overview

This is a quick and dirty way to get a sense of what’s trending on Twitter related to a particular Topic.
For my use case, I am focusing on the city of Seattle but you can easily apply this to any topic.
The code in the notebook does the following things:
* Scrapes Tweets related to the Topic you are interested in.
* Extracts relevant Tags from the text (NER: Named Entity Recognition).
* Does Sentiment Analysis on those Tweets.
* Provides some visualizations in an interactive format to get a ‘pulse’ of what’s happening.
We use Tweepy to scrape Twitter data and Flair to do NER / Sentiment Analysis.
We use Seaborn for visualizations and all of this is possible because of the wonderful, free and fast (with GPU) Google Colab.

A bit about NER

NER (Named Entity Recognition) is the process of extracting labels from text.
So, take an example sentence: ‘George Washington went to Washington’.
NER will allow us to extract labels such as Person for ‘George Washington’ and Location for ‘Washington (state)’.
It is one of the most common and useful applications in NLP and, using it, we can extract labels from Tweets and do analysis on them.
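
As a rough illustration, here is a minimal sketch of NER with Flair, assuming the pre-trained English `ner` model and that each span exposes `.text` and `.tag` (attribute names have shifted between Flair releases, so treat that as an assumption). The helper names `spans_to_pairs` and `extract_entities` are made up for this example and are not taken from the notebook.

```python
from typing import Iterable, List, Tuple


def spans_to_pairs(spans: Iterable) -> List[Tuple[str, str]]:
    """Convert Flair-style spans into (entity text, label) pairs."""
    # Assumption: each span exposes .text and .tag, as Flair spans do.
    return [(span.text, span.tag) for span in spans]


def extract_entities(text: str, tagger=None) -> List[Tuple[str, str]]:
    """Run Flair NER on one piece of text (downloads the model on first use)."""
    # Imported lazily so the helper above works without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    if tagger is None:
        tagger = SequenceTagger.load("ner")  # pre-trained English NER model
    sentence = Sentence(text)
    tagger.predict(sentence)
    return spans_to_pairs(sentence.get_spans("ner"))
```

Calling `extract_entities("George Washington went to Washington")` should give back one person-type pair and one location-type pair; the exact label strings depend on the model you load.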

A bit about Sentiment Analysis

Most commonly, this is the process of getting a sense of whether some text is Positive or Negative.
More generally, you can apply it to any label of your choosing (Spam/No Spam etc.).
So, ‘I hated this movie’ would be classified as a negative statement but ‘I loved this movie’ would be classified as positive.
Again — it is a very useful application as it allows us to get a sense of people’s opinions about something (Twitter topics, Movie reviews etc).
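
Here is a small sketch of how this could look with Flair, assuming its pre-trained `en-sentiment` classifier and its `POSITIVE` / `NEGATIVE` labels. Folding the label and confidence into one signed number is my own illustrative choice (the names `signed_polarity` and `sentiment_of` are hypothetical), not necessarily how the notebook does it.

```python
def signed_polarity(label: str, score: float) -> float:
    """Map a (label, confidence) prediction onto a single number in [-1, 1]."""
    # Assumption: labels are "POSITIVE" / "NEGATIVE"; anything else is neutral.
    if label.upper() == "POSITIVE":
        return score
    if label.upper() == "NEGATIVE":
        return -score
    return 0.0


def sentiment_of(text: str, classifier=None) -> float:
    """Score one piece of text with Flair (downloads the model on first use)."""
    # Imported lazily so signed_polarity works without Flair installed.
    from flair.data import Sentence
    from flair.models import TextClassifier

    if classifier is None:
        classifier = TextClassifier.load("en-sentiment")
    sentence = Sentence(text)
    classifier.predict(sentence)
    label = sentence.labels[0]
    return signed_polarity(label.value, label.score)
```

With this convention, ‘I hated this movie’ should come out below zero and ‘I loved this movie’ above zero.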
To learn more about these applications, check out the Flair GitHub homepage and Tutorials: https://github.com/zalandoresearch/flair

Usage

You need a Google account to use Colab.
You can edit the notebook by clicking the ‘Open in Playground’ option at the top.
I suggest saving your own copy to Google Drive before going any further.

Open in Playground mode and then Save in your drive

Quick note: A shortcut to running the cells is to press SHIFT+ENTER on your keyboard.
This comes in handy as you will be running through a lot of cells.
You should also go to Notebook Settings and make sure the GPU is selected as the Scraping/Analysis can get really slow with just a CPU.

GPU should be selected

You will need Twitter API keys (and of course a Twitter account) to make this work.
You can get those by signing up here: https://developer.
com/en/apps

Once you have your API KEY and API SECRET KEY, you can enter those credentials into the Authentication cell and run the whole notebook.

Enter your Twitter API Keys here

Get That Data

Once authenticated, you can start the scraping! Enter the search term you are interested in and the number of Tweets to pull.
We can scrape at most 45,000 tweets using the API every 15 minutes so the slider lets you select up to that limit.
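
To see where a limit like this can come from, here is a small arithmetic sketch. It assumes the standard search endpoint returns at most 100 tweets per request and allows 450 requests per 15-minute window with app-only authentication; those two numbers are my assumption about how the 45,000 figure breaks down, so check them against your own API access level.

```python
import math

TWEETS_PER_REQUEST = 100   # assumed maximum page size per search request
REQUESTS_PER_WINDOW = 450  # assumed app-auth request limit per window
WINDOW_MINUTES = 15


def requests_needed(n_tweets: int) -> int:
    """Number of search requests needed to pull n_tweets."""
    return math.ceil(n_tweets / TWEETS_PER_REQUEST)


def windows_needed(n_tweets: int) -> int:
    """Number of 15-minute rate-limit windows those requests span."""
    return math.ceil(requests_needed(n_tweets) / REQUESTS_PER_WINDOW)


max_per_window = TWEETS_PER_REQUEST * REQUESTS_PER_WINDOW
```

Under these assumptions, anything up to `max_per_window` tweets fits in a single window, and asking for more means waiting for the next one.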
You can also choose whether or not to filter out retweets.
I chose to filter them out and focus on the original tweets.
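
The scraping step could be sketched with Tweepy roughly as follows. This is an assumption-heavy outline rather than the notebook's code: the function names are hypothetical, the search method is called `search` in Tweepy 3 and `search_tweets` in Tweepy 4, and Twitter's API access rules may have changed since this was written.

```python
from typing import Iterator


def build_query(term: str, filter_retweets: bool = True) -> str:
    """Build a search query, optionally excluding retweets."""
    # "-filter:retweets" is the standard search operator for dropping retweets.
    return f"{term} -filter:retweets" if filter_retweets else term


def scrape_tweets(api_key: str, api_secret: str, term: str,
                  limit: int = 1000, filter_retweets: bool = True) -> Iterator:
    """Yield up to `limit` tweets matching `term` (needs valid API keys)."""
    import tweepy  # imported lazily; only needed when actually scraping

    auth = tweepy.AppAuthHandler(api_key, api_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)
    # Tweepy 4 renamed API.search to API.search_tweets; support either.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(search, q=build_query(term, filter_retweets),
                           tweet_mode="extended", count=100)
    yield from cursor.items(limit)
```

For example, `build_query("Seattle")` produces the query string `Seattle -filter:retweets`, which is what gets sent when the retweet filter is switched on.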
Once we have scraped the tweets, we load them into a pandas DataFrame.
Here we do some slicing and dicing to get it ready for some visuals.

NER and Sentiment Analysis

We extract the relevant tags using NER: Person, Organization, Location etc.
Since Flair is not equipped to handle hashtags, we create a custom tag for them.
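
One simple way to build such a custom tag is a regular expression. The sketch below is illustrative only; the `HASHTAG` label and the function name are my own placeholders, not names from the notebook.

```python
import re
from typing import List, Tuple

# A "#" followed by one or more word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text: str) -> List[Tuple[str, str]]:
    """Return (hashtag, label) pairs in the order they appear in the text."""
    return [(tag, "HASHTAG") for tag in HASHTAG_RE.findall(text)]
```

These pairs have the same shape as (entity, label) pairs from NER, so both kinds of tags can be stored and counted together.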
Finally, we create a DataFrame that groups all tags with their popularity, like, reply, and retweet metrics.
We also calculate the Average Polarity (sentiment score) for these tags.
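
The grouping logic can be sketched as below. The notebook does this with pandas; this version sticks to the standard library so the steps are easy to follow. The field names (`tag`, `likes`, `replies`, `retweets`, `polarity`) and the definition of popularity as a simple mention count are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def summarize_tags(rows: Iterable[dict]) -> List[dict]:
    """Group one-row-per-(tweet, tag) records into per-tag metrics."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["tag"]].append(row)

    summary = []
    for tag, items in grouped.items():
        summary.append({
            "tag": tag,
            "popularity": len(items),  # number of tweets mentioning the tag
            "likes": sum(r["likes"] for r in items),
            "replies": sum(r["replies"] for r in items),
            "retweets": sum(r["retweets"] for r in items),
            "avg_polarity": mean(r["polarity"] for r in items),
        })
    # Most frequently mentioned tags first.
    return sorted(summary, key=lambda s: s["popularity"], reverse=True)
```

In pandas, the equivalent would be a `groupby` on the tag column followed by an aggregation of the metric columns.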

Visualize!

We create some basic plots for the most Popular, Liked, Replied, Retweeted tags.
We also break these down by Sentiment.

What’s hot in Seattle today?

The cell is set up to let you filter by tag.
We can check the Filter_TAG box and then select the tag we want metrics for.
Then we simply re-run the cell to get the refreshed plots.
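
For a feel of how a form-driven filter like this works, here is a minimal sketch of a Colab form cell. The `#@param` comments are what Colab turns into a checkbox and a text box; outside Colab they are ordinary comments. `Filter_TAG` comes from the notebook, while `TAG`, `tag_rows`, and `apply_tag_filter` are hypothetical names, and the rows are made-up sample data.

```python
#@title Filter plots by tag
Filter_TAG = True  #@param {type:"boolean"}
TAG = "Seahawks"  #@param {type:"string"}

# Made-up per-tag rows, standing in for the notebook's tag DataFrame.
tag_rows = [
    {"tag": "Seahawks", "likes": 15},
    {"tag": "Rain", "likes": 1},
]


def apply_tag_filter(rows, enabled, tag):
    """Return only rows for `tag` when filtering is enabled, else all rows."""
    if not enabled:
        return list(rows)
    return [row for row in rows if row["tag"] == tag]


# Whatever ends up in `selected` is what the plots would be drawn from.
selected = apply_tag_filter(tag_rows, Filter_TAG, TAG)
```

Changing the form values and re-running the cell recomputes `selected`, which is why the plots refresh.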

Word Cloud

I admit I spent more time than I should have setting the background masks but it looks cool!
You can select the mask of your choice (Pick a Horse if you like horses or doing horse analysis for some reason…).
I selected Seattle and generated a Word Cloud with that mask (shown in the first image at the top).
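
A masked Word Cloud can be sketched with the `wordcloud` package as follows. I am assuming that is the library behind the notebook's clouds; the function names and the default output path are placeholders. The mask is just an image loaded as an array, and pure white areas of it are left empty.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_frequencies(tags: Iterable[str]) -> Dict[str, int]:
    """Count how often each tag occurs."""
    return dict(Counter(tags))


def masked_word_cloud(frequencies: Dict[str, int], mask_path: str,
                      out_path: str = "cloud.png"):
    """Draw a word cloud inside the shape given by the mask image."""
    # Imported lazily; these third-party packages are only needed for drawing.
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    mask = np.array(Image.open(mask_path))  # white (255) areas stay empty
    cloud = WordCloud(background_color="white", mask=mask)
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return cloud
```

Swapping the mask image is all it takes to change the shape, whether that is a city skyline or a horse.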
Hopefully this gives you a good start with your Data project.
The Colab is set up to be run with minimal inputs, but the code is right there if you want to dive deep and customize it according to your needs.
That is the beauty of notebooks.
You can start playing around with the notebook here: https://lnkd.