Fun with analyzing @BillGates tweets using Twitter APIs: Step-by-Step Analysis from Extraction, Data Visualization, and Sentiment Analysis

Senthil E · May 10

This is the second post in the web scraping and APIs series.
The first post is here.
Please check it out.
In this post, we will see how to extract Twitter data using the Twitter APIs, do some basic visualization with word clouds and pie charts, and then run sentiment analysis using TextBlob and VADER.
If you don’t have Jupyter, please go ahead and install Anaconda.
Let’s do the following:
1. Create a Twitter account or use your existing Twitter account.
2. Request a Twitter developer access key.
3. Extract real-time tweets using the Twitter Streaming API.
4. Extract historical tweets using the Twitter Search/REST API with Tweepy.
5. Load the data into a pandas DataFrame.
6. Create a word cloud.
7. Create a Scatter Text plot.
8. Do some statistical analysis and visualization.
9. Sentiment analysis using TextBlob.
10. Sentiment analysis using VADER.
11. Sentiment analysis using Syuzhet (R).

Before using the Twitter API, we need a Twitter account, and then we need to request a developer key.
First, create a Twitter account if you don’t have one.
The steps to get the necessary developer keys are below:
1. Watch this YouTube video on creating a Twitter account.
2. Go to https://developer.
3. Log in using your Twitter account. If you don’t have one, create one.
4. After logging in, create an app.
5. Fill out all the required fields.
6. You will get the following credentials:
- API key
- API secret key
- Access token
- Access token secret

Check out this YouTube video, and check out this documentation to get the Twitter developer keys.

Twitter provides 2 APIs:
- Streaming API
- Search API (or REST API)

The Twitter developer website lists the available Python wrappers.

Some Twitter and social media statistics:
- 6,000 tweets every second
- 473,500 tweets per minute
- 500 million tweets are sent every day
- 200 billion tweets per year
- 326 million people use Twitter every month
- 2.5 quintillion bytes of data are generated every day
- 50% of the earth’s population is on social media, a total of 2.789 billion people
- By 2025, it’s estimated that 463 exabytes of data will be created each day globally; at 4.7 GB per DVD, that’s roughly 98.5 billion DVDs per day!
- Social media accounts for 33% of the total time spent online.
Streaming API

The Twitter Search API and Twitter Streaming API work well for individuals who just want to access Twitter data for light analytics or statistical analysis.
Twitter’s Streaming API pushes data as tweets happen, in near real time.
The major drawback of the Streaming API is that it provides only a sample of the tweets that are occurring.
The actual percentage of total tweets users receive with Twitter’s Streaming API varies heavily based on the criteria users request and the current traffic.
Studies have estimated that, using Twitter’s Streaming API, users can expect to receive anywhere from 1% of the tweets to over 40% of tweets in near real time (reference website).

Before starting, install all the required libraries from the command prompt using pip (please check out more on pip):
pip install vaderSentiment
pip install nltk
pip install textblob
pip install numpy
pip install pandas

After that, import all the required libraries.
Next, enter all your credentials: log in to your Twitter account, go to your app, and copy the four values listed above.
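Rather than pasting the four keys directly into a notebook, one option is to read them from environment variables. The sketch below assumes hypothetical variable names (`TWITTER_API_KEY` and so on). The returned values are what you would then pass to Tweepy 3.x’s `tweepy.OAuthHandler(api_key, api_secret_key)` and `auth.set_access_token(access_token, access_token_secret)`. The `masked` helper lets you print the credentials to check them without exposing the full secrets.

```python
import os

# Hypothetical environment variable names; rename them to suit your setup.
CREDENTIAL_VARS = {
    "api_key": "TWITTER_API_KEY",
    "api_secret_key": "TWITTER_API_SECRET_KEY",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=None):
    """Return the four Twitter credentials read from environment variables.

    Raises KeyError naming every missing variable, so a typo surfaces
    before any API call is attempted.
    """
    environ = os.environ if environ is None else environ
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}


def masked(secret, visible=4):
    """Hide all but the last `visible` characters of a secret for safe printing."""
    hidden = max(len(secret) - visible, 0)
    return "*" * hidden + secret[hidden:]
```

For example, `creds = load_credentials()` followed by `print({name: masked(value) for name, value in creds.items()})` shows enough of each key to spot a copy-paste mistake.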
Just print them to check your credentials.
You can filter by:
- User ID
- User name
- Query by keywords
- Geolocation

Filter by location and select the tweets: I am filtering by location = San Francisco. Just one tweet contains a lot of information, so check out all the available tweet objects. The JSON object looks like this. You can also save it to a CSV file; here is the output of one tweet with selected info. For more info, see the Streaming API documentation.
The output from the Twitter API is in JSON format.
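The Streaming API’s location filter takes bounding boxes given as south-west longitude/latitude followed by north-east longitude/latitude; with Tweepy 3.x this is passed as `stream.filter(locations=[...])`. The server also matches tweets whose `place` bounding box overlaps yours, so if you only want tweets with an exact GPS point inside San Francisco, a client-side check like the sketch below can help. The box coordinates are a rough San Francisco area and an assumption; adjust them as needed.

```python
# (sw_lon, sw_lat, ne_lon, ne_lat): the order the Streaming API's `locations`
# parameter uses. This rough San Francisco box is an assumption; adjust as needed.
SAN_FRANCISCO = (-122.75, 36.8, -121.75, 37.8)


def in_bounding_box(tweet, box=SAN_FRANCISCO):
    """Return True if the tweet has an exact point inside `box`.

    Tweet JSON stores exact locations as a GeoJSON point, [longitude, latitude].
    Tweets without exact coordinates (most of them) return False.
    """
    point = (tweet.get("coordinates") or {}).get("coordinates")
    if not point:
        return False
    lon, lat = point
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


def keep_exact_matches(tweets, box=SAN_FRANCISCO):
    """Filter a list of tweet dicts down to those with an exact point in `box`."""
    return [tweet for tweet in tweets if in_bounding_box(tweet, box)]
```

Note the coordinate order: GeoJSON puts longitude first, which is easy to swap by accident.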
So what is JSON? Please check this YouTube video.
Please check the DataCamp article for more info on JSON files and Python. Now we can download the file in JSON format and load it into a list.
Check the keys and the tweet itself. The output looks like this. The dictionary contains a lot of information, such as the creation time, user ID, retweeted status, timestamp, etc.
Just read from the dictionary the information you need, and then proceed with the text mining and visualization.
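A minimal sketch of loading the downloaded tweets into a list and saving a few selected fields to CSV, using only the standard library. It assumes the file holds one JSON tweet object per line (a common way to save streamed tweets) and that tweets follow the v1.1 tweet layout (`created_at`, `id_str`, `user.screen_name`, `retweet_count`, `favorite_count`); the chosen fields are illustrative.

```python
import csv
import json

# Columns to keep; the selection is illustrative, not the author's.
FIELDS = ["created_at", "id_str", "screen_name", "text", "retweet_count", "favorite_count"]


def load_tweets(lines):
    """Parse one JSON tweet object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def select_fields(tweet):
    """Pull a few top-level and nested fields out of a v1.1 tweet dict."""
    return {
        "created_at": tweet.get("created_at"),
        "id_str": tweet.get("id_str"),
        "screen_name": tweet.get("user", {}).get("screen_name"),
        # Tweets fetched with tweet_mode="extended" keep the untruncated text in "full_text".
        "text": tweet.get("full_text", tweet.get("text")),
        "retweet_count": tweet.get("retweet_count", 0),
        "favorite_count": tweet.get("favorite_count", 0),
    }


def write_csv(tweets, out):
    """Write the selected fields of each tweet to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(select_fields(tweet))
```

To use it on files, open the JSON file with `encoding="utf-8"` and pass the file object to `load_tweets`, then open the output CSV with `newline=""` (as the `csv` module recommends) and pass it to `write_csv`; the file names are up to you.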
Search API or REST API

Twitter’s Search API gives you access to a data set of tweets that have already occurred.
For an individual user, the maximum number of tweets you can receive is the last 3,200 tweets, regardless of the query criteria.
With a specific keyword, you can typically only poll the last 5,000 tweets per keyword.
You are further limited by the number of requests you can make in a certain time period.
The Twitter request limits have changed over the years but are currently limited to 180 requests in a 15-minute period.
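180 requests per 15 minutes works out to one request every 5 seconds on average (900 s / 180). Tweepy can handle the waiting for you with `tweepy.API(auth, wait_on_rate_limit=True)`; if you call the API some other way, a small sliding-window limiter like the sketch below keeps you under a quoted limit. The clock and sleep functions are injectable only so the logic can be tested without actually waiting.

```python
import collections
import time


class WindowRateLimiter:
    """Allow at most `max_calls` calls in any rolling window of `window` seconds.

    Defaults follow the limit quoted above: 180 requests per 15 minutes.
    """

    def __init__(self, max_calls=180, window=15 * 60, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls = collections.deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the rolling window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
```

Usage: create one `WindowRateLimiter()` and call its `wait()` method right before each API request.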
Let’s use Tweepy for the Search API.
Run pip install tweepy at the command prompt. Since I already installed it, it says "Requirement already satisfied".
Import the libraries needed.
Provide all the credentials you got before.
Now we are going to extract the tweets with the main function.
I am extracting the tweets of Bill Gates.
The maximum number of tweets allowed is 3,200, so I can’t extract the whole tweet history of a user who has tweeted more than 3,200 tweets.
The downloaded file looks like this.
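Tweepy’s `tweepy.Cursor(api.user_timeline, ...)` handles paging for you, but it helps to see what happens underneath: each timeline request returns at most 200 tweets, and the next request asks only for tweets older than the oldest one already received (`max_id`), until the 3,200-tweet cap or the end of the timeline is reached. In the sketch below, `fetch_page` is a hypothetical callable; with Tweepy 3.x it could wrap `api.user_timeline(screen_name="BillGates", count=count, max_id=max_id, tweet_mode="extended")` and return each status’s `_json` dict, but that wiring is an assumption, not the author’s code.

```python
def fetch_timeline(fetch_page, page_size=200, limit=3200):
    """Collect up to `limit` tweets, newest first, using max_id paging.

    `fetch_page(max_id, count)` is a hypothetical callable that returns a list
    of tweet dicts (each with an integer "id"), newest first, containing only
    tweets with id <= max_id; max_id=None means "start from the newest tweet".
    """
    tweets = []
    max_id = None
    while len(tweets) < limit:
        page = fetch_page(max_id=max_id, count=min(page_size, limit - len(tweets)))
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # max_id is inclusive, so subtract 1 to avoid re-fetching the oldest tweet.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets[:limit]
```

The subtraction matters: without it, every page after the first would start with a duplicate of the previous page’s oldest tweet.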
Now load the data into a pandas DataFrame for data visualization and sentiment analysis. The data looks like this in the DataFrame.
About 80% of the time is spent preparing the data and only 20% on the analysis (Source: Forbes).
Do some basic cleaning of the tweets.
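What counts as "basic cleaning" varies. One common choice, sketched below with regular expressions, is to drop the retweet marker, links, and @mentions, keep hashtag words without the `#`, remove digits, punctuation, and emoji, and lowercase the rest. These choices are assumptions; for VADER in particular you may prefer to keep punctuation and capitalization, since VADER uses exclamation marks and ALL CAPS as intensity cues.

```python
import re

RT_PREFIX = re.compile(r"^RT\s+")          # retweet marker at the start
URL = re.compile(r"https?://\S+")          # links, e.g. t.co URLs
MENTION = re.compile(r"@\w+")              # @username mentions
APOSTROPHE = re.compile(r"['\u2019]")      # straight and curly apostrophes
NON_LETTER = re.compile(r"[^A-Za-z\s]")    # digits, punctuation, emoji, ...


def clean_tweet(text):
    """Return a lowercased, letters-only version of a tweet for word counts and word clouds."""
    text = RT_PREFIX.sub("", text)
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = text.replace("#", "")         # keep the hashtag word, drop the '#'
    text = APOSTROPHE.sub("", text)      # "it's" -> "its" instead of "it s"
    text = NON_LETTER.sub(" ", text)
    return " ".join(text.split()).lower()
```

With pandas, this would typically be applied column-wise, e.g. `df["clean_text"] = df["text"].apply(clean_tweet)`, where both column names are hypothetical.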
Now we can do basic data analysis and visualization. Let’s do a word cloud.
Check the documentation for more info.
The only required parameter is the text; all others are optional.
The output looks like this. You can also use a mask: here are the original picture, the resulting word cloud, and the code.
There are 292,731 words in the combination of all tweets.
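Counting words yourself is a useful sanity check on the word cloud. The sketch below tokenizes with a simple regular expression, drops a small illustrative stop-word list (NLTK’s `stopwords` corpus or `wordcloud.STOPWORDS` are fuller alternatives), and returns the total word count plus a `Counter` whose `most_common(20)` gives the top 20 words. Note that `len(" ".join(texts))` counts characters, not words; the text has to be split into tokens first.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")

# A small illustrative stop-word list; NLTK's stopwords corpus or
# wordcloud.STOPWORDS are fuller alternatives.
STOP_WORDS = {
    "a", "an", "and", "are", "at", "be", "by", "for", "in", "is", "it",
    "of", "on", "or", "rt", "that", "the", "this", "to", "with",
}


def word_stats(texts):
    """Return (total number of words, Counter of non-stop-word frequencies)."""
    tokens = [tok for text in texts for tok in WORD.findall(text.lower())]
    counts = Counter(tok for tok in tokens if tok not in STOP_WORDS)
    return len(tokens), counts
```

Since a `Counter` is a dict, it can also be handed to `WordCloud().generate_from_frequencies(counts)` so the cloud and the top-20 list use the same counts.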
Scatter Text

A tool for finding distinguishing terms in small-to-medium-sized corpora, and presenting them in a sexy, interactive scatter plot with non-overlapping term labels.
Exploratory data analysis just got more fun.
Check out the GitHub link; the visual is stunning.
I tried to use just the tweets to do a scatter text.
Please refer to the GitHub repo for sample code and scenarios.
My scatter text can definitely be improved.
I love the graph in the HTML display.
I have attached a screenshot.
Top 20 common words in the tweets

Now do some basic analysis: find the most liked tweets and the most retweeted tweets.
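In pandas, this kind of analysis is typically done with `idxmax()` and `value_counts()`. A standard-library sketch of the same steps is below: it picks the tweet with the highest `favorite_count` or `retweet_count` and counts tweets by source, year, month, weekday, and hour. It assumes raw v1.1 tweet dicts, where `created_at` looks like `Wed Oct 10 20:19:24 +0000 2018` (UTC, so hour-of-day counts are in UTC unless you convert them) and `source` is an HTML link whose text is the app name.

```python
import re
from collections import Counter
from datetime import datetime

# v1.1 timestamps look like "Wed Oct 10 20:19:24 +0000 2018" and are in UTC.
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"
HTML_TAG = re.compile(r"<[^>]+>")


def top_tweet(tweets, field):
    """Return the tweet with the largest value of `field`, e.g. "favorite_count"."""
    return max(tweets, key=lambda tweet: tweet.get(field, 0))


def source_name(tweet):
    """Strip the HTML link around the v1.1 `source` field, leaving the app name."""
    return HTML_TAG.sub("", tweet.get("source", ""))


def activity_counts(tweets):
    """Count tweets by source, year, month, weekday and (UTC) hour of day."""
    stamps = [datetime.strptime(t["created_at"], CREATED_AT) for t in tweets]
    return {
        "source": Counter(source_name(t) for t in tweets),
        "year": Counter(ts.year for ts in stamps),
        "month": Counter(ts.strftime("%b") for ts in stamps),
        "weekday": Counter(ts.strftime("%A") for ts in stamps),
        "hour": Counter(ts.hour for ts in stamps),
    }
```

Calling `.most_common()` on any of these counters gives a descending list, similar to what `value_counts()` prints.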
Here is the output:
The tweet with more likes is: Congratulations to the Indian government on the first 100 days of @AyushmanNHA. It’s great to see how many people h… https://t.co/mGXaz16H7W
Number of likes: 56774
The tweet with more retweets is: RT @WarrenBuffett: Warren is in the house.
Number of retweets: 39904

Source of the tweet: obviously no iPhone, and hardly any Android.
Twitter Web Client: 1115
Hootsuite: 907
Sprinklr: 733
Twitter Media Studio: 100
Twitter for Windows: 89
Twitter for Windows Phone: 56
Twitter for Websites: 11
Twitpic: 10
Twitter Ads: 8
Spredfast: 6
Twitter for Android: 6
Panoramic moTweets: 5
Mobile Web: 3
Seesmic Look: 3
Yfrog: 1
Vine for windows phone: 1
Facebook: 1
Twitter Web App: 1
Mobile Web (M2): 1

Tweets by year
2013 is the year in which Gates tweeted the most, with 445 tweets, followed by 409 in 2015.
The lowest is 2011, with 125 tweets.
2013: 445
2015: 409
2018: 370
2014: 357
2016: 338
2012: 329
2017: 319
2010: 230
2019: 135
2011: 125

Tweets by month
April is the month he tweeted the most, with 329 tweets, and the lowest is November, with 196.
Apr: 329
Mar: 305
May: 292
Feb: 283
Jan: 282
Jun: 254
Oct: 236
Sep: 234
Dec: 227
Aug: 212
Jul: 207
Nov: 196

Tweets by day
Wednesday is the day on which he tweeted the most, with 609 tweets, followed by Thursday with 586. The weekends are the lowest.
Wednesday: 609
Thursday: 586
Tuesday: 549
Friday: 521
Monday: 389
Saturday: 239
Sunday: 164

Tweets by hour
Most of the tweets are in the PM. The maximum is around 5:00 PM, with 297 tweets. He even tweeted at midnight (147) and 1:00 AM (90). AM tweets are far fewer overall.
Hour (24h): tweets
17: 297
14: 250
21: 230
16: 219
20: 212
18: 202
15: 199
22: 193
13: 188
23: 163
19: 154
0: 147
12: 137
1: 90
6: 54
4: 53
2: 53
3: 52
5: 48
8: 39
7: 30
11: 27
9: 11
10: 9

(Figures: Fav count and retweets over the years; Fav and retweet histogram; Fav count (Seaborn))

Sentiment Analysis

(Image source: Medium article)

What is sentiment analysis?
Sentiment analysis, also known as opinion mining, is a field within Natural Language Processing (NLP) that builds systems that try to identify and extract opinions within text.
Usually, besides identifying the opinion, these systems extract attributes of the expression, e.g.:
- Polarity: whether the speaker expresses a positive or negative opinion,
- Subject: the thing that is being talked about,
- Opinion holder: the person or entity that expresses the opinion.
With the help of sentiment analysis systems, this unstructured information could be automatically transformed into structured data of public opinions about products, services, brands, politics, or any topic that people can express opinions about.
This data can be very useful for commercial applications like marketing analysis, public relations, product reviews, net promoter scoring, product feedback, and customer service.
Some of the Python sentiment analysis APIs and libraries:
- Scikit-learn
- NLTK
- spaCy
- TensorFlow
- Keras
- PyTorch
- Gensim
- Polyglot
- TextBlob
- Pattern
- Vocabulary
- Stanford CoreNLP Python
- MontyLingua

Sentiment Analysis by TextBlob
TextBlob is a Python (2 and 3) library for processing textual data.
It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
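TextBlob’s `TextBlob(text).sentiment.polarity` returns a score between -1.0 and 1.0. A common convention, assumed here, is to call a tweet positive above 0, negative below 0, and neutral at exactly 0, then report each label’s share. The sketch below covers only the labeling and percentage step, so it runs without TextBlob installed; the scores would come from something like `[TextBlob(t).sentiment.polarity for t in texts]`.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def label_polarity(polarity):
    """Label a TextBlob polarity in [-1.0, 1.0]; exactly 0 counts as neutral."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_percentages(labels):
    """Return the percentage of tweets carrying each label (all 0.0 if there are none)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (100 * counts[label] / total if total else 0.0) for label in LABELS}
```

The same percentages dict can feed a pie chart directly, one slice per label.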
The output of the TextBlob analysis:
Percentage of positive tweets: 63.23192672554792%
Percentage of neutral tweets: 26.66012430487406%
Percentage of negative tweets: 10.107948969578018%

The pie chart looks like this.

Sentiment Analysis by VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.
It is fully open-sourced under the MIT License. For more information, please check this GitHub link.
The code is similar to the above for calculating the sentiment score; only the library is different.
We are considering only the compound score.
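With the vaderSentiment package, `SentimentIntensityAnalyzer().polarity_scores(text)` returns a dict with `neg`, `neu`, `pos`, and `compound` keys. The VADER README suggests treating a compound score of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. The post does not say which cutoffs were used, so these thresholds are an assumption. A sketch of that labeling step:

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def label_compound(compound, threshold=0.05):
    """Label a VADER compound score in [-1, 1] using the README's +/-0.05 cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def compound_percentages(compounds, threshold=0.05):
    """Percentage of compound scores falling into each label."""
    counts = Counter(label_compound(c, threshold) for c in compounds)
    total = sum(counts.values())
    return {label: (100 * counts[label] / total if total else 0.0) for label in LABELS}
```

The compound scores themselves would come from `[analyzer.polarity_scores(t)["compound"] for t in texts]`, with `analyzer = SentimentIntensityAnalyzer()` imported from `vaderSentiment.vaderSentiment`.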
Percentage of positive tweets: 66.47039581288846%
Percentage of neutral tweets: 20.870134118416747%
Percentage of negative tweets: 12.659470068694798%

SA: sentiment analysis score from TextBlob
VSA: sentiment analysis score from VADER

Sentiment Analysis by Syuzhet
This package comes with four sentiment dictionaries and provides a method for accessing the robust, but computationally expensive, sentiment extraction tool developed in the NLP group at Stanford.
For more information, please check the link.
This package is available in R; I followed the package’s sample code for this analysis.
If you want more details, please check the link I provided and also check the R CRAN project page.
Conclusion

I hope you enjoyed reading this article.
Now you know how easy it is to extract data using the Twitter APIs (Streaming and Search).
With only a few lines of code, you can get the data from Twitter and use pandas for all the analyses.
In the next post, I will write about Scrapy and Selenium.
I will also update my GitHub sometime soon and provide the link.
Meanwhile, if you want to provide feedback or suggestions, or have any questions, please go ahead and send an email to epmanalytics100@gmail.
Thanks again for reading my post!

I found this chart very helpful. (Source: FT)
Jupyter Cheat Sheet (Source: DataCamp)
Pandas Cheat Sheet 1 (Source: DataCamp)
Matplotlib Cheat Sheet (Source: DataCamp)