These were some of the questions in my mind as I began to dig into Twitter data recently.

Let’s Use Twitter for Sentiment Analysis of Events

If you prefer to listen to the audio version of this blog post, I have also recorded a podcast episode where I go into more detail on each of the steps, including caveats and things to avoid. You can listen to it on Apple Podcasts, Spotify or Anchor.fm, or on one of my favorite podcast apps: Overcast.

Audio version of this post available as a podcast episode

Let’s get right into the steps to use Twitter data for sentiment analysis of events:

1. Get Twitter API Credentials

First, visit this link and apply for access to a developer account. You will need to mention a reason for applying for API access; reasons such as “student learning project” or “learning to use Python for data science” work fine.

Apply for access to Twitter API

Once you register, you will have access to a Consumer Token, Consumer Secret, Access Key, and Access Secret.
2. Setup the API Credentials in Python

Save your credentials in a config file and run source ./config to load the keys as environment variables. This keeps your keys out of the Python script itself. Make sure not to commit this config file to GitHub.
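Such a config file is just a list of exports. Here is a minimal sketch; the variable names match the ones the Python code reads in the next step, but the values are placeholders you would replace with your own keys:

```shell
# config -- placeholder values; substitute your real keys
export CONSUMER_KEY_TWITTER="your-consumer-key"
export CONSUMER_SECRET_TWITTER="your-consumer-secret"
export ACCESS_KEY_TWITTER="your-access-key"
export ACCESS_SECRET_TWITTER="your-access-secret"
```

Running source ./config in your shell makes these available to any Python process you start from that shell.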
We will use the tweepy library in Python to get access to the Twitter API. It is a nice wrapper over the raw Twitter API and does a lot of the heavy lifting of creating API URLs and HTTP requests. We just need to provide our keys from Step 1, and tweepy takes care of talking to the Twitter API, which is pretty cool. Run pip install tweepy to get the tweepy package in your virtual environment. (I’ve been using pyenv to manage different versions of Python, and have been very impressed.
You’ll also need the pyenv-virtualenv package to manage virtual environments for you, but that’s another blog post in itself.)

In Python you can type:

```python
import os
import json
import tweepy
from tweepy import Stream  # Useful in Step 3
from tweepy.streaming import StreamListener  # Useful in Step 3

consumer_key = os.getenv("CONSUMER_KEY_TWITTER")
consumer_secret = os.getenv("CONSUMER_SECRET_TWITTER")
access_token = os.getenv("ACCESS_KEY_TWITTER")
access_token_secret = os.getenv("ACCESS_SECRET_TWITTER")

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
```

This will read your environment variables and set up an api object that can be used to access Twitter API data.
3. Getting Tweet Data via the Streaming API

Having set up the credentials, it’s now time to get tweet data via the API. I like using the Streaming API to filter real-time tweets on my topic of interest. There is also a Search API, which allows you to search for historic data, but as you can see from this chart, it can be a little restrictive for free access, reaching at most the last 7 days of data. For the paid plans, based on what I saw online, the price can range anywhere from $149 to $2,499/month (or even more); I couldn’t find a page with exact pricing on the Twitter website.
Types of Categories for Twitter Search API

To set up the Streaming API, you will need to define your own class method on_data that does something with the data object coming from the Streaming API.

```python
class listener(StreamListener):
    def on_data(self, data):
        data = json.loads(data)
        # Filter out non-English tweets
        if data.get("lang") != "en":
            return True
        try:
            timestamp = data['timestamp_ms']
            # Get longer 280-char tweets if possible
            if data.get("extended_tweet"):
                tweet = data['extended_tweet']["full_text"]
            else:
                tweet = data["text"]
            url = "https://www.twitter.com/i/web/status/" + data["id_str"]
            user = data["user"]["screen_name"]
            verified = data["user"]["verified"]
            write_to_csv([timestamp, tweet, user, verified, url])
        except KeyError as e:
            print("Keyerror:", e)
        return True

    def on_error(self, status):
        print(status)
```

I have not included the write_to_csv function, but it can be implemented using the csv library, and some examples can be seen here.
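As a minimal sketch of what such a helper could look like (the default filename tweets.csv is my assumption, not something from the original):

```python
import csv

def write_to_csv(row, filename="tweets.csv"):
    # Append one row of tweet fields (timestamp, tweet, user,
    # verified, url) to a CSV file, creating the file if needed.
    # The default filename is a placeholder; change it as you like.
    with open(filename, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
```

Opening in append mode means each call from the listener adds one line, so the file grows as the stream runs.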
You could also save the tweets into a SQLite database, especially if there are several hundred thousand tweets. SQLite also gives you command-line access to all the information via SQL commands, whereas with a CSV you will have to load it into a pandas dataframe in a notebook. It just depends on which workflow you prefer. Typically, you can save into the SQLite database and use the read_sql command in pandas to turn it into a dataframe object. This gives me access to the data both from the command line and from pandas.
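That round trip could look like the following sketch. The database and table names are placeholders; in a real run the listener would have populated the table, so one sample row is inserted here just to keep the example self-contained:

```python
import sqlite3
import pandas as pd

# "tweets.db" and the table name "tweets" are assumptions for illustration
conn = sqlite3.connect("tweets.db")
conn.execute("""CREATE TABLE IF NOT EXISTS tweets
                (timestamp TEXT, tweet TEXT, user TEXT,
                 verified INTEGER, url TEXT)""")
conn.execute("INSERT INTO tweets VALUES "
             "('1555416000000', 'sample tweet', 'someone', 0, 'https://example.com')")
conn.commit()

# read_sql turns the query result straight into a dataframe
df = pd.read_sql("SELECT * FROM tweets", conn)
conn.close()
```

From here, df behaves like any other pandas dataframe, while the same tweets.db file remains queryable from the sqlite3 command line.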
Finally, run this function stream_and_write to start the Streaming API and call the listener we have written above. The main thing is to call the Stream API using extended mode, as it gives you access to longer and potentially more informative tweets.

```python
import time  # for the retry sleep below

def stream_and_write(table, track=None):
    try:
        twitterStream = Stream(auth, listener(), tweet_mode='extended')
        twitterStream.filter(track=["AAPL", "AMZN", "UBER"])
    except Exception as e:
        print("Error:", str(e))
        time.sleep(5)
```

Another important thing to note is the number of items that you can track using the Streaming API. From my testing, I was not able to track more than 400 or so items in the track list. Keep this in mind while building out your ideas.
4. Get Sentiment Information

Sentiment analysis can be done either in the listener above or offline, once we have collected all the tweet data. We can use out-of-the-box sentiment processing libraries in Python. From what I saw, I liked TextBlob and VADER Sentiment. TextBlob provides a subjectivity score along with a polarity score, while VADER provides pos, neu, neg, and compound scores. For a single sentiment score between -1 and 1 from either library, use polarity from TextBlob and compound from VADER. As per the GitHub page of VADER Sentiment:

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
For TextBlob:

```python
from textblob import TextBlob

ts = TextBlob(tweet)
print(ts.subjectivity, ts.polarity)  # Subjectivity, Sentiment Scores
```

For VADER:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
vs = analyzer.polarity_scores(tweet)
print(vs["compound"], vs["pos"], vs["neu"], vs["neg"])
```

Saving the sentiment information along with the tweets allows you to build plots of sentiment score for different stocks or events over time.
Whether to use VADER or TextBlob will depend on your own project. I also tried an ensemble of the two libraries, but in the end I liked the simplicity of just using one library for different tasks.
Ideally, you’d train your own sentiment analysis model on data that matters to you, but that would require collecting your own training data and building and evaluating a machine learning model. To capture the negative emotion in a sentence like “I expected an amazing movie, but it turned out not to be”, we need models that can work with sequence data and recognize that the “not to be” negates the earlier “amazing”; this calls for models built from Long Short-Term Memory (LSTM) cells in neural networks. Training and setting up an LSTM network for sentiment analysis could be a blog post of its own; leave a comment below if you are interested in reading about it.
5. Plot Sentiment Information

I plotted the sentiment when Manchester United lost to Barcelona 3-0 in their Champions League quarter-final. As you can see, as the teams played on April 16th around the afternoon mark, the sentiment starts to drop.

Sentiment dropping as Manchester United lose to Barcelona

While the sentiment score worked quite well for a sports event, what about the stock market? Below is Qualcomm’s (QCOM) performance during the week when Apple dropped its lawsuits against Qualcomm (the exact news came out on April 16th). We’d expect a significant increase in positive sentiment around this news, but it is very hard to see it conclusively below:

QCOM stock performance during the week when their lawsuit with Apple was dropped

This makes extracting alpha from sentiment analysis much more challenging. I feel it is a great tool for seeing sentiment around news events or sports activities, but trying to correlate sentiment with stock market performance is harder, as it involves filtering out noisy tweets and doing a lot of feature-engineering work to extract signal.
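A sketch of how such a time plot can be built. In practice you would load the saved tweets (for example with pd.read_csv on the CSV from Step 3); here a few rows are fabricated so the example is self-contained, and the rolling-window size is an arbitrary choice:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Fabricated stand-in for the saved tweet data: timestamps a minute
# apart and a sentiment score per tweet.
df = pd.DataFrame({
    "timestamp": [1555416000000 + i * 60000 for i in range(10)],
    "sentiment": [0.5, 0.4, 0.3, 0.2, 0.0, -0.1, -0.3, -0.4, -0.5, -0.6],
})
df["time"] = pd.to_datetime(df["timestamp"], unit="ms")

# Smooth single-tweet noise with a rolling mean before plotting
df["smoothed"] = df["sentiment"].rolling(3, min_periods=1).mean()

ax = df.plot(x="time", y="smoothed", title="Tweet sentiment over time")
ax.figure.savefig("sentiment.png")
```

With real data, a much larger rolling window (hundreds of tweets) gives a cleaner trend line for event-scale swings like the match above.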
6. Set This Up on AWS or Google Cloud Platform

You don’t want to be running the Streaming API on your own computer, because when you shut it down, the script will stop too. You should run this on an AWS EC2 instance or on a Google Cloud Platform server. I am not going into the details of how to set that up; there are fantastic resources [AWS and GCP] for it. Run the above script using tmux or screen and get access to the topics of your interest on Twitter!

It was great fun getting data from Twitter and building a sentiment analysis engine. It is very satisfying to see downward or upward sentiment trends for events unfold as you expect them to. The next step would be to plot the sentiment live as events happen, instead of saving the tweets first; that would need a library like dash. Good luck exploring Twitter data, and may you see the trends you expect to see! Thanks for reading.