This is a potential direction for future research: I would be able to filter tweets containing the word Mexico and find the words and phrases most associated with those tweets.
The code for this section can be found in this GitHub Repo!

Global Trends Map

While working with the large number of tweets regarding Trump, we became curious about what other topics may be trending around the world.
We started by attempting to use the Twitter API to retrieve trending topics from every country in the world.
However, we quickly realized that Twitter simply does not record trends for every country (i.e., Twitter is blocked in Iran, so there are no trends there!).
Luckily, the Twitter API can return the names of each country for which it does have trends, so after pulling those we used them to retrieve trends for every country that Twitter records.
After keying into the data, we stored the top three trends for each possible country in a dictionary, with the country name as the key and the trends as the value.
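The retrieval and storage steps above can be sketched roughly as follows. This is a minimal sketch, not our exact code: it assumes an `api` object that behaves like a Tweepy 4.x `tweepy.API` (with `available_trends()` and `get_place_trends()`), and it assumes the responses contain the `placeType`, `woeid`, and `trends` fields shown. `FakeAPI` and its values are hypothetical stand-ins so the example runs without Twitter credentials.

```python
def top_trends_by_country(api, top_n=3):
    """Return {country name: [top_n trend names]} for countries that have trends.

    Assumption: `api` offers available_trends() and get_place_trends(woeid),
    as a Tweepy 4.x tweepy.API object does.
    """
    trends = {}
    for place in api.available_trends():
        # Keep country-level locations only; skip cities and "Worldwide".
        if place["placeType"]["name"] != "Country":
            continue
        response = api.get_place_trends(place["woeid"])
        names = [trend["name"] for trend in response[0]["trends"]]
        trends[place["name"]] = names[:top_n]
    return trends


class FakeAPI:
    """Hypothetical stand-in for tweepy.API; all values below are made up."""

    def available_trends(self):
        return [
            {"name": "Worldwide", "woeid": 1, "placeType": {"name": "Supername"}},
            {"name": "Denmark", "woeid": 101, "placeType": {"name": "Country"}},
            {"name": "Copenhagen", "woeid": 102, "placeType": {"name": "Town"}},
        ]

    def get_place_trends(self, woeid):
        names = ["#data", "Messi", "#dkpol", "a fourth trend"]
        return [{"trends": [{"name": n} for n in names]}]


trends = top_trends_by_country(FakeAPI())
print(trends)
```

With a real, authenticated `tweepy.API` object in place of `FakeAPI()`, the same function would make one request per country.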
In order to visualize this data, we decided to create an interactive map, where, upon hovering over a country, a small box appears with the country name and the current top three trends in that country.
We used Plotly to create the map and discovered that we would need to move all our data into a CSV file in order to make it easily readable for Plotly.
To do so, we simply used csv.writer to create a new row for each country in the dictionary and its trends.
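As a rough illustration of that step, here is a minimal sketch using the standard library's `csv.writer`. The column names and the `write_trends_csv` helper are our own choices for this example, not necessarily what the original script used.

```python
import csv


def write_trends_csv(trends, path):
    """Write one row per country: the country name followed by up to three trends."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "trend_1", "trend_2", "trend_3"])
        for country, names in trends.items():
            # Pad with empty strings so every row has exactly three trend columns.
            padded = (list(names) + ["", "", ""])[:3]
            writer.writerow([country, *padded])
```

Calling `write_trends_csv({"Denmark": ["#data", "Messi"]}, "trends.csv")` would produce a header row plus one row for Denmark with an empty third trend column.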
We decided to use a choropleth as the foundation of our map model, which had advantages and disadvantages.
Choropleths fill in geographical areas based on a quantitative value, which we initially planned to be the number of Twitter users.
However, this data is not accessible through Tweepy and is not reliable elsewhere on the Internet.
We decided to continue with the choropleth model anyway, which would allow for that feature to be implemented should the data become available.
In the meantime, we filled in every country where the trending tweets are available in just a solid color.
That way, users can easily tell which countries have data, and which do not.
Plotly also used different country codes than Twitter did, so we needed to import the correct codes and map them to their given country.
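To make the code mapping and the solid-color fill concrete, here is a minimal sketch. It assumes Twitter reports two-letter country codes and that the map uses three-letter ISO codes (Plotly's `ISO-3` location mode); the `ISO2_TO_ISO3` table is a tiny hand-written excerpt and `build_choropleth_trace` is a hypothetical helper. The trace is built as a plain dictionary so the example runs without Plotly installed.

```python
# Hypothetical excerpt of a two-letter -> three-letter ISO country code table.
ISO2_TO_ISO3 = {"DK": "DNK", "MX": "MEX", "US": "USA"}


def build_choropleth_trace(trends_by_iso2, names_by_iso2):
    """Build a choropleth trace (as a dict) with one solid color and hover text."""
    locations, hover_text = [], []
    for iso2, names in trends_by_iso2.items():
        iso3 = ISO2_TO_ISO3.get(iso2)
        if iso3 is None:
            continue  # no known three-letter code, so the country stays unfilled
        locations.append(iso3)
        # "<br>" starts a new line inside a Plotly hover box.
        hover_text.append(names_by_iso2[iso2] + "<br>" + "<br>".join(names))
    return {
        "type": "choropleth",
        "locations": locations,
        "locationmode": "ISO-3",
        "z": [1] * len(locations),  # constant value: every country gets the same fill
        "zmin": 0,
        "zmax": 1,
        "colorscale": [[0, "steelblue"], [1, "steelblue"]],
        "showscale": False,
        "text": hover_text,
        "hoverinfo": "text",
    }


trace = build_choropleth_trace(
    {"DK": ["#data", "Messi", "#dkpol"], "XX": ["n/a"]},
    {"DK": "Denmark", "XX": "Nowhere"},
)
print(trace["locations"], trace["text"])
```

With Plotly installed, a trace like this can be rendered with `plotly.graph_objects.Figure(data=[trace]).show()`. If user counts ever became available, they could replace the constant `z` values.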
Another issue we encountered when creating this map was that some trends in non-English speaking countries used a different alphabet, and when we converted our data to a CSV file, these characters were changed to random ASCII symbols.
We attempted to fix the problem by using ‘utf-8’ encoding, but when that didn’t work we decided to filter the top trends to only those using English characters.
If we had more time to work on the project, we would focus on getting the actual symbols from each country into the top trends, so that the trends of the countries are more accurately depicted.
We think this represents as a whole one of the major issues of processing loads of data: minority groups can often be overlooked and underrepresented.
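The English-only filter described above, and the kind of UTF-8 round trip we would want to get working, can be sketched as follows. This is an assumption-laden illustration: `ascii_only` and `round_trip` are hypothetical helpers, `str.isascii()` requires Python 3.7 or later, and we do not claim that a mismatched encoding was the actual cause of the garbled characters in our pipeline.

```python
import csv
import os
import tempfile


def ascii_only(trends):
    """Keep only the trends whose names consist entirely of ASCII characters."""
    return {
        country: [name for name in names if name.isascii()]
        for country, names in trends.items()
    }


def round_trip(rows, encoding="utf-8"):
    """Write rows to a temporary CSV and read them back with the same encoding."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="", encoding=encoding) as f:
            csv.writer(f).writerows(rows)
        with open(path, newline="", encoding=encoding) as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)


filtered = ascii_only({"Japan": ["東京", "#WorldCup"]})
restored = round_trip([["Japan", "東京"]])
print(filtered, restored)
```

When the file is written and read with the same explicit encoding, non-English characters should survive; opening the same file in a tool that assumes a different encoding is one common way for them to appear as garbage.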
After developing the map, it became much easier to spot any trends or interesting differences between countries.
For example, after Messi scored a crowd-inspiring, 27-meter free kick against Liverpool, he was trending in countries around the world.
The map makes it clear which events reach a global scale, and which are important within just a country or region.
See the interactive map below, and check out the code at this GitHub Repo.
Bonus!

Now that we have learned a lot about Trump and global trends, we thought, “why not apply what we have found to an equally polarizing political figure in Denmark?” Ulf Aslak.
Although there is no www.
com, we were lucky that Ulf only has 950 tweets, so we were able to use the Twitter API and Tweepy to scrape his entire timeline.
Last semester, several students in Big Data did a project on whether Trump’s tweets influence the stock market, and part of their analysis was relevant to our bonus endeavor. Here are links to their Blog Post and their GitHub Repo.
We modified some of their code and created comparable graphics for Ulf so we can compare the two in terms of word clouds, sentiment scores, tweets by day, and tweets by time.
Here is a link to our repo.
Word Clouds

Word clouds are a simple way to visualize common words in a corpus of text.
We used wordcloud to create ours below.
Left: Last Semester’s Trump Word Cloud. Right: Our Ulf Word Cloud.

It is clear from the word clouds that Trump’s tweets are more about abstract concepts like greatness, thanks, and, shockingly, himself, whereas Ulf’s tweets primarily revolve around data (so surprising), people, and coding.
When creating our text cleaning functions, we initially filtered out mentions of other Twitter users, but they turned out to be quite a significant portion of the word clouds, so we put them back in.
Trump often mentions himself and Barack Obama’s account, whereas Ulf often tags suneman (Sune Lehmann, who runs the lab Ulf works in) and colleagues BenFMaier and DirkBrockmann.
Looks like Ulf could improve on his political mentions for his future career in politics.
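The cleaning step and the word counts behind a word cloud can be sketched like this. It is a simplified illustration, not the code from either repo: `clean_tweet`, `word_frequencies`, and the tiny `STOPWORDS` set are all hypothetical, and the `keep_mentions` flag mirrors our decision to leave mentions in.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "to", "and", "of", "is", "in", "for", "on", "by", "rt"}


def clean_tweet(text, keep_mentions=True):
    """Lowercase a tweet, drop URLs and stopwords, and optionally drop @mentions."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    if not keep_mentions:
        text = re.sub(r"@\w+", " ", text)
    tokens = re.findall(r"@?\w+", text)
    # Strip the leading "@" so a mention is counted under the account name.
    words = [token.lstrip("@") for token in tokens]
    return [word for word in words if word not in STOPWORDS]


def word_frequencies(tweets, keep_mentions=True):
    """Count how often each cleaned word appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet, keep_mentions))
    return counts


freqs = word_frequencies([
    "Great talk by @suneman on data https://t.co/x",
    "More data, more people, thanks @suneman",
])
print(freqs.most_common(3))
```

A frequency table like `freqs` can then be handed to the wordcloud library, for example via `WordCloud().generate_from_frequencies(freqs)`.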
Now that we have a sense of what words are used in their tweets, we became curious: is there a difference in sentiment between Trump’s and Ulf’s tweets?

Sentiment Analysis

In class, we learned about the Afinn library, which allows us to create sentiment scores for text, and in this case, tweets.
The library has a score method that returns a positive score if the language of the tweet is overall positive and a negative score if the language of a tweet is overall negative.
A sentiment score at or around zero is considered neutral.
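To show the idea behind this kind of scoring, here is a toy word-list scorer. It is only a sketch of the general approach: `TOY_LEXICON` is a made-up handful of words and values, not the real AFINN list, and `toy_score` is a hypothetical stand-in for the library's own scoring. With the library installed, the equivalent call would be `Afinn().score(text)`.

```python
import re

# Made-up valences for illustration only; the real AFINN list is far larger.
TOY_LEXICON = {"great": 3, "love": 3, "thanks": 2, "sad": -2, "terrible": -3}


def toy_score(text):
    """Sum the valence of every known word; unknown words contribute zero."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


def average_score(tweets):
    """Mean score across a list of tweets (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    return sum(toy_score(tweet) for tweet in tweets) / len(tweets)


examples = ["Make America great", "Terrible, sad day", "Meeting at noon"]
print([toy_score(tweet) for tweet in examples], average_score(examples))
```

Because most words carry no valence at all, many tweets land at exactly zero, which is one reason a distribution of scores tends to cluster there.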
As we can see, the majority of both Trump’s and Ulf’s tweets are centered around zero.
However, Trump has a more positive average sentiment of 0.172 compared to Ulf’s of 0.
We initially expected Ulf’s tweets to be more on the positive side, but perhaps Trump uses more powerful and polarizing positive language, such as ‘make america great,’ which may push his average sentiment score to be more positive.
Show some excitement, Ulf!

Tweets by Day

Beyond what Trump and Ulf say, it’s also interesting to analyze when they say it.
We know Trump is well-known for his weekend golf trips, but does Ulf exhibit a similar work-hard, play-hard mentality?
According to our analysis, it looks like Ulf just works hard (except on Sundays, when he takes a bit of a break).
Have some fun, Ulf!

Tweets by Time

In addition to the day of the week on which Trump and Ulf post, we thought it would be interesting to analyze the time of day, as we know Trump has a tendency to post late-night Twitter barrages.
Unlike Trump, Ulf does most of his tweeting in the morning and early afternoon hours; he must be following the Danish way by reducing his tweeting in the afternoon.
He also does not seem to be a night owl, with very few tweets coming in between the hours of 11 PM and 6 AM.
In fact, he might be a morning person with a tweet-peak between 7–10 AM.
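The day-of-week and time-of-day counts behind both histograms can be sketched as below. This is a minimal illustration: `tweets_by_day_and_hour` is a hypothetical helper, the sample timestamps are made up, and it assumes each timestamp is a `datetime` already expressed in the author's local time (Twitter reports creation times in UTC, so a conversion step would normally come first).

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def tweets_by_day_and_hour(timestamps):
    """Count tweets per weekday name and per hour of day (0-23)."""
    by_day = Counter(DAY_NAMES[ts.weekday()] for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    return by_day, by_hour


# Made-up timestamps for illustration.
sample = [
    datetime(2019, 5, 6, 8, 30),
    datetime(2019, 5, 6, 9, 5),
    datetime(2019, 5, 12, 14, 0),
]
by_day, by_hour = tweets_by_day_and_hour(sample)
print(by_day, by_hour)
```

Dividing each count by the total number of tweets would make two accounts with very different sample sizes easier to compare on the same axis.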
Ultimately, we think that Trump might be better positioned for the political world than Ulf, but we have a few recommendations to improve your Twitter game and be more like Trump:

Use powerful phrases like “Make Denmark Great” to stoke the excitement of your supporters
Take a break on the weekends to recharge and cheat at golf
Tweet later in the day; it shows that the grind doesn’t stop

It is important to note that the sample sizes for President Trump and Ulf differ by a factor of about 42, so the conclusions drawn from Ulf’s tweets are statistically limited: for some bins in the histograms he has fewer than 10 tweets.
In terms of future exploration, it could be interesting to look more deeply at Ulf’s tweets over time, although the data was fairly limited, so we did not notice any meaningful trends besides a weird affinity for data and people.
We hope that you’ve enjoyed our exploration into Twitter. From sentiment analysis to an interactive global trend map to a comparison between Trump and our teacher, we’ve shown how powerful Twitter data can be!