So, to overcome my FOMO, I decided to collect 3,300 tweets about the event using the official hashtag #WelcometotheForest.
This blog post presents the findings of my analysis.
Please scroll down to view my analysis via data visualizations!

Data and Methods

The official hashtag promoted by the organisers was #WelcometotheForest.
At the time of the event, I used the Twitter API to collect 3,300 tweets which contained this hashtag.
It’s important to note that I only collected tweets that contained #WelcometotheForest; there were, of course, many tweets about Welcome to the Forest that did not contain the hashtag.
Having collected the tweets, I applied a range of statistical and machine learning techniques, notably natural language processing and computer vision, to help me understand the event in a bit more detail.
Specifically, I used the Google Cloud Natural Language API to calculate the sentiment of each tweet, and the gensim library’s Word2Vec model to perform semantic analysis on the entire corpus of tweets. I also used Google Cloud’s Vision API to detect features and labels for each image uploaded online. Finally, I used a pre-trained convolutional neural network to undertake feature extraction and reverse image search, clustering those images based on visual similarity.
What a mouthful!

Analyzing the tweets

The meat of my analysis draws insights from the 3,300 tweets I collected via the Twitter API.
Below, I report on the following five metrics:

1. The number of tweets per day and by the hour;
2. The average sentiment of the tweets per day;
3. The top 10 words and hashtags from the tweets;
4. The top words associated with “WelcometotheForest” based on semantic learning;
5. The most popular images of Welcome to the Forest based on visual similarity.

Tweet frequency

The bar graph below shows all tweets from the Wednesday before the event to the Wednesday after the event, 9th to 16th January.
The most popular day at Welcome to the Forest was Friday 11 January with 932 tweets using the hashtag #WelcometotheForest.
However, a significant proportion (66% on average across all days) of the tweets were retweets (RT), so there were actually only 364 distinct tweets on Friday 11 January.
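
The split between all tweets and distinct tweets can be illustrated with a small sketch. This is not the code used for the analysis: the sample records, the `created_at` and `text` field names, and the rule that a retweet starts with "RT @" are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
tweets = [
    {"created_at": "2019-01-11 20:15:00", "text": "RT @someone: Loved it #WelcometotheForest"},
    {"created_at": "2019-01-11 20:40:00", "text": "Amazing night #WelcometotheForest"},
    {"created_at": "2019-01-12 19:05:00", "text": "Back again! #WelcometotheForest"},
    {"created_at": "2019-01-12 21:30:00", "text": "RT @other: Back again! #WelcometotheForest"},
    {"created_at": "2019-01-12 21:45:00", "text": "RT @other: So proud #WelcometotheForest"},
]


def is_retweet(tweet):
    # Assumption: retweets carry the conventional "RT @" prefix in their text.
    return tweet["text"].startswith("RT @")


def count_by(tweets, key_format, include_retweets=True):
    """Count tweets grouped by a strftime key such as '%Y-%m-%d' or '%H'."""
    counts = Counter()
    for tweet in tweets:
        if not include_retweets and is_retweet(tweet):
            continue
        created = datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S")
        counts[created.strftime(key_format)] += 1
    return counts


all_by_day = count_by(tweets, "%Y-%m-%d")
distinct_by_day = count_by(tweets, "%Y-%m-%d", include_retweets=False)
all_by_hour = count_by(tweets, "%H")
retweet_share = sum(is_retweet(t) for t in tweets) / len(tweets)
```

The same helper covers both the daily and the hourly charts by changing only the grouping key.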
Bar chart showing the number of tweets by day during the festival

The busiest time of the day (averaged across all days of the event) was in the late evening between 7 pm and 10 pm; 8 pm was the busiest hour with 324 tweets in total (185 without retweets).
Bar chart showing the average tweets per hour

Sentiment Analysis

In order to gauge whether the party was good or bad, I performed sentiment analysis.
The sentiment for each tweet was calculated using the Google Cloud Natural Language API.
The bar graph below shows the average sentiment of tweets per day, with -1 being very negative sentiment and +1 being very positive sentiment.
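
As a rough sketch of the averaging step, assuming each tweet has already been given a score between -1 and +1, the daily averages could be computed as below. The `(day, score)` pairs are made-up values, not results from the analysis, and no call to the sentiment API is shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (day, sentiment score) pairs; the scores are invented for illustration.
scored_tweets = [
    ("2019-01-09", 0.6),
    ("2019-01-09", 0.4),
    ("2019-01-10", -0.2),
    ("2019-01-10", 0.4),
    ("2019-01-11", 0.8),
]


def average_sentiment_by_day(scored_tweets):
    """Group scores by day and return the mean score for each day."""
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        # Scores are expected on the -1 (very negative) to +1 (very positive) scale.
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


daily_sentiment = average_sentiment_by_day(scored_tweets)
```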
We see that Welcome to the Forest started with relatively high sentiment, dipped on Thursday 10th January, and then climbed steadily to a very strong sentiment of 0.
Overall, Welcome to the Forest had an average sentiment of 0.57 across all days, which is very good! Looks like I missed out on a lot of fun…

Line chart showing the average sentiment of the tweets per day

Text Frequency analysis & Top hashtags

The bar graphs below show the number of times each word (left) and each hashtag (right) appeared throughout the body of all the tweets.
It’s worth noting that because hashtags also appear in the body of the tweet, they become a confounding variable when calculating the frequency of words.
Therefore, I took steps to remove the hashtag count from the word count.
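
One possible way to keep hashtags out of the word count is to strip them from each tweet before tokenising. This is only a sketch: the sample texts, the regular expressions and the function name are my own assumptions, and the original analysis may have separated the two counts differently.

```python
import re
from collections import Counter

# Hypothetical tweet texts for illustration.
texts = [
    "Fantastic night of culture #WelcometotheForest #wfculture19",
    "So proud of wfculture19 and the culture on show #WelcometotheForest",
    "Culture for families #mylocalculture",
]

HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"\w+")


def count_words_and_hashtags(texts):
    """Return (word_counts, hashtag_counts), with hashtags excluded from the words."""
    word_counts = Counter()
    hashtag_counts = Counter()
    for text in texts:
        lowered = text.lower()
        hashtag_counts.update(HASHTAG.findall(lowered))
        # Remove hashtags before tokenising so they do not inflate the word counts.
        without_hashtags = HASHTAG.sub(" ", lowered)
        word_counts.update(WORD.findall(without_hashtags))
    return word_counts, hashtag_counts


word_counts, hashtag_counts = count_words_and_hashtags(texts)
```

With this approach a token such as "wfculture19" is counted as a word only when it appears without the leading "#".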
Bar graphs showing the count of words and hashtags appearing in all the tweets

Predictably, the hashtag #welcometotheforest appeared the most. Interestingly, however, the hashtags #wfculture19 and #mylocalculture also appeared in abundance.
Even after discounting the hashtag count, the word “wfculture19” appeared the most, followed by “culture”, but “wfcouncil” and “erlandcooper” also got some shoutouts!

I did a quick Google search on the back of this information and found that #wfculture19 and #mylocalculture are official hashtags promoted by the Twitter accounts of Waltham Forest Council and the Mayor of London’s Culture team.
However, the results above are not very useful in telling us what people thought about the event.
In the subset above, we find only nouns rather than adjectives.
Accordingly, I used other machine learning techniques to try and unearth some adjectives.

Semantic meaning

In order to get a more granular understanding of the text from the tweets, I undertook semantic analysis using natural language processing and machine learning.
Word2Vec is a neural network language model that takes a large corpus of text (in this case, the text from the 3,300 tweets) as input and outputs a vector space, typically of several hundred dimensions, in which each unique word corresponds to a vector: a word embedding.
Specifically, words whose vectors lie closer together in that space are more similar to each other.
“Nearest neighbours” are the handful of words from the Word2Vec model that are most similar to “WelcometotheForest” based on a cosine similarity score.
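
To make the idea of a cosine-based nearest neighbour concrete, here is a minimal sketch using plain Python. The three-dimensional vectors are invented for illustration (real Word2Vec embeddings have far more dimensions), and the helper names are hypothetical. In gensim, a trained model offers this lookup through `model.wv.most_similar`.

```python
import math

# Hypothetical toy embeddings; the values are made up for illustration.
embeddings = {
    "welcometotheforest": [0.9, 0.8, 0.1],
    "fantastic": [0.8, 0.9, 0.2],
    "fun": [0.7, 0.6, 0.0],
    "traffic": [-0.5, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, embeddings, top_n=2):
    """Return the top_n other words ranked by cosine similarity to `word`."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


neighbours = nearest_neighbours("welcometotheforest", embeddings)
```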
The scatter plot below shows the nearest neighbours for “WelcometotheForest”.
Importantly, the words “fantastic”, “fun”, “enjoy” and “proud” are close by, and also “children”, “families” and “involved”.
This is a very positive outcome! It suggests that these words are closely associated with how people felt when they tweeted about Welcome to the Forest.
It seems like it was a very enjoyable and inclusive event!

PCA output of the nearest neighbours of #WelcometotheForest from the Word2Vec model

Most popular artworks

After retrieving useful textual information from the tweets, I finally turned to the image data.
Of the 3,300 tweets in total, 732 had images attached to them.
Using those images, I programmed the computer to learn visual similarities between them.
I did this with two techniques: feature extraction and reverse image search.
Using a Keras VGG16 neural network model running on a TensorFlow backend, I first extracted a feature for each image in the dataset.
A feature is a 4096-element array of numbers for each image.
The expectation is that “the feature forms a very good representation of the image such that similar images will have a similar feature” (Gene Kogan, 2018).
The feature’s dimensions were then reduced using principal component analysis (PCA) to create an embedding, and the cosine distance between one image’s PCA embedding and another’s was computed.
Now that I had an embedding for each image in a vector space, I used a popular machine learning visualization algorithm called t-SNE to cluster and then visualise that vector space in two dimensions.
“The aim of tSNE is to cluster small “neighbourhoods” of similar data points while also reducing the overall dimensionality of the data so it is more easily visualized” (Google AI Blog, 2018).
The clustering of images of Welcome to the Forest 2019. Source: Twitter

The image above shows really good clustering for Nest by Marshmallow Laser Feast in the top right-hand corner and Into The Forest by Greenaway & Greenaway in the bottom right-hand corner.

Conclusion

So there you have it! Although sitting at my laptop doing research about the party did not make up for all the fun that I missed out on, I did learn a lot about the event! There is still huge potential to dig into the tweets further.
Congratulations on such a great event and achievement, and all the best for the rest of the year as London’s Borough of Culture!

Thank you for reading!

Vishal

Vishal is a Research Student at The Bartlett at UCL in London.
He is interested in the economic and social impact of culture in cities.
You can get in touch with him on Twitter or LinkedIn.
See more of Vishal’s work on Instagram or on his website.
Mentions: Theo Blackwell, Create Associates, Smart London, Museums London, The Guardian.