Locating Natural Disasters through Social Media Feeds with R

Chris Kehl · Mar 9

Data is being collected from us every second of every day.
For instance, right now as I write this article, Netflix is streaming my shows and collecting data on what I watch. I glance at the bottle of Gold Peak diet tea that I bought at my local Kroger store; to earn my fuel points, I swiped a card that tracked the purchases I made yesterday, which included that bottle of tea. My Apple iPhone sits next to me, waiting for me to pick it up and record my screen time. My Trane HVAC thermostat hangs right above the chair I am sitting in, monitoring the comfort level of my home. Even while I sleep, my Fitbit and Sleep Number bed monitor my sleeping patterns.
Most of the data being mined is used for personal health and comfort, or as a way for businesses to custom-market their products to us. But data can also be useful in monitoring and predicting natural disasters such as earthquakes, forest fires, and severe weather.
On March 5, 2019, a 3.4 magnitude earthquake occurred just 4 miles outside of Maynardville, Tennessee, an area we usually don't hear much about when it comes to earthquakes. This earthquake occurred on a fault line known as the Southern Appalachian Seismic Zone, which stretches from northeastern Alabama into east Tennessee and southwest Virginia. This occurrence inspired me to tap into the Twitter API and pull down all the tweets from the past 9 days containing the hashtag #earthquake.
In this story I will walk through the steps I used to pull down the Twitter data using RStudio and the R programming language.
The first step, if you don't already have the keys that allow you to pull down Twitter data, is to create a developer account. To do this, follow the steps below.

1. Go to https://developer.twitter.com, sign up, and create an account. You will need to provide an e-mail address so that Twitter can verify it.
2. Once you have an account, go to create an app.
3. Fill in all the details and choose an app name; for the website URL I used my Google account, https://myaccount.google.com.
4. Once you accurately fill this out, you will receive your keys.

This is where I received the key information that I used to pull data down from the API.
Once we get the keys we need, we can start writing our code in R.
```r
## install rtweet from CRAN
install.packages("rtweet")
## install dev version of rtweet from github
devtools::install_github("mkearney/rtweet")
# Install the following packages
# (note each package name must be quoted separately)
install.packages(c("ROAuth", "plyr", "stringr", "ggplot2", "wordcloud"), dependencies = TRUE)
```

Hit enter and this is what you will see (download URLs and sizes will vary):

```
Restarting R session...

> install.packages(c("ROAuth", "plyr", "stringr", "ggplot2", "wordcloud"), dependencies = TRUE)
trying URL 'https://cran...'
Content type 'application/x-gzip' length 76740 bytes (74 KB)
==================================================
downloaded 74 KB
...
downloaded 3.5 MB
...
downloaded 223 KB
```

Now we will load our libraries.
Note: when I load my libraries, I place my cursor after each library() line and hit enter.

```r
## load the package libraries
library(rtweet)
library(ROAuth)
library(plyr)
library(stringr)
library(ggplot2)
library(wordcloud)
library(tm)
```

Hit enter and you should see something like this:

```
> library(wordcloud)
Loading required package: RColorBrewer
> library(tm)
Loading required package: NLP

Attaching package: 'NLP'

The following object is masked from 'package:ggplot2':

    annotate
```

Next I am going to install the devtools package if it's not already installed.
```r
## install devtools package if it's not already installed
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
```

You should get something like this below:

```
installing the source package 'devtools'
trying URL 'https://cran...'
Content type 'application/x-gzip' length 388953 bytes (379 KB)
==================================================
downloaded 379 KB
* installing *source* package 'devtools' ...
** testing if installed package can be loaded
* DONE (devtools)
```

Ok, time to retrieve the Twitter API keys.
We are going to place them in our R code to access and retrieve data from the API.
```r
## access token method: create token and save it as an environment variable
# Note: put in your own API keys; the placeholder values below will not work.
create_token(
  app             = "chris_kehl",
  consumer_key    = "dvkGkS6njqwiethinrg74bdt",
  consumer_secret = "av6ZVYgrJk8dHPqTrnTwirng",
  access_token    = "12538373736-sEAAMIHF069nldOEAs3CY87eCEpvt8a4RCD9m",
  access_secret   = "J5TEGT6rDtenrye9d2O6ZifSjwiAECRp8o9R5x"
)
```

Hit enter and you will get this:

```
<Token>
<oauth_endpoint>
 request:   https://api.twitter.com/oauth/request_token
 authorize: https://api.twitter.com/oauth/authenticate
 access:    https://api.twitter.com/oauth/access_token
<oauth_app> chris_kehl
  key:    dvkGkS6njqwiethinrg74bdt
  secret: <hidden>
<credentials> oauth_token, oauth_token_secret
```

Time to retrieve the Twitter data containing the #earthquake info.
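Hard-coding secrets in a script is risky if you ever share your code. A safer pattern (a sketch, not required by rtweet; the TWITTER_* variable names are my own choice, and it assumes you have added matching entries to your ~/.Renviron file) is to read the keys from environment variables:

```r
## read credentials from environment variables instead of hard-coding them
## (the TWITTER_* names are hypothetical; use whatever you put in ~/.Renviron)
get_twitter_token <- function() {
  create_token(
    app             = Sys.getenv("TWITTER_APP"),
    consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
    consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
}
```

This keeps the keys out of any script you publish or commit to version control.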
```r
## search for 18000 tweets using the #earthquake hashtag
eq.list <- search_tweets(
  "#earthquake", n = 18000, include_rts = FALSE
)
```

The output will be something like this:

```
Downloading [===================================>-----] 88%
```

We'll format the data we just retrieved in order to perform our analysis.
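Before formatting, it's worth a quick sanity check on what came back. search_tweets() returns a data frame with one row per tweet and columns such as text and created_at; with the real pull you would run nrow(eq.list) and range(eq.list$created_at). The snippet below sketches that check on a tiny hand-made stand-in, so it runs without API access:

```r
## a tiny made-up stand-in with the same shape as a search_tweets() result
eq.sample <- data.frame(
  text       = c("Magnitude 3.4 #earthquake near Maynardville, TN",
                 "Felt a small #earthquake this morning"),
  created_at = as.POSIXct(c("2019-03-05 04:07:00", "2019-03-05 09:30:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

## how many tweets, and what span of time do they cover?
nrow(eq.sample)
range(eq.sample$created_at)
```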
```r
# Create a Corpus
eqCorpus <- Corpus(VectorSource(eq.list$text))
# set up stemming
eqCorpus <- tm_map(eqCorpus, stemDocument)
```

The output will be something like this:

```
Warning message:
In tm_map.SimpleCorpus(eqCorpus, stemDocument) :
  transformation drops documents
```

Let's now plot the tweets that were received within the past nine days.
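One thing the stemmed corpus enables, before we move on to plotting, is a term-frequency count or a word cloud (we loaded the wordcloud package earlier). Below is a minimal base-R sketch of the counting step, run on a couple of sample strings standing in for the real tweet text:

```r
## sample tweet text standing in for eq.list$text
sample_text <- c("magnitude 3.4 earthquake near maynardville",
                 "small earthquake felt near knoxville")

## split into lowercase words and count term frequencies
words <- unlist(strsplit(tolower(sample_text), "[^a-z0-9.]+"))
words <- words[nchar(words) > 0]
freq  <- sort(table(words), decreasing = TRUE)
freq

## with a real corpus, these frequencies could feed a word cloud, e.g.:
## wordcloud(names(freq), as.integer(freq), min.freq = 3)
```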
```r
## plot time series of tweets
ts_plot(eq.list, "3 hours") +
  ggplot2::theme_minimal() +
  ggplot2::theme(plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #earthquake Twitter statuses from past 9 days",
    subtitle = "Twitter status (tweet) counts aggregated using three-hour intervals",
    caption = "Source: Data collected from Twitter's REST API via rtweet"
  )
```

We can analyze the resulting plot of tweet frequencies to see what type of activity we have.
As noted, Tennessee had an earthquake on 5 March 2019; we see a small spike there, but not as large as the one on March 1. Searching through the retrieved data, we can see that some earthquake activity occurred with magnitudes of 5. Scrolling through the data, we can see earthquakes stretching from Argentina all the way up to Alaska.
We will now plot the longitude and latitude of the most active tweets.
```r
## create lat/lng variables using all available tweet and profile geo-location data
eq <- lat_lng(eq.list)
## plot world map boundaries
par(mar = c(0, 0, 0, 0))
maps::map("world", lwd = .90)
## plot lat and lng points onto the world map
with(eq, points(lng, lat, pch = 20, cex = .75, col = rgb(0, .3, .7, .75)))
```

We run our code and see our world plot.
From our plot we see activity in California and in Tennessee; this is where the majority of the Twitter feeds are coming from. We can look back at the data we retrieved to analyze the severity of the earthquakes.
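How you judge severity from tweet text is up to you; one simple, illustrative heuristic (sketched here on a few made-up tweets, since tweet text is free-form and a keyword match is only a rough filter) is to flag tweets that mention a magnitude of 5 or greater:

```r
## made-up tweets standing in for eq.list$text
tweet_text <- c("M5.3 #earthquake off the coast of Argentina",
                "tiny tremor, barely felt #earthquake",
                "USGS reports magnitude 5.1 #earthquake in Alaska")

## rough heuristic: flag tweets mentioning magnitude 5 or greater
severe <- grepl("(M|magnitude )[5-9]", tweet_text, ignore.case = TRUE)
tweet_text[severe]
```

A production system would parse magnitudes from an authoritative feed rather than tweet text, but this is enough to triage a pile of tweets.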
In Summary

We can use data to analyze anything from who is streaming what on Netflix to how many hours a day iPhone users spend looking at their screens. But what if we could analyze tweets to save lives, or to provide the necessary resources to those in need? In this story, we analyzed earthquake activity by plotting the frequency of tweets and plotting their locations on a world map. We can assess the need for further investigation from the frequency of tweets, as shown by our plots. We can use our sample code to analyze other disasters and crises, such as the Syrian refugee crisis, public sentiment about the Russian invasion of Crimea, or situations unfolding during severe weather. The possibilities are endless: just change #earthquake to #HELP!, #Crimea, or the next disaster's hashtag.
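To make that swap easy, the search can be wrapped in a small helper. This is just a sketch; the function name and default are my own, and it assumes rtweet is loaded and a token has already been created as above:

```r
## hypothetical helper: pull recent tweets for any hashtag
## (assumes rtweet is loaded and a token has been created)
search_disaster <- function(hashtag, n = 18000) {
  ## accept "earthquake" or "#earthquake" by stripping any leading "#"
  query <- paste0("#", sub("^#", "", hashtag))
  search_tweets(query, n = n, include_rts = FALSE)
}

## e.g. search_disaster("Crimea") or search_disaster("#HELP!")
```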
Maybe we can use this data to save a life, or create change.
Follow me at https://www.kehl for part II of this story.