Insights from social media data mining
Charles Novaes de Santana
Jan 14

Social media is an incredible source of data today.
Besides the data that is accessible only to the social media companies themselves (data that is key to most of these companies' business models), there is also a lot of data freely available to the general public via APIs (Application Programming Interfaces), which can be accessed from different programming languages.
Twitter is one of the social media platforms I most like to gather data from because, in my opinion, its API gives access to more interesting data than most other social media APIs (such as the location of the users and the IDs of the retweets).
Below I describe a simple example in which I use data (text) mining, data preparation, and data visualization to get insights about a topic of interest discussed on Twitter.
I used R as my main programming language.
I used the R implementation of the Twitter API to get the data; I used the R library dplyr to manipulate and clean the data; and I used the R library ggplot2 to plot the figures.
I searched for the 1000 most popular posts shared between 01. 2018 and 13. 2019 that included the following terms in their text: “job + datascience”, “job + machinelearning”, “job + datascience + R”, “job + datascience + python”, “job + machinelearning + R”, “job + machinelearning + python”.
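The six term combinations above follow a simple pattern: "job" plus a topic, optionally plus a language. The sketch below is an illustration in Python, not the R code used for this post. It builds those combinations and checks whether a tweet's text contains every term of a query. The names build_queries and matches are hypothetical, no Twitter API call is made, and whole-word matching is an assumption about how the terms are meant to be found (it keeps the single letter "R" from matching inside other words).

```python
from itertools import product

TOPICS = ["datascience", "machinelearning"]
LANGUAGES = ["R", "python"]


def build_queries(topics=TOPICS, languages=LANGUAGES):
    """Return the term list for each search: topic only, then topic + language."""
    base = [["job", topic] for topic in topics]
    with_language = [
        ["job", topic, language] for topic, language in product(topics, languages)
    ]
    return base + with_language


def matches(text, terms):
    """True if every term appears in the text as a whole word or hashtag."""
    # Strip hashtags and common punctuation, then compare case-insensitively.
    words = {word.strip("#.,!?:;()\"'").lower() for word in text.split()}
    return all(term.lower() in words for term in terms)


for terms in build_queries():
    print(" + ".join(terms))
```

With a real client library, each term list would be turned into whatever query syntax that library expects, which is not shown here.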
My goal was to get an idea of the job offers related to data science, machine learning, and the programming languages R and Python that are shared on Twitter.
What are the most shared job offers? Who are the most popular users sharing those tweets? Let’s find out below.

Most popular users sharing job offers about “data science” with Python and with R

Figure 1. Data Science job announcements on Twitter related to “R” (red) and to “Python” (blue)

Most popular users sharing job offers about “machine learning” with Python and with R

Figure 2. Machine Learning job announcements on Twitter related to “R” (red) and to “Python” (blue)

Most popular users sharing job offers about “R”

Figure 3. R job announcements on Twitter related to “datascience” (red) and to “machinelearning” (blue)

Most popular users sharing job offers about “Python”

Figure 4. Python job announcements on Twitter related to “datascience” (red) and to “machinelearning” (blue)

Insights

The first thing that came to my mind after seeing the figures above was that there were many more job advertisements mentioning “Python” than “R”, for both “datascience” and “machinelearning” ads.
We can see it clearly in figures 1 and 2 because there are many more blue bars than red ones.
Another interesting finding is that R appears in more advertisements for data science than for machine learning.
We can see it in figure 3, where the red bars are bigger than the blue ones.
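The comparisons read from the figures come down to two operations: counting tweets per topic and language, and ranking users by popularity within a topic. The sketch below illustrates both in Python with a few invented rows. It is not the dplyr code behind the figures, the user names and numbers are made up, and using retweet counts as the measure of popularity is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (user, topic, language, retweets). Not the data from this post.
tweets = [
    ("user_a", "datascience", "python", 40),
    ("user_b", "datascience", "python", 25),
    ("user_a", "datascience", "R", 10),
    ("user_c", "machinelearning", "python", 30),
    ("user_b", "machinelearning", "R", 5),
]


def count_by_group(rows):
    """Count tweets for each (topic, language) pair."""
    return Counter((topic, language) for _, topic, language, _ in rows)


def top_users(rows, topic, n=10):
    """Rank (user, language) pairs by total retweets within one topic."""
    totals = defaultdict(int)
    for user, row_topic, language, retweets in rows:
        if row_topic == topic:
            totals[(user, language)] += retweets
    # Highest retweet totals first; keep at most n entries.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


counts = count_by_group(tweets)
ranking = top_users(tweets, "datascience")
```

Each entry of ranking corresponds to one bar in a figure like figure 1: a user, the language (the bar color), and the total used as the bar height.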
Although data science and machine learning usually go together in many job offers, there are important differences that explain why R seems to be a “data science niche” language.
R was initially created as a programming language for statistical computing.
Data Science is about data analysis, so it is natural that R is a popular programming language in this field.
Machine learning is about artificial intelligence; it is more than data analysis, because it is also about algorithms.
Python is a general-purpose programming language that is very popular among computer scientists.
It has a wide range of libraries for many applications, including statistical computing and artificial intelligence.
This versatility partly explains why in figure 4 the blue bars are as big as the red ones: there are as many data science job offers asking for Python as machine learning ones.
I know this is nothing new, as Python is doing very well in most programming language popularity rankings, but it is really cool to find this insight through this tiny Twitter data mining experiment as well.