Google Trends: How to acquire daily data for broad time frames
Franz B.
Oct 30, 2018

1. Google Trends data
Google allows all users to access and process anonymised data on relative search volume behaviour with Google Trends.
This data can be either accessed online at Google Trends or via a Pseudo-API in R/Python.
The data provided by Google does not necessarily represent daily search volumes for certain keywords but rather search trends and relative search volume/interest over time.
This data can be filtered by various criteria such as geographical location (e.g. search behaviour in certain countries/continents), certain subject areas (e.g. sports, media and health) and different time frames (e.g. months, years and individual time ranges).
A broad range of further options and filters is available.
Please note that this article does not discuss the installation and set-up of the gtrends package but rather how to obtain daily data for broad time frames, for which Google would otherwise return only monthly data.
Interpretation of Google Trends data
Google provides the relative search volume for a keyword indexed between zero and 100.
More precisely, zero indicates the lowest relative search interest for the given keyword whereas 100 indicates the date with the maximum search interest within the selected time range.
It is important to point out that the data is always indexed relative to the selected period of time, so the index values for the same keyword can change when a different time frame is selected.
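The effect of indexing within the selected window can be illustrated with a small sketch. It assumes a simplified normalization in which every value is divided by the maximum of the window and scaled to 100; the exact procedure Google applies is not described in this article, and the raw numbers and the function name `index_to_100` are hypothetical.

```python
def index_to_100(raw_values):
    """Rescale values so that the largest value in the window becomes 100.

    Simplified assumption: index = 100 * value / max(window), rounded.
    """
    peak = max(raw_values)
    if peak == 0:
        return [0 for _ in raw_values]
    return [round(100 * value / peak) for value in raw_values]


# Hypothetical raw search volumes for six consecutive periods.
raw = [20, 40, 60, 80, 200, 100]

full_window = index_to_100(raw)       # the peak of 200 lies inside the window
short_window = index_to_100(raw[:4])  # the peak of this window is only 80

print(full_window)   # [10, 20, 30, 40, 100, 50]
print(short_window)  # [25, 50, 75, 100]
```

The first four periods describe the same underlying searches in both cases, yet their index values differ because the reference maximum changes with the window.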
In addition to that, depending on the selected time frame, Google changes the frequency of the provided data points.
For example, a data request for a certain keyword for the entire year 2017 will result in weekly data points (hence users obtain approximately 52 data points).
A different time frame for the same keyword will result in significantly different (absolute) data points.
For example, a data request for the first month (January) of 2017 will result in daily data points (hence users obtain 31 data points) scaled between 0 and 100.
Nevertheless, the data remain approximately comparable across time frames in terms of the relative changes between data points, even though the absolute index values provided by Google differ from one time frame to another.
This article examines the time frame from January 2004 until May 2018 in the data analysis.
A Google Trends request for this time frame will result in monthly data for each month from 2004 until 2018.
However, as previously mentioned, we are interested in examining daily data.
The data were obtained via a Google Trends API implemented into the statistical software R.
You can have a detailed look at the gtrends package for R here.
In the course of data acquisition, a string array “t” was constructed that contains the overall time frame followed by every single month from January 2004 to May 2018:

t = c("all", "2004-01-01 2004-01-31", "2004-02-01 2004-02-29", "2004-03-01 2004-03-31", […], "2018-05-01 2018-05-31")    (1)

Equation (1) states that the data is requested once for the entire time frame (string element “all”) and subsequently for every single month between January 2004 and May 2018.
As explained at the beginning of this subsection, the data points for the overall time frame (“all”) will be provided in monthly frequency, whereas the individual data requests for each month are provided in daily frequency.
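The list of time ranges does not have to be typed by hand. The following is a minimal sketch in Python that builds the same kind of array as Equation (1), using the standard library to look up the last day of each month (including leap years). The function name `monthly_ranges` is hypothetical, and the sketch only builds the strings; it does not send any request to Google Trends.

```python
import calendar
from datetime import date


def monthly_ranges(start_year, start_month, end_year, end_month):
    """Return one "YYYY-MM-DD YYYY-MM-DD" string per month, inclusive."""
    ranges = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        first = date(year, month, 1)
        last = date(year, month, last_day)
        ranges.append(f"{first.isoformat()} {last.isoformat()}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ranges


# "all" first, then every month from January 2004 to May 2018.
t = ["all"] + monthly_ranges(2004, 1, 2018, 5)

print(len(t))  # 174 (one "all" element plus 173 months)
print(t[:3])
print(t[-1])
```

Each element after “all” covers exactly one calendar month, so every individual request stays short enough to be answered in daily frequency.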
The daily data for each month is then merged into a single time frame for each of the preselected keywords.
To make (valid) inter-monthly comparisons between data points from different months, the overall search interest (“all”) is used to adjust the data.
Table 1 illustrates this adjustment process for the keyword GDP.

Table 1: Data Processing Example. This example illustrates the first part of data processing using the keyword GDP: the data is obtained for each individual month and indexed between 0 and 100 within that month.
This does not allow inter-month comparisons directly: for two dates in 2004 that fall in different months, with a relative search interest of 35 on the first date and 55 on the second, one cannot conclude that the interest increased by approximately 57%.
To make valid comparisons, the daily data for each month needs to be multiplied by the respective monthly search interest weight with respect to all months between 2004 and 2018.
The column “Adjusted Data” shows the calculation and results that allow inter-month comparisons.
For the previous example, this means that the relative search interest actually increased by approximately 78.5% from the first date (now at 30.8) to the second date (still at 55).
As shown above, the presented approach is necessary to obtain comparable daily data for the overall time frame from 2004 until 2018.
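The adjustment described above can be sketched in a few lines of Python. The daily values 35 and 55 and the adjusted values 30.8 and 55 come from the example in the text; the monthly weights 88 and 100 are assumptions chosen so that the calculation reproduces those adjusted values, and the labels `month_1`, `month_2`, `day_a` and `day_b` are placeholders rather than the actual dates.

```python
def adjust_daily(daily_by_month, monthly_index):
    """Scale each month's daily index by that month's overall index / 100."""
    adjusted = {}
    for month, days in daily_by_month.items():
        weight = monthly_index[month] / 100
        for day, value in days.items():
            adjusted[day] = value * weight
    return adjusted


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


# Daily index values, each indexed 0-100 within its own month.
daily_by_month = {
    "month_1": {"day_a": 35},
    "month_2": {"day_b": 55},
}
# Monthly index from the "all" request (assumed weights, see lead-in).
monthly_index = {"month_1": 88, "month_2": 100}

adjusted = adjust_daily(daily_by_month, monthly_index)

print(round(adjusted["day_a"], 1), round(adjusted["day_b"], 1))  # 30.8 55.0
print(round(pct_change(35, 55), 1))                              # 57.1
print(round(pct_change(adjusted["day_a"], adjusted["day_b"]), 1))  # 78.6
```

Before the adjustment the change looks like roughly 57%; after weighting each month by its overall search interest it is closer to 78.6%, in line with the figure of approximately 78.5% quoted above.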
Figure 1: Visualization of the data transformation process
Figure 1 visualizes the above-described approach in three graphs.
Graph 1 visualizes the equivalent of the merged daily data (see “Daily Data (merged)”) in Table 1 for the keyword crisis.
Graph 2 displays the monthly data (see “Monthly data ‘all’” in Table 1) and Graph 3 shows the adjusted data (see “Adjusted Data” in Table 1).
The graphical visualization in Figure 1 illustrates the necessity to adjust the initial data set.
In addition, the graphical correspondence between Graph 2 and 3 suggests that inter-month comparisons based on Graph 3 are valid on a daily basis.
Conclusion
We have seen in this example how to transform Google Trends data for a broad time range into daily values.
As Google does not provide daily but monthly data for long time periods, manual processing is required.
It can be seen that the processed daily data is approximately comparable to the monthly data as seen in Figure 1.