R for Product Analytics — How to derive insights from user behavior

Bence Tóth, Jun 6

This analysis is based on the concepts of the book Lean Analytics: Use Data Analytics to Build a Better Startup Faster by Alistair Croll and Benjamin Yoskovitz.

It carries out a simple data exploration of Acquisition, Activity and Retention, three key lifecycle stages of a product according to the book.

About the data

The example is based on a real, subscription-based Software as a Service (SaaS) product.

There are two datasets available:

registrations.csv: unique users with basic demographics
activity.csv: which users have been active in which month

More information about the datasets can be found in the project's GitHub repository.
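To make the later steps concrete, here is a minimal Python sketch (the original project used R) of loading the two files into simple structures. The column names user_id, registration_month, region, operating_system and month are hypothetical, and so are the inline rows; the real layout is documented in the project's repository. Months are assumed to be numbered 1 to 24, with Month 13 being January of year 2, matching how the text refers to them later.

```python
import csv
import io

# Hypothetical column layout and made-up rows; the real files may differ.
REGISTRATIONS_CSV = """user_id,registration_month,region,operating_system
1,1,America,Mac
2,1,EMEA,Linux
3,2,ROW,Unknown
"""

ACTIVITY_CSV = """user_id,month
1,1
1,2
2,1
3,2
"""


def load_registrations(text):
    """Return a dict mapping user_id -> registration record."""
    users = {}
    for row in csv.DictReader(io.StringIO(text)):
        users[int(row["user_id"])] = {
            "registration_month": int(row["registration_month"]),
            "region": row["region"],
            "operating_system": row["operating_system"],
        }
    return users


def load_activity(text):
    """Return a set of (user_id, month) pairs, one per active user-month."""
    return {
        (int(row["user_id"]), int(row["month"]))
        for row in csv.DictReader(io.StringIO(text))
    }


registrations = load_registrations(REGISTRATIONS_CSV)
activity = load_activity(ACTIVITY_CSV)
print(len(registrations), len(activity))  # 3 4
```

For the real files, the same functions can be fed the contents of the CSVs, for example open("registrations.csv", newline="").read().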

Acquisition

Acquisition is the stage where you should generate attention through a variety of means, both organic and inorganic.

Relevant metrics to assess acquisition include traffic, mentions, cost per click, search results, cost of acquisition, open rate etc.

As a first step in the analysis, let's look at the number of registrations in each month:

Figure 1: Number of registrations per month

We can see that year 2 has a lower number of registrations overall (21831 vs.


In the first year, most registrations occurred during the autumn/early winter period (September to November).
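Counting registrations per month and per year amounts to a simple group-and-count. A minimal Python sketch, assuming a hypothetical mapping from user ID to registration month (months 1 to 24, Month 13 = January of year 2); the numbers below are illustrative, not the article's data.

```python
from collections import Counter

# Hypothetical input: user_id -> registration month (1..24).
registration_month = {101: 1, 102: 1, 103: 9, 104: 10, 105: 13, 106: 14}


def registrations_per_month(reg_month):
    """Count new registrations in each month."""
    return Counter(reg_month.values())


def registrations_per_year(reg_month):
    """Count registrations in year 1 (months 1-12) and year 2 (months 13-24)."""
    return Counter((month - 1) // 12 + 1 for month in reg_month.values())


print(sorted(registrations_per_month(registration_month).items()))
print(registrations_per_year(registration_month))
```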

We have no information on when this product was released to the market. The lower registration numbers in the first half of year 1 might reflect early adopters, and the spike in the second half could be the result of a successful marketing campaign.

After a dip during the Christmas holidays, there is a second peak lasting until May of year 2, followed by a sudden 36.87% drop in registrations.
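A drop like this is a month-over-month percent change, (current - previous) / previous × 100. A minimal Python sketch with made-up counts (not the article's data); under this convention a drop comes out as a negative number.

```python
def pct_change(previous, current):
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("percent change is undefined when the previous value is 0")
    return (current - previous) / previous * 100


def month_over_month(counts):
    """Percent change between each pair of consecutive months."""
    return [pct_change(a, b) for a, b in zip(counts, counts[1:])]


# Hypothetical registration counts for three consecutive months.
counts = [1800, 2000, 1200]
print(month_over_month(counts))
```

Here the second value, -40.0, would be read as "a 40% drop" in the prose.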

This might be due to a failed product update, a decrease in market reach, or other circumstances.

To dig deeper into the data, we can look at the year-over-year growth of registrations:

Table 1: Year-over-year growth of registrations

The next plot shows the growth rates for each month, together with the naive forecast for Month 22 (October of the second year):

Figure 2: Year-over-year growth and naive forecast

Year-over-year growth seems to be decreasing, although registrations in year 2 stay above the corresponding months of year 1 up until September.
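One common definition of year-over-year growth for a calendar month m is (count[m + 12] - count[m]) / count[m]. The sketch below assumes that definition and the 1 to 24 month numbering; the article's Table 1 may express growth differently (for example as a ratio), and the counts are hypothetical.

```python
def year_over_year_growth(monthly_counts):
    """Year-over-year growth for each calendar month present in both years.

    monthly_counts maps month number (1..24, Month 13 = January of year 2)
    to a count. Months with a missing or zero year-1 count are skipped.
    """
    growth = {}
    for month in range(1, 13):
        last_year = monthly_counts.get(month)
        this_year = monthly_counts.get(month + 12)
        if last_year and this_year is not None:
            growth[month] = (this_year - last_year) / last_year
    return growth


# Hypothetical counts for January and February of both years.
counts = {1: 1000, 2: 800, 13: 1500, 14: 600}
print(year_over_year_growth(counts))  # {1: 0.5, 2: -0.25}
```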

The Naive forecast for October is 0.
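A naive forecast simply predicts that every future value equals the last observed one. The sketch below applies that rule to a hypothetical series of year-over-year growth rates for Months 13 to 21; the values are illustrative, and the article only names the method without showing its computation.

```python
def naive_forecast(series, horizon=1):
    """Naive method: each of the next `horizon` values is the last observation."""
    if not series:
        raise ValueError("need at least one observation")
    return [series[-1]] * horizon


# Hypothetical year-over-year growth rates for Months 13..21 (January..September, year 2).
yoy_growth = [0.45, 0.40, 0.38, 0.30, 0.25, 0.12, 0.10, 0.08, 0.05]
print(naive_forecast(yoy_growth))  # [0.05], the forecast for Month 22 (October)
```

The naive method is mostly useful as a baseline: a more elaborate forecasting model is only worth its complexity if it beats this rule.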


We can also investigate whether registration numbers differ across geographic regions:

Figure 3: Region differences in year-over-year growth

Comparing the regions, they do seem to differ.

America underperforms both EMEA and the rest of the world (ROW), except at the beginning of year 1.

All three regions show a decreasing trend in registration numbers, but the ROW region seems to produce the highest growth.

ROW is therefore likely to drive future growth in registrations, while EMEA, and especially America, might have a different level of interest in the product, or might have been less affected by the marketing campaigns.
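Splitting year-over-year growth by region only adds the region to the grouping key. A minimal Python sketch, assuming hypothetical (user_id, registration_month, region) rows with months numbered 1 to 24; the data is made up to show the mechanics.

```python
from collections import Counter

# Hypothetical registrations: (user_id, registration_month, region).
registrations = [
    (1, 1, "America"), (2, 1, "EMEA"), (3, 1, "ROW"), (4, 1, "ROW"),
    (5, 13, "America"), (6, 13, "EMEA"), (7, 13, "EMEA"),
    (8, 13, "ROW"), (9, 13, "ROW"), (10, 13, "ROW"), (11, 13, "ROW"), (12, 13, "ROW"),
]


def regional_yoy_growth(rows, month):
    """Year-over-year registration growth per region for a calendar month (1..12)."""
    counts = Counter((region, reg_month) for _, reg_month, region in rows)
    growth = {}
    for region in sorted({region for _, _, region in rows}):
        last_year = counts[(region, month)]
        this_year = counts[(region, month + 12)]
        if last_year:  # growth is undefined without a year-1 baseline
            growth[region] = (this_year - last_year) / last_year
    return growth


print(regional_yoy_growth(registrations, 1))
# {'America': 0.0, 'EMEA': 1.0, 'ROW': 1.5}
```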

Activity

Activation is the stage of turning the resulting drive-by visitors into users who are somehow enrolled.

It can be measured by tracking a specific activity milestone, such as enrollments, sign-ups, completing the onboarding process, using the service at least once, or subscriptions.

The plot below shows the number of active users in each month:

Figure 4: Number of active users per month

The number of active users increased until the summer of the first year, then dropped, and only returned to the May level in September.

It peaked during November and decreased during Christmas and winter.

The second year had its highest number of active users between March and May, and the number again decreased over the summer, which suggests seasonality.
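Monthly active users is the number of distinct users with at least one activity record in a month. A minimal Python sketch, assuming a hypothetical activity log of (user_id, month) pairs; the duplicate row shows why deduplication matters.

```python
from collections import Counter

# Hypothetical activity log of (user_id, month) pairs, including one duplicate row.
activity = [(1, 1), (2, 1), (1, 1), (1, 2), (3, 2), (4, 2), (1, 3)]


def monthly_active_users(pairs):
    """Number of distinct users active in each month."""
    # Deduplicate first so a user is counted at most once per month.
    return Counter(month for _, month in set(pairs))


print(sorted(monthly_active_users(activity).items()))  # [(1, 2), (2, 3), (3, 1)]
```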

As we have seen, the America region struggles with new registrations.

Let's see what percentage of active users come from America in each month:

Figure 5: Share of American users among active users per month

American users typically make up between 17% and 22% of active users.
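This share requires joining the activity log with the users' regions and dividing per month. A minimal Python sketch, assuming a hypothetical user-to-region lookup and a set of (user_id, month) pairs; the values are illustrative only.

```python
from collections import Counter

# Hypothetical lookup tables (not the article's data).
user_region = {1: "America", 2: "EMEA", 3: "ROW", 4: "America", 5: "EMEA"}
activity = {(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (5, 2)}


def region_share(pairs, regions, region):
    """Percentage of each month's active users who belong to `region`."""
    active = Counter(month for _, month in pairs)
    in_region = Counter(month for user, month in pairs if regions[user] == region)
    return {month: 100 * in_region[month] / total for month, total in sorted(active.items())}


print(region_share(activity, user_region, "America"))  # {1: 50.0, 2: 25.0}
```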

The company should consider other ways to grow that market, or focus on the rest of its clients, who seem to have better registration and activity rates.

It is also worth looking at users' activity patterns.

Active users can be classified each month as New (registered that month), Retained (was also active the previous month) or Resurrected (was inactive the previous month and is not New).

To illustrate this, take a look at the activity history of a randomly chosen user:

Table 2: User activity history

This user registered in July, kept using the service in August, then did not perform any activity in September.

He returned in November and was active in December as well.

He did not use the product in the second year.
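The three labels can be assigned with a few comparisons per active month, and counting the Retained labels per month gives the series behind a plot of Retained users. A minimal Python sketch; the function names and the example user (registered in Month 7, active in Months 7, 8, 11 and 12) are hypothetical, chosen to resemble the history described above.

```python
from collections import Counter


def classify_activity(registration_month, active_months):
    """Label each active month as New, Retained or Resurrected.

    New: the user registered that month.
    Retained: the user was also active in the previous month.
    Resurrected: the user was inactive in the previous month and is not New.
    """
    active = set(active_months)
    labels = {}
    for month in sorted(active):
        if month == registration_month:
            labels[month] = "New"
        elif month - 1 in active:
            labels[month] = "Retained"
        else:
            labels[month] = "Resurrected"
    return labels


def retained_per_month(users):
    """Count Retained users per month; users is an iterable of (registration_month, active_months)."""
    counts = Counter()
    for registration_month, active_months in users:
        for month, label in classify_activity(registration_month, active_months).items():
            if label == "Retained":
                counts[month] += 1
    return counts


print(classify_activity(7, [7, 8, 11, 12]))
# {7: 'New', 8: 'Retained', 11: 'Resurrected', 12: 'Retained'}
```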

Let's now look at the number of Retained active users in each month:

Figure 6: Number of Retained users per month

Retention

The main task during the Retention phase is to convince users to come back repeatedly, exhibiting sticky behavior.

It can be tracked with metrics such as engagement, time since last visit, daily and monthly active use, churn, etc.

In our example, we can calculate the second-month retention rate: the share of users registered in Month 1 who were also active in Month 2.
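The definition translates directly into a cohort filter followed by a membership check. A minimal Python sketch, assuming a hypothetical user-to-registration-month mapping and a set of (user_id, month) activity pairs; the data is made up. The year-2 figure discussed below corresponds to calling it with start_month=13.

```python
def second_month_retention(registration_month, activity, start_month):
    """Share of users who registered in `start_month` and were active in the next month."""
    cohort = {user for user, month in registration_month.items() if month == start_month}
    if not cohort:
        raise ValueError("no users registered in the given month")
    retained = {user for user in cohort if (user, start_month + 1) in activity}
    return len(retained) / len(cohort)


# Hypothetical data, not the article's figures.
registration_month = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2}
activity = {(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (3, 2), (5, 2)}
print(f"{second_month_retention(registration_month, activity, 1):.0%}")  # 50%
```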

The second month retention rate is 46.


This can be considered fairly good, as the metric usually ranges from 20% to 60%.

Let's look at the same metric for year 2 (between Months 13 and 14).


84% of users who registered in January of year 2 were active in February.

It is still an acceptable value, but what could be behind the 10% drop since the previous year? One possible reason is that the majority of users interested in the product had already signed up, so the pool of potential new users became smaller.

As a result, the users who joined in the second year may have been less engaged with the product, and a higher percentage of them dropped out in the following month.

We can also calculate the second-month retention rate by the users' operating systems:

Table 3: Second month retention rate by Operating System

For Unknown operating systems, the retention rate is much lower (38%) than for Mac users (61.


One possible action is to investigate the data collection process and determine which operating systems these users are actually using.

There is also a large, 20% difference between Mac and Linux users.

This might signal that the product is not as well optimized for Linux as it is for Mac.
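A breakdown like Table 3 is the same retention calculation grouped by an attribute. A minimal Python sketch, assuming a hypothetical user_id -> (registration_month, operating_system) mapping and (user_id, month) activity pairs; the twelve users are made up.

```python
from collections import defaultdict

# Hypothetical Month-1 cohort: users 1-4 on Mac, 5-8 on Linux, 9-12 Unknown.
operating_systems = ["Mac"] * 4 + ["Linux"] * 4 + ["Unknown"] * 4
users = {user: (1, os_name) for user, os_name in enumerate(operating_systems, start=1)}
# Hypothetical Month-2 activity as (user_id, month) pairs.
activity = {(1, 2), (2, 2), (3, 2), (5, 2), (6, 2), (9, 2)}


def retention_by_group(users, activity, start_month):
    """Second-month retention rate per group for users who registered in start_month."""
    cohort_size = defaultdict(int)
    retained = defaultdict(int)
    for user, (registration_month, group) in users.items():
        if registration_month != start_month:
            continue
        cohort_size[group] += 1
        if (user, start_month + 1) in activity:
            retained[group] += 1
    return {group: retained[group] / size for group, size in sorted(cohort_size.items())}


print(retention_by_group(users, activity, 1))
# {'Linux': 0.5, 'Mac': 0.75, 'Unknown': 0.25}
```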

Conclusion

This basic introduction to different product analytics practices shows how a little effort and some programming knowledge can lead to interesting hypotheses about user behavior.

Methods such as A/B testing could then be used to test these hypotheses, for example how optimizing the Linux version of the product would influence retention rates, or which regions should be targeted with specific advertising campaigns.
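One common way to evaluate an A/B test on a retention rate is a two-sided, two-proportion z-test with pooled variance; the same test could also check whether a gap such as Mac versus Linux retention is statistically significant. A minimal standard-library Python sketch; the scenario (an optimized Linux build against the current one) and its counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 250 of 500 users retained on an optimized Linux build vs 200 of 500 on the current one.
z, p = two_proportion_z_test(250, 500, 200, 500)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The z-test relies on a normal approximation, so it is most trustworthy when each group has reasonably many retained and non-retained users.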

This project was done as a requirement for the Mastering Product Analytics course at Central European University in Hungary.

The R code along with the dataset can be found in my ceu_product-analytics repository on GitHub.

