Calculating New and Returning Customers in RHamza RafiqBlockedUnblockFollowFollowingMay 22I recently came across an issue in which I wanted to calculate new and returning customers.
So, I naturally googled about it and was surprised to see that I could not find any solutions on it in R.
This has generally been the issue with most blogs/tutorials on R, that they are not very business orientated.
Since I couldn't find a solution online, I started working on it myself.
I was able to figure the solution out pretty quickly in Excel and Power BI using DAX and Power Pivot but I wanted to do it in R.
Actually, I am a bit embarrassed to admit that the solution is pretty straightforward.
I would expect you to have some basic understanding of the Tidyverse set of packages in R, mainly dplyr to get an understanding of this tutorial.
Before moving on, we first need to decide on how we define a customer as “New” or “Returning”New CustomerAny customer who has not worked with us in the past will be classified as a “New Customer”Returning CustomerAny customer who has worked with us in the past and comes back to work with us again in the current period is considered a “Returning Customer”Loading Librarieslibrary(tidyverse)# For data manipulationlibrary(readxl) # Read in Excel Filelibrary(lubridate)# Working with DatesLoading and Viewing DataWe use the following code to read in the sales data and take a glimpse of it to check the data types.
sales_data <- read_xlsx(path="C:UsersACERDownloadssale_data.
xlsx")glimpse(sales_data)We get the following output.
Sale Date is being shown as a date time variable while Customer ID is recognized as a numeric type.
We will now change the “Sale Date” and “Customer ID” columns to date and character type respectively using the mutate function from dplyr.
sales_data <- sales_data%>% mutate(`Sale Date` = dmy(`Sale Date`), `Customer ID`=as.
character(`Customer ID`))Now that our data is ready, the first thing we will do is to calculate the date of first engagement at every row against each customer.
This will determine whether they were a new customer or a returning customer at that point in time.
sales_data <- sales_data %>%group_by(`Customer ID`)%>%mutate(date_of_first_engagement=min(`Sale Date`))%>%ungroup()Since we want to calculate the date of first engagement for each customer at every single row, we will first group the data by “Customer ID” and then use the mutate function to calculate the first time they engaged with us using min function on “Sale Date”.
This is basically a very useful way of calculating conditional calculated columns in R.
Once the calculation is done, we will ungroup it because we do not want any further calculations based on “Customer ID”.
Below is how our dataset will look after the calculations are doneIn row 10, Customer “39099” worked with us on 17th September 2018 but we can see that their first date of engagement with us was on 9th August 2018 which means that on 17th September, the customer is a “Returning Customer” while on 9th August, they were a “New Customer”.
Using this logic, we will now create a new column — “Customer Status”.
We will be using the case_when function to categorize every single customer at each point in time based on the difference of “Sale Date” and “date_of_first_engagement”sales_data <- sales_data%>% mutate(Customer_Status = case_when(`Date Interviewed`>date_of_first_engagement ~ "Returning",`Date Interviewed` == date_of_first_engagement ~ "New", TRUE ~ "Other"))Basically what we are doing is calculating if the “Sale date” of the customer is greater than the customer’s first date of engagement then it means that they have worked with us before and thus are a “Returning Customer” and if their “Sale date” is equal to their first date of engagement then it means that it was the first time they worked with us and thus are a “New customer” at that point in time.
Attached below is how the dataset will look like.
Our data is finally ready!.and now we can do some interesting analysis on it.
I would like to see how many “New” and “Returning” Customers did we have each month.
We will be using group by and summarise functions to get this resultNew_and_Returning_Customers <- sales_data%>%group_by(floor_date(`Sale Date`,unit = 'month'))%>%summarise(New_Customers = n_distinct(`Customer ID`[Customer_Status=="New"]),Returning_Customers= n_distinct(`Customer ID`[Customer_Status=="Returning"]))Since I wanted to group by Month, I used the floor_date function from the lubridate package to round the ‘Sale Date’ to the nearest month and then used conditional distinct count to to get a unique count of New and Returning customers each month.
We can use ggplot2 package to make a simple line chart of our dataConclusionNow that we have an understanding on how to calculate New and Returning customers, we can do all sorts of analysis on them.
The Purpose of this tutorial was to give you an understanding on how to make this calculation in a relatively simple and clean way.
.. More details