Implementing a Profitable Promotional Strategy for Starbucks with Machine Learning (Part 1)

Josh Xin Jie Lee, Jan 8

In this series, we will design a promotional strategy for Starbucks and walk through the entire process from data pre-processing to modelling.
This is my solution for the Udacity Data Scientist Nanodegree Capstone project.
In the first part of the series, we will cover the introduction of the project, description of the datasets, types of promotions available, imputing missing values, data pre-processing and exploratory data analysis.
Link to part 2 of this article.
The code accompanying this article can be found here.
Introduction

In this series of articles, we will explore data from the Starbucks rewards mobile app and employ machine learning to design a promotional strategy.
We will also walk through the entire data pre-processing workflow and devise a metric to test our promotional strategy.
The question that we would like to answer is:

Can we increase Starbucks’ profits by adopting a more selective promotional strategy?

As part of the Udacity Data Scientist Nanodegree Capstone project, Starbucks has kindly provided a simulated dataset that mimics customer behavior on their rewards mobile app.
The dataset can also be found on my GitHub repository.
Here is the situation: once every few days, Starbucks will send out promotions to customers on the mobile app.
These can be discount offers, buy-one-get-one-free offers (BOGO), or informational offers.
There is often a cost associated with the promotion.
In our case, a buy-one-get-one-free or a discount promotion can result in “lost profits” for the firm, since the firm is sacrificing some profits to give the customers monetary incentives (free product or discount).
If the customers are already willing to purchase products from Starbucks without being given any promotions, then we should abstain from giving this group of customers any promotions.
In addition, there exists a group of individuals known as “sleeping dogs”.
“Sleeping dogs” are individuals who buy your product, but will stop doing so if they are included in your marketing campaign.
Hence, it is not a sound business strategy to send promotions to every individual.
Ideally, we will want to send the promotions to individuals who are inclined to make purchases only when they are presented with offers, and we also want to avoid sending them to “sleeping dogs”.
Our objective will be to formulate a promotional strategy that can identify the individuals to whom we should send promotions, with the aim of maximizing profits for the firm.
To help us tackle this problem, we will be utilizing uplift models.
Uplift models allow us to model the incremental impact of a treatment (this will be the promotion in our case) on a customer’s purchase behavior.
If you are interested in learning more about uplift models, check out this article.
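One common way to write the idea down, using notation that is an assumption for illustration and not taken from the original article, is as the difference between two response probabilities for a customer with features x:

```latex
\text{uplift}(x) = P(\text{purchase} \mid x,\ \text{promotion}) - P(\text{purchase} \mid x,\ \text{no promotion})
```

Under this reading, a clearly positive uplift marks a customer worth targeting, an uplift near zero marks a customer who behaves the same either way, and a negative uplift corresponds to the “sleeping dogs” described earlier.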
While uplift models can be really beneficial, they can sometimes be tricky to implement.
Notably, the biggest challenge is finding the optimal method to model the promotion’s incremental impact on the individual’s response.
Nonetheless, these models offer us the best chance of improving the promotions’ effectiveness.
So, we shall give it a try!

II. Dataset

There are 3 data files provided by Starbucks for this project. The data files and their schema are listed below:

portfolio.json
- id (string): offer id
- offer_type (string): type of offer, i.e. BOGO, discount, informational
- difficulty (int): minimum spending required to complete an offer
- reward (int): reward given for completing an offer
- duration (int): the period of validity for the offer
- channels (list of strings): the media through which the offer was sent

The portfolio dataset contains information about 10 varieties of promotions that are available through the app.
There are 3 main families of promotions available: informational, discount and buy-one-get-one-free (BOGO) offers.
In the dataset, the term offer type was used to refer to an umbrella of similar promotions.
For example it could refer to the family of discount promotions.
I find this term confusing, so I will refer to the umbrella of similar promotions as simply the family type of offers.
I will use the term offer type or promotion type to refer to each of the 10 varieties of promotion.
Informational offers have no monetary rewards attached to them.
These promotions are primarily advertisements for drinks.
Discount offers usually provide a small monetary reward, provided the customer spends a certain amount of money within the duration of the offer.
The amount that needs to be spent to unlock the reward will be referred to as the offer’s difficulty.
BOGO offers are similar to discount offers, except that they provide monetary rewards equivalent to the offers’ difficulties.
The details of the offers can be viewed below:

[Figure: Portfolio dataset with offer id re-encoded]

Note that I re-encoded the offer ids from their original hash values to integers (0–9) to simplify the work process.
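A re-encoding of this kind can be sketched in a few lines of plain Python. The ids below are made-up placeholders, not the real hash values, and the helper name `build_id_map` is hypothetical:

```python
# Hypothetical hash-style ids standing in for the real portfolio ids.
portfolio_ids = ["ae26", "4d5c", "9b98", "ae26"]


def build_id_map(ids):
    """Map each distinct id to an integer, in order of first appearance."""
    id_map = {}
    for raw_id in ids:
        if raw_id not in id_map:
            id_map[raw_id] = len(id_map)
    return id_map


id_map = build_id_map(portfolio_ids)
encoded = [id_map[raw_id] for raw_id in portfolio_ids]
```

The same mapping would have to be applied wherever the offer id appears (for example in the transcript records) so that the datasets stay joinable.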
profile.json
- age (int): age of the customer
- became_member_on (int): date when the customer created an app account
- gender (str): gender of the customer
- id (str): customer id
- income (float): customer’s income

[Figure: Snapshot of original profile dataset]

The profile dataset contains demographic information about the customers.
There are only 4 demographic attributes that we can work with: age, income, gender and membership start date.
A proportion of the profile dataset has missing values, and they will be addressed later in this article.

transcript.json
- event (str): record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str): customer id
- time (int): time in hours. The data begins at time t=0
- value (dict of strings): either an offer id or transaction amount depending on the record

[Figure: Snapshot of original transcript dataset]

The transcript dataset records timestamps of when purchases were made on the app as well as timestamps of when offers were received, viewed or completed.
There are a few details about the dataset that should be highlighted before we move on.

1. Customers need not spend all of the money in a single transaction to activate the offer. As long as they reach the required amount within the duration of the offer, they will receive the rewards of the offer regardless of the number of transactions they made.
2. Offers can be recorded as completed even if the user did not actually view the offer but spent the required amount of money.
3. Offers can also be recorded as completed even after the offer has expired. For instance, let us consider an offer that is expiring at time 7. If the customer spends the required amount of money at a later time, perhaps time 20, the offer will still be recorded as completed. In reality, this offer is not “completed” and the customer will not enjoy the rewards of the promotion. Hence, we need to account for this situation when working with the data.
4. No unique serial number was assigned to each offer, person and time combination. This can cause confusion regarding the interpretation of the transcript dataset. For example, a customer might receive a $10 BOGO offer at time 7, and another identical offer at time 31. The app might then record an offer completion at time 52. However, we would not know which of the two offers was completed at time 52 due to issue number 3.
[Figure: An example illustrating problems 3 and 4. Note the events at time 0, 6, 132. This offer has a validity of 7 days and should have expired at time 7. Nonetheless, it was “completed” 132 days after it was sent.]

These peculiarities mean that recapturing the exact outcomes of every offer might be impossible, although we should be able to recover most of them.
Available Offers

These are the offers available through the app:

Offer id Number: Offer Family Type, Validity/Difficulty/Reward
Offer id 0: Discount 10/20/5
Offer id 1: Discount 7/7/3
Offer id 2: Discount 7/10/2
Offer id 3: Informational 4/0/0
Offer id 4: BOGO 5/10/10
Offer id 5: Informational 3/0/0
Offer id 6: BOGO 7/5/5
Offer id 7: BOGO 7/10/10
Offer id 8: BOGO 5/5/5
Offer id 9: Discount 10/10/2

I will use the same offer reference system throughout the article.
Note: I will use Offer id 10 to represent non-offer situations.
This will become clear in a while.
Predicting Missing Values

Approximately 12.8% of the profile dataset contains missing values for both gender and income.
Coincidentally, individuals are either missing both gender and income data, or are missing none at all.
Furthermore, individuals with missing gender and income information all share identical ages of 118.
It is probable that these individuals are not 118 years old in reality.
[Figure: Distribution of Age. Note the outlier group at age 118.]

A plausible explanation is that 118 is the app’s default age value when demographic data is missing.
Since age, income, and gender are the only demographics data available (other than membership start date), it is important to address these missing data.
One solution will be to remove the individuals with missing data.
However that will also mean losing a significant portion of the data.
Due to the size of the group, it is probable that these individuals have diverse demographics attributes.
Hence I decided to use a machine-learning based approach to predict the missing values, rather than imputing the missing values with a single value, such as mean.
Feature Engineering

Since membership start date is the only demographic information that can be used as an input for our machine learning models, my hope is that differences in transactional behavior exist between individuals of different genders, income levels and age groups.
Statistics for transaction behaviors were aggregated on an individual basis.
For each individual, I kept track of:

- number of offers received
- number of offers successfully completed
- number of offers that were tried but not completed
- percentage of offers successfully completed
- percentage of offers tried
- total spending for offers
- total number of transactions made for offers
- average spending per transaction for offers

These numbers were aggregated on a cumulative basis (all offers plus no offer), for each offer type (id 0–9 plus id 10 representing no offer) and for each offer family type (BOGO, discount, informational).

Note: The definitions for successful/tried offers can be found in the subsection Define Successful/Tried/Failed Offers under Section V. Data Preprocessing: Generating Monthly Data.

Additional ratios were also computed, such as:

- ratio of spending for offer type and offer family type over total spending
- ratio of number of transactions for offer type and offer family type over total number of transactions
- ratio of number of offer type and offer family type received over total number of offers received

Due to the considerable number of features created, I will abstain from listing all of them in this article.
For more details, refer to the code in the file input_missing_data.ipynb on my GitHub repository.
Any null values were replaced with 0, since null values generally indicated that no offers were received/viewed/completed or no expenditure was made.
In order to differentiate between the situations of 1) receiving an offer and not responding to it and 2) not receiving an offer at all, the number of offers received by each individual was tracked.
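As a rough illustration of how such per-customer statistics and the zero-filling rule fit together, here is a minimal sketch in plain Python. The function and feature names are hypothetical and the notebook may compute them differently:

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the ratio is undefined."""
    if not denominator:
        return 0.0
    return numerator / denominator


def customer_features(n_received, n_completed, n_tried, offer_spend, n_offer_trans):
    """Build a small feature dictionary for one customer (hypothetical names)."""
    return {
        "offers_received": n_received,
        "offers_completed": n_completed,
        "offers_tried": n_tried,
        "pct_completed": safe_ratio(n_completed, n_received),
        "pct_tried": safe_ratio(n_tried, n_received),
        "offer_spend": offer_spend,
        "offer_transactions": n_offer_trans,
        "avg_spend_per_transaction": safe_ratio(offer_spend, n_offer_trans),
    }


active = customer_features(4, 1, 1, 30.0, 3)
never_targeted = customer_features(0, 0, 0, 0.0, 0)
```

Both an unresponsive customer and a customer who never received an offer end up with ratios of 0, which is why keeping `offers_received` as its own feature matters for telling the two apart.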
The input features for the models would be the newly engineered features and membership start date.
Due to the high degree of sparsity in the features, largely owing to the low completion rates and low expenditures for offers, dimensionality reduction was performed for these input features.
To prevent features with larger values from dominating others, normalization of the features was performed.
Standard scaling was used to reduce these features to a mean of 0 and a standard deviation of 1.
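Standard scaling subtracts the mean of a feature and divides by its standard deviation. The article does not say which implementation was used, so the following is only a plain-Python sketch of the computation for a single feature column, with a hypothetical function name:

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scaled = standard_scale([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

In a real pipeline the mean and standard deviation would be estimated on the training rows only and then reused for the rows whose attributes are being predicted.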
Models

3 separate models were created, one model for each of the missing attributes: age, income and gender.
The portion of the profile dataset without missing values was used to train the models (approximately 87.2% of the profile data).
Both the age and income models are regression problems, while the gender model is a multi-class classification problem.
K-Fold cross validation with 5 folds was used in the grid search process to optimize the models.
XGBRegressor and XGBClassifier are the models chosen for the regression and classification tasks respectively.
These are tree-based non-linear models that are relatively fast and accurate.
The only major drawback to these models is that they do not innately construct aggregated features, such as sums over related columns.
For example, if our datasets only track total spending for each offer type (e.g. $10 for BOGO 5/5/5, $30 for BOGO 7/10/10, etc.), but not for each family type (e.g. $40 for all BOGO offers), the XGBoost model will make modelling decisions based only on spending for each offer type.
It is unable to extract any information with regards to total spending for BOGO class of offers and use this information in the modelling process.
Hence, we will need to manually engineer these feature interactions if we want our models to take advantage of them.
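The BOGO example above can be reproduced with a small sketch. The id-to-family mapping follows the offer list given earlier in the article, while the function name and the dictionary layout are assumptions made for illustration:

```python
# Offer ids and families, following the article's offer list.
OFFER_FAMILY = {
    0: "discount", 1: "discount", 2: "discount", 9: "discount",
    3: "informational", 5: "informational",
    4: "bogo", 6: "bogo", 7: "bogo", 8: "bogo",
}


def add_family_totals(spend_by_offer):
    """Sum per-offer spending into one total per offer family."""
    totals = {"discount": 0.0, "informational": 0.0, "bogo": 0.0}
    for offer_id, amount in spend_by_offer.items():
        totals[OFFER_FAMILY[offer_id]] += amount
    return totals


# $10 on BOGO 5/5/5 (id 8) and $30 on BOGO 7/10/10 (id 7), plus $5 on a discount.
family_spend = add_family_totals({8: 10.0, 7: 30.0, 1: 5.0})
```

The family totals would then be added as extra input columns next to the per-offer columns, so the model can split on them directly.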
Metrics

Root Mean Squared Error (RMSE) was selected as the optimization metric for the age and income models as I wanted to prioritize the accuracy of the predictions.
Note that minimizing RMSE is the same as minimizing Mean Squared Error (MSE), since RMSE is the square root (a monotonic transformation) of MSE.
RMSE is calculated as such:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$

In our case, $P_i$ is the predicted age/income, $O_i$ is the actual age/income for each sample $i$, and $n$ is the number of samples.
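As a quick check of the definition, RMSE (the square root of the mean squared difference between predictions and actual values) can be computed in plain Python. The numbers below are invented for illustration and are not results from the project:

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 3, 4, 0 and 0 years give a mean squared error of 6.25.
example = rmse([30, 40, 50, 60], [33, 36, 50, 60])
```

Because squaring weights large errors more heavily, RMSE is at least as large as the mean absolute error on the same predictions.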
On the other hand, the micro-average F1-score was used as the optimization metric for the gender model.
The micro-average F1-score is calculated in the following manner:

$$\mathrm{Precision}_{micro} = \frac{TP_F + TP_M + TP_O}{TP_F + TP_M + TP_O + FP_F + FP_M + FP_O}$$

$$\mathrm{Recall}_{micro} = \frac{TP_F + TP_M + TP_O}{TP_F + TP_M + TP_O + FN_F + FN_M + FN_O}$$

$$F1_{micro} = \frac{2 \cdot \mathrm{Precision}_{micro} \cdot \mathrm{Recall}_{micro}}{\mathrm{Precision}_{micro} + \mathrm{Recall}_{micro}}$$

where

- TP: True Positives. Number of samples predicted to belong to gender g, and actually belonged to gender g.
- FP: False Positives. Number of samples predicted to belong to gender g, but do not belong to gender g in reality.
- TN: True Negatives. Number of samples predicted to not belong to gender g, and actually do not belong to gender g.
- FN: False Negatives. Number of samples predicted to not belong to gender g, but actually belonged to gender g.
- F, M, O represent the ‘female’, ‘male’ and ‘other’ genders respectively.
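The counting behind the micro-average can be made concrete with a small plain-Python sketch. The function name and the toy labels are assumptions for illustration only:

```python
def micro_f1(y_true, y_pred, labels=("F", "M", "O")):
    """Micro-averaged F1: pool TP, FP and FN over all labels, then combine."""
    tp = fp = fn = 0
    for g in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == g and truth == g:
                tp += 1
            elif pred == g and truth != g:
                fp += 1
            elif pred != g and truth == g:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: three of four predictions are correct.
score = micro_f1(["F", "M", "M", "O"], ["M", "M", "M", "O"])
```

Every misclassified sample counts once as a false positive (for the predicted gender) and once as a false negative (for the true gender), so for single-label predictions like these the micro-averaged precision, recall and F1 all equal plain accuracy.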
Results

The model for predicting an individual’s age achieved an RMSE of approximately 16.5, while the model for predicting the individual’s income achieved an RMSE of approximately 13,500.
Roughly speaking, these numbers mean that the age predictions were typically off by about 16.5 years, while the income predictions were typically off by about $13,500.
On the other hand, an F1-score of 0.6 was recorded on the test set for the gender prediction model (1 being the score for the best possible model).
An interesting observation was that the predicted values for gender seem to be dominated by the male gender.
Time permitting, additional investigations could be carried out to find out why this was the case.
These model metrics were far from ideal.
However, considering the limited amount of information that was available, these results were acceptable.
Nonetheless these imputed numbers should serve us better than the alternative approach of filling the missing values with constant numbers.
[Figure: Left: Original Distribution of Age, Middle: Distribution of Predicted Age for Missing Data, Right: Final Combined Distribution of Age]
[Figure: Left: Original Distribution of Income, Middle: Distribution of Predicted Income for Missing Data, Right: Final Combined Distribution of Income]
[Figure: Left: Original Distribution of Gender, Middle: Distribution of Predicted Gender for Missing Data, Right: Final Combined Distribution of Gender]

V. Data Preprocessing: Generating Monthly Data

In order to transform the datasets into something useful, we will have to perform a substantial amount of data cleaning and pre-processing.
At the end of this section, we will generate a dataset that looks like this:

[Figure: Snapshot of Monthly Data After Data Preprocessing]

The primary task will be to identify individuals who are likely to spend more money when receiving offers as compared to not receiving offers.
Hence, we need a dataset that reflects how users respond in both promotional and non-promotional situations.
We are also interested in whether customers’ behaviors change over time.
Hence, I have chosen to aggregate customers’ responses on a monthly basis.
Since we are not given actual dates, I will estimate a month to be 30 days and treat day 0 as the start of month 0.
From the snapshot above, we know that ‘person id 2’ received a promotion ‘offer id 2’ at month 0.
The person did not spend any money on that offer during the month.
Likewise, ‘person id 2’ did not spend any money in non-promotional situations.
Note that I will be using ‘offer id 10’ to represent non-promotional spending for simplicity.
Every month, the dataset should track:

- How much customers spent during offers’ validity if they received offers. This figure could be 0 if they did not spend any money.
- How much customers spent during periods of time when there was no offer. This figure could also be 0 if they did not spend any money.

Checks of randomly selected individuals’ transactions suggest that customers did not generally receive more than 1 offer a month.
Hence, customers were exposed to more non-promotional days as compared to promotional days every month.
Aggregating data on a monthly basis was preferred over a daily/weekly basis since most customers received an offer once every few months.
If data was aggregated on a daily/weekly basis, there would be too many days/weeks without exposure to any promotions.
I will now discuss the process of generating the monthly dataset.
It is a lengthy discussion and best followed with the code.
Hence, if you wish to skip this section, feel free to continue reading from the section ‘VI. Exploratory Data Analysis’ near the end of this page.
Define Successful/Tried/Failed Offers

Before we can track how much money customers spent during the validity of promotions, we need to classify the offers according to their possible outcomes.
There are 3 possibilities: successful, tried and failed.
For an offer to be classified as successful, it has to be received, viewed and completed before the offer expired.
This meant that the customer was aware of a promotion and was making transactions as a result.
If the customer completed the offer before viewing it, the offer would not be classified as successful, since the customer was not influenced by the offer when making transactions.
In the event that a customer made some transactions before viewing the offer but did not spend enough to complete it, then viewed the offer while it was still valid and spent more money to complete it before it expired, the offer would be classified as successful as well.
Hence, the flow of events for a successful offer is:

offer received -> optional: transactions made -> offer viewed -> transactions made -> offer completed -> offer expired

A tried offer is one that a customer viewed and spent some money on before it expired, but did not complete.
In other words, the customer did not spend enough money to meet the offer’s requirement.
Since informational offers do not have an offer completion event, they will be treated as tried offers if the customer viewed the offer and spent some money during its validity.
The flow of events for a tried offer is:

offer received -> optional: transactions made -> offer viewed -> transactions made -> offer expired -> optional: offer completed

Failed offers are offers that do not fall into the two previously mentioned categories.
For example, if an offer was received and viewed but no transactions were made before the expiry of the offer, the offer would be a failed offer.
If an offer was received but not viewed before its expiry, it would also be classified as a failed offer, even if money was spent during the offer’s validity.
This is because the customer is spending money without any influence from the offer.
Tracking the amount of money spent during promotions will be equivalent to finding the amount of money spent for successful and tried offers.
Split transcript dataset

The first step of preprocessing involves splitting the original transcript into 4 smaller subsets:

- transcript_offer_received: tracks when customers received offers and what kind of offers they received
- transcript_offer_viewed: tracks when customers viewed offers and what kind of offers they viewed
- transcript_offer_completed: tracks when customers completed offers and what kind of offers they completed
- transcript_trans: tracks all transactions carried out by customers

Generate Labelled Monthly Transaction Data

To begin, we will identify offers that are successful or tried.
We start off by merging transcript_offer_received, transcript_offer_viewed and transcript_offer_completed together.
The resulting data frame will look like this.
At this stage, a lot of offers generated from the merging process will be nonsensical.
We can eliminate a great deal of false offers by keeping those that meet any one of the following conditions:

- time offer completed > time offer viewed > time offer received
- (time offer viewed > time offer received) and (time offer completed is null)
- both time offer viewed and time offer completed are null

False offers still exist at this stage, and further processing is needed.
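The merge and the three keep-conditions can be sketched in plain Python as follows. This is an illustration under assumptions: events are modelled as hypothetical (person, offer id, time) tuples, and None plays the role of a null left behind by the merge:

```python
from itertools import product


def candidate_offers(received, viewed, completed):
    """Pair every receipt with every view/completion of the same person and offer.

    Each argument is a list of (person, offer_id, time) tuples. None stands
    for a missing view or completion, mimicking nulls after a merge.
    """
    rows = []
    for person, offer, t_recv in received:
        views = [t for p, o, t in viewed if (p, o) == (person, offer)] or [None]
        comps = [t for p, o, t in completed if (p, o) == (person, offer)] or [None]
        for t_view, t_comp in product(views, comps):
            rows.append((person, offer, t_recv, t_view, t_comp))
    return rows


def is_plausible(t_recv, t_view, t_comp):
    """Apply the three keep-conditions to one candidate row."""
    if t_view is None:
        return t_comp is None
    if t_comp is None:
        return t_view > t_recv
    return t_comp > t_view > t_recv


received = [("p1", 2, 0), ("p1", 2, 20)]
viewed = [("p1", 2, 5), ("p1", 2, 22)]
completed = [("p1", 2, 25)]
rows = candidate_offers(received, viewed, completed)
kept = [r for r in rows if is_plausible(r[2], r[3], r[4])]
```

In this toy case the row pairing the receipt at time 20 with the view at time 5 is dropped, while the receipt at time 0 is still paired with two different views, which illustrates why false offers survive this filter.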
We can calculate the expiry time of every offer by adding the offer’s duration to the time at which it was received.
Next, we can classify these offers into their probable outcomes: successful, tried or failed/false.
Note that the classifications at this stage do not necessarily mean that the offers are truly successful or tried.
We will need the transaction information later to find out.
Offers that meet the following condition will be classified as offers that are probably successful:

- (time offer received ≤ time offer viewed) and (time offer viewed ≤ time offer completed) and (time offer completed ≤ time offer expiry)

Offers that meet either of the following conditions are classified as offers that are probably tried:

- (time offer received ≤ time offer viewed) and (time offer viewed ≤ time offer expiry) and (time offer expiry < time offer completed)
- (time offer received ≤ time offer viewed) and (time offer viewed ≤ time offer expiry) and time offer completed is null

The rest of the offers that did not meet these conditions are either failed or false offers and will be discarded.
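These conditions translate fairly directly into code. The sketch below is an assumption-laden illustration, with a hypothetical function name and None standing for a missing timestamp; the expiry is the receipt time plus the offer duration:

```python
def classify_offer(t_recv, t_view, t_comp, duration):
    """Return 'successful', 'tried' or 'discard' for one candidate offer."""
    t_expiry = t_recv + duration
    # Both outcomes require a view between receipt and expiry.
    if t_view is None or not (t_recv <= t_view <= t_expiry):
        return "discard"
    # Never completed, or completed only after expiry: probably tried.
    if t_comp is None or t_comp > t_expiry:
        return "tried"
    # Completed in time, but only counts if the view came first.
    if t_view <= t_comp:
        return "successful"
    return "discard"
```

For instance, `classify_offer(0, 2, 5, 7)` returns `'successful'`, whereas an offer received at time 0, viewed at time 6 and recorded as completed at time 132 with a validity of 7 comes back as `'tried'`.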
Any offers with duplicated values for ‘time_received’, ‘per_id’ and ‘offer_id’ will be dropped with the exception of the first occurrence.
It is unlikely that a person will receive the same kind of offer more than once at a time, and these duplicated entries are erroneous entries generated from the merging process.
We will call the resulting data frame succ_tried_offers.
[Figure: A snapshot of succ_tried_offers at this stage of processing]

Next we can merge transcript_trans with succ_tried_offers to get all possible cross products between successful/tried offers and transactions.
We can label each transaction to see if it occurred during the validity of the offer.
In other words, we will check if the spending occurred after the offers were received and before the offers had expired.
Any offers in succ_tried_offers that had transactions occurring during their validity are likely to be offers that are actually successful or tried.
We can then perform a left merge of the labelled transactions back to the original transaction transcript to avoid any potential double counting of transactions.
This will also allow us to identify which transactions occurred during non-promotional times.
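A simplified version of this labelling step might look like the sketch below. It is not the notebook’s merge-based code: the tuple layouts are hypothetical, the validity window is assumed to include both endpoints, and using only the first matching offer is an assumption made here to keep each transaction counted once:

```python
def label_transactions(transactions, offers, no_offer_id=10):
    """Attach an offer id to each transaction.

    transactions: list of (person, time, amount)
    offers: list of (person, offer_id, t_recv, t_expiry) for successful/tried offers
    A transaction inside an offer's validity window gets that offer's id;
    otherwise it gets no_offer_id (non-promotional spending).
    """
    labelled = []
    for person, time, amount in transactions:
        offer_id = no_offer_id
        for o_person, o_id, t_recv, t_expiry in offers:
            if o_person == person and t_recv <= time <= t_expiry:
                offer_id = o_id
                break  # first match only, so nothing is double counted
        labelled.append((person, time, amount, offer_id))
    return labelled


labelled = label_transactions(
    transactions=[("p1", 3, 12.5), ("p1", 15, 4.0)],
    offers=[("p1", 2, 0, 7)],
)
```

Here the purchase at time 3 falls inside the validity of offer 2, while the purchase at time 15 is labelled with offer id 10, the article’s code for non-promotional spending.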
[Figure: The labelled transactions]

Next we can assign the months in which the transactions occurred and aggregate the data based on the month number, person id and offer id.
This will generate a summary of how much customers spent during promotional and non-promotional times on a monthly basis.
We shall refer to this data frame as monthly_transactions.
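The month assignment and aggregation could be sketched as follows, using the 30-day month approximation stated earlier. The tuple layout and function name are hypothetical:

```python
from collections import defaultdict

DAYS_PER_MONTH = 30  # the article's approximation of a month


def monthly_totals(labelled):
    """Sum spending per (month, person, offer_id).

    labelled: list of (person, day, amount, offer_id) tuples.
    """
    totals = defaultdict(float)
    for person, day, amount, offer_id in labelled:
        month = day // DAYS_PER_MONTH  # day 0 starts month 0
        totals[(month, person, offer_id)] += amount
    return dict(totals)


monthly = monthly_totals([
    ("p1", 3, 12.5, 2),
    ("p1", 5, 2.5, 2),
    ("p1", 15, 4.0, 10),
    ("p1", 31, 6.0, 10),
])
```

Each key of the result corresponds to one row of a monthly summary: the two purchases made under offer 2 in month 0 are combined, and non-promotional spending (offer id 10) is kept separately for months 0 and 1.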
[Figure: A snapshot of monthly_transactions]

Find Offers With No Monthly Transactions

From succ_tried_offers, we know which offers were successful or tried.

[Figure: Offers that were successful or tried]

We can get the list of all offers that were ever sent by Starbucks from transcript_offer_received.

[Figure: Snapshot of all offers sent]

The difference between the two can tell us which offers had failed.
These are offers that did not induce any monetary transactions during their period of validity.
Next, we can assign the month numbers to when these failed offers were received and name the resulting dataframe monthly_failed_offers.
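Conceptually this is a set difference followed by a month assignment. A minimal sketch, assuming offers are identified by hypothetical (person, offer id, time received) tuples and a 30-day month:

```python
def failed_offers(all_sent, succ_tried, days_per_month=30):
    """Offers that were sent but never became successful or tried.

    all_sent and succ_tried hold (person, offer_id, t_recv) tuples.
    Returns sorted (month_received, person, offer_id) rows.
    """
    responded = set(succ_tried)
    return sorted(
        (t_recv // days_per_month, person, offer_id)
        for person, offer_id, t_recv in all_sent
        if (person, offer_id, t_recv) not in responded
    )


failed = failed_offers(
    all_sent=[("p1", 2, 0), ("p1", 4, 35), ("p2", 2, 0)],
    succ_tried=[("p1", 2, 0)],
)
```

Each resulting row would carry a spending of 0 for that month, person and offer.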
[Figure: Snapshot of monthly_failed_offers]

Find Months When Individuals Did Not Make Non-offer Transactions

Note that monthly_transactions already tracks how much customers spent during non-promotional times.
Our goal here is to find out in which months customers did not spend any money during non-promotional times.
First, we generate all possible combinations of month number, person id and offer id 10 (non-promotional exposure).
Let’s refer to this data frame as non_offer_trans.
We can then merge the monthly_transactions data frame with non_offer_trans to find out which month-individual combinations had no monetary transactions during non-promotional periods.
Thus, we will obtain a monthly account of when customers did not spend money during non-promotional situations.
Let’s call this data frame no_offer_no_trans.
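The same idea can be expressed as "generate every combination, then keep the ones that are absent". The sketch below assumes monthly spending is stored in a dictionary keyed by (month, person, offer id); the names are hypothetical:

```python
from itertools import product

NO_OFFER_ID = 10  # the article's id for non-promotional spending


def months_without_non_offer_spend(monthly_spend, persons, n_months):
    """Return (month, person, NO_OFFER_ID) keys that are absent from monthly_spend."""
    all_combos = product(range(n_months), persons, [NO_OFFER_ID])
    return [key for key in all_combos if key not in monthly_spend]


spend = {(0, "p1", 2): 15.0, (0, "p1", 10): 4.0, (1, "p2", 10): 6.0}
missing = months_without_non_offer_spend(spend, ["p1", "p2"], n_months=2)
```

In the toy data, customer p2 has no non-promotional spending in month 0 and customer p1 has none in month 1, so those two combinations are returned and would be recorded with a spending of 0.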
[Figure: Snapshot of no_offer_no_trans]

Aggregating them together

Finally, we can generate monthly_data by concatenating monthly_transactions, monthly_failed_offers and no_offer_no_trans.
The resulting dataset tracks, on a monthly basis, how much each individual spent on the different promotions sent to them, as well as how much non-promotional spending they made.

[Figure: Snapshot of monthly data at this stage]

Compute Profits and Generate Labels

Next, we have to compute the amount of profit generated for each instance of the dataset.
We will first need to compute the number of offers each individual was exposed to every month.
This allows us to compute the cost associated with the offers.
An easy way to do so is to check if individuals were exposed to more than 1 offer of each type in a month.
By inspecting transcript_offer_received, we note the following:

- No individuals received the same offer type more than once in the same month.
- If an individual received an offer that expires during the next month, he/she would not receive an offer of the same type during the next month. For example, if an individual received ‘offer id 2’ at month 16 and the offer expires during month 17, he/she would not receive another ‘offer id 2’ at month 17.
Hence, we can conclude that customers were only exposed to a maximum of 1 occurrence of an offer type every month.
This means that the cost in monthly_data is simply the reward of the promotion if it was completed.
We can calculate the amount of profit each individual generated for each offer type each month by following these 3 rules:

- If the offer was successful, the profit would be the monthly revenue minus the cost of the offer. Note that informational offers have no cost.
- If the offer was not successful, the profit would be the revenue generated in that instance.
- If the transactions were not made as part of an offer, the profit would be the revenue since there is no cost involved.

The uplift model that we will be using involves modelling the probabilities of profits for a given person and month in two situations:

- If the person receives an offer.
- If the person does not receive an offer.
Because we want to predict the probability of profits, our labels (has_profit) will simply be an indicator variable indicating if there was a profit for that instance.
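The three profit rules and the label can be summarized in a short sketch. The function names and the outcome strings are hypothetical, and treating "there was a profit" as "profit greater than zero" is an assumption made here:

```python
def compute_profit(revenue, outcome, reward):
    """Profit for one (month, person, offer) row.

    outcome is 'successful', 'tried', 'failed' or 'no_offer'.
    The reward is only paid out, and so only a cost, for successful offers;
    informational offers have a reward of 0.
    """
    cost = reward if outcome == "successful" else 0
    return revenue - cost


def has_profit(profit):
    """Indicator label: 1 if the row made a profit, else 0 (assumed threshold)."""
    return 1 if profit > 0 else 0


rows = [
    (25.0, "successful", 5),   # revenue minus the reward
    (8.0, "tried", 5),         # no reward paid out
    (0.0, "failed", 5),        # no spending at all
    (4.0, "no_offer", 0),      # non-promotional spending
    (10.0, "successful", 10),  # BOGO whose reward equals the spending
]
profits = [compute_profit(*row) for row in rows]
labels = [has_profit(p) for p in profits]
```

The last row shows why the cost matters: a BOGO offer completed with exactly the required spending brings in revenue but, under these rules, no profit.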
[Figure: Resulting monthly_data obtained at the end of the process]

VI. Exploratory Data Analysis

Most Offers were not Attempted by Customers

From the chart above, we note that the vast majority of offers were not attempted by customers.
A general observation is that offers with longer validities, higher rewards and lower difficulties tend to have a higher success/try rate.
Note that informational offers cannot be completed since they lack a difficulty.
Hence they are either tried or failed.
Only a Minority of Customers Successfully Completed or Tried More Than One Offer

Most customers received between 3 and 6 offers during the period of study.
However, the vast majority of customers did not successfully complete any offers.
Only a small proportion of them completed 1 offer, and an even smaller proportion completed 2 or more offers.
The same is true for offers that are attempted (but not successfully completed).
Very few customers responded (spent money) when presented with offers.
Responding to 2 or more offers was extremely rare.
In general, offers seem to have limited effectiveness.
Monthly Profit Trend

The charts on the left plot the number of individuals who received promotions every month.
In addition, the charts will show how many of these individuals generated promotional profits and non-promotional profits each month.
A profitable instance can also be defined as a data point from the monthly_data dataset with has_profit label of 1.
The charts on the right plot the total monthly profits generated from individuals who received the promotion, as well as the total profits from their promotional and non-promotional spending.
From these charts, we can gauge the effectiveness of the promotions and evaluate whether these promotions’ effectiveness changed with time.
[Figure: Discount 10/20/5 (Offer ID 0)]
[Figure: Discount 7/7/3 (Offer ID 1)]
[Figure: Discount 7/10/2 (Offer ID 2)]
[Figure: Informational 4/0/0 (Offer ID 3)]
[Figure: BOGO 5/10/10 (Offer ID 4)]
[Figure: Informational 3/0/0 (Offer ID 5)]
[Figure: BOGO 7/5/5 (Offer ID 6)]
[Figure: BOGO 7/10/10 (Offer ID 7)]
[Figure: BOGO 5/5/5 (Offer ID 8)]
[Figure: Discount 10/10/2 (Offer ID 9)]

A recurring observation was that among individuals who received promotions each month, it was far more likely that they made purchases during days when they were not exposed to any promotions as opposed to days when they were exposed to promotions.
Hence, the number of non-promotional profit instances generally outnumbered the number of promotional profit instances.
A similar observation was noted in the total amount of profits generated during promotional and non-promotional periods, with non-promotional profits frequently exceeding promotional profits.
Granted, the number of days an individual was exposed to an offer was typically lower than the number of days an individual was not exposed to any offer.
Every month, the number of days an individual was subjected to the influence of an offer was bounded by the offer’s validity period, which could be anywhere from 3 to 10 days.
Note that no individual received the same offer more than once in a month.
However, even if we did account for the disparity between exposure periods, non-promotional profits would still exceed promotional profits on most occasions.
Hence, this suggests that promotions have generally limited effectiveness in inducing customers to spend more than they normally do.
If the customers are generally willing to purchase Starbucks’ products without being given any promotions, it will not be in Starbucks’ best interest to send more of these promotions, since doing so will likely erode the profitability of the firm due to the promotions’ cost.
Our hope is that we will be able to identify individuals who spend only when given promotions.
Restricting the sending of promotions to these individuals will help minimize cost and maximize profits.
In part 2 of the series, we will cover feature engineering, implementation of the uplift model, additional adjustments made to the model and data, results, and conclusion of the project.
Link to Part 2 of the article.
The code accompanying this article can be found here.