Enhancing Starbucks Customer Experience by Building Recommendation Engines — Part 4

Photo by Charles Koh on Unsplash

The fourth part of the series focuses on the machine learning part of the project: clustering and building the recommendation engines. If you have not yet read the introduction to this project, I strongly suggest you follow this link and read it to get a clearer picture of what I’m doing.

You can see the code I used to clean the data by following this link here.

Clustering

After getting a fair grasp of the Starbucks customers’ demographic profile, I ran KMeans clustering to segment the customers into clusters.

Methodology

Before clustering, I used a standard scaler to scale the data. This is necessary because some columns hold very large values while others hold just 0 and 1. In other words, the features used in clustering differ greatly in variance. Keeping the original data would put more weight on the variables with larger variances, so the resulting clusters would be separated mainly along those variables.

Implementation

I used the elbow method to choose the number of clusters (k). I tried every k between 1 and the number of variables used in clustering and performed KMeans clustering for each one. I then calculated the sum of squared errors (SSE) for each k and appended the scores to a list. Afterwards, I created a line plot to look for the elbow and used that value as the number of clusters in the further analysis.

Fig 1. SSE vs K on KMeans Clustering

There is no clear elbow in the KMeans result, except for a somewhat larger reduction of SSE from 7 to 16 clusters. Therefore, I decided to go with 16 clusters to fit the model and predict which cluster each user belongs to. I plotted the number of users in each cluster as follows:

Fig 2. The Number of Users per Cluster

As can be seen from Fig 2, the number of users per cluster varies widely. For example, more than 4000 users belong to cluster 2, while fewer than 10 users belong to cluster 8. This might affect the quality of the recommendation engine, since most of the users will be mapped into cluster 2, 3, 7, or 16.

Next, I grouped the user demographic data frame by cluster, which gave me the cluster data frame. Here is a snapshot of its first few rows and columns. Please check my GitHub if you would like to see the full data frame.

Fig 3. The Cluster Data Frame

Description of the First 3 Clusters

1. The first cluster is populated with male customers aged around 49 with an income of around USD 58052. They tend to favor bogo4 over other types of offer. After receiving an offer, they make about 2–3 transactions. They also make a moderate number of transactions that do not come from an offer, indicating that they do like to have their coffee at Starbucks.
2. The second cluster is also populated with male customers aged around 49, with an income of around USD 56291. They tend to favor discount3 over other types of offer. After receiving an offer, they make only about 1 transaction. However, they do purchase at Starbucks even when they haven’t received any offer. I think these customers are not really loyal: they buy coffee whenever they feel like it, and they prefer discounts over anything else.
3. The third cluster is populated with male customers aged around 60 with an income of around USD 78141. They tend to favor bogo1 over other types of offer. After receiving an offer, they make about 2–3 transactions, and they also buy coffee when they haven’t received offers. This cluster is similar to the first cluster, only with a higher income.

Cluster-based Recommendation Engine

After segmenting the customers, I created the first recommendation engine, which was the cluster-based recommendation engine.
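Since everything that follows depends on the clusters, here is a minimal sketch of the scaling and elbow-method steps from the Clustering section. It is an illustration only, not the project's actual code: the three feature columns are synthetic stand-ins, the range of k (1 to 8) and the final `chosen_k = 4` are assumptions for this toy data, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the demographic features (hypothetical columns):
# income in USD, age in years, and a 0/1 flag.
rng = np.random.default_rng(42)
n_users = 300
features = np.column_stack([
    rng.normal(60_000, 20_000, n_users),  # large-valued column
    rng.normal(50, 15, n_users),          # mid-range column
    rng.integers(0, 2, n_users),          # binary column
])

# Standardize so every column has mean 0 and unit variance; otherwise the
# income column would dominate the Euclidean distances that KMeans uses.
scaled = StandardScaler().fit_transform(features)

# Elbow method: fit KMeans for a range of k and record the SSE,
# which scikit-learn exposes as the fitted model's `inertia_`.
k_values = list(range(1, 9))
sse = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    sse.append(model.inertia_)

for k, score in zip(k_values, sse):
    print(f"k={k}: SSE={score:.1f}")

# After picking k from the SSE curve, assign each user to a cluster
# and count the users per cluster.
chosen_k = 4  # assumption for this toy data, not the article's 16
labels = KMeans(n_clusters=chosen_k, n_init=10, random_state=42).fit_predict(scaled)
users_per_cluster = np.bincount(labels, minlength=chosen_k)
print(users_per_cluster)
```

Plotting `k_values` against `sse` gives a curve of the kind shown in Fig 1, and `users_per_cluster` holds the kind of counts shown in Fig 2.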
This engine relies on the clusters created earlier and recommends offers based on what the users in the target user's cluster like.

Methodology

Prior to creating the recommendation engine, I had to consider several things:

1. Find out which users don’t like to be given offers.
2. Create a cluster-promotion matrix.
3. Check which cluster a user belongs to.
4. Recommend the top 3 offers that the user's cluster likes, and get the list of these offer ids.
5. If the user belongs to a cluster of users who don’t like to be given promotions, I wouldn’t give any recommendation.
6. If the user is new, I would give the top 3 most popular offers across all users.

Implementation

I created a data frame containing the user id and the ratio of transactions not from offers, to find out which users don’t like to be given offers. If a user's ratio is greater than the mean of all the ratios, I consider that user as someone who doesn’t like offers or promotions.

Fig 4. The ‘Offer or Not’ Data Frame

Afterwards, I created the cluster-promotion matrix as a data frame. The only variables I used were the discount and buy one get one variables. I decided not to include the informational variables because informational offers should be given to every user anyway. The first few rows of the resulting data frame are as follows:

Fig 5. Cluster-promotion Matrix

I used this data frame and the ‘offer-or-not’ data frame to create the recommendation engine. I created a function which does the following:

1. Take a user id as input.
2. If the user id is in the ‘offer-or-not’ data frame, do not recommend anything.
3. If the user id is not in the ‘offer-or-not’ data frame, check which cluster this user belongs to.
4. Using the cluster-promotion matrix, recommend the top 3 offers that the users in that cluster like the most.
5. Convert the offer alias into the original offer id.

Fig 6. Recommendation Result

The recommendation engine managed to recommend the top 3 offers liked by the users in the same cluster as user 293. As for user 20, since this user is in the ‘offer-or-not’ data frame, the engine did not recommend anything. New users will be given the 3 offers shown in Fig 6.

Refinement

All in all, this recommendation engine did what I wanted it to do. However, I did have an issue with it. As stated above, the clusters contain very different numbers of users, and most users were placed in only 4 clusters. This might cause the recommendation engine to give less personalized offers. Therefore, I decided to create a user-based recommendation engine.

User-based Recommendation Engine

The user-based recommendation engine I had in mind is more personalized. Basically, the engine finds similar users in the user-promotion matrix and gives recommendations based on which offers these similar users like the most.

Methodology

The methodology for creating the user-based recommendation engine is quite similar to that of the cluster-based one. I had to consider several things:

1. Find out which users don’t like to be given offers.
2. Create a user-promotion matrix.
3. Find the top 10 users who are most similar to the user to be given recommendations.
4. Recommend the top 3 offers that the 10 most similar users like the most.
5. If the user belongs to the list of users who don’t like to be given promotions, I wouldn’t give any recommendation.
6. If the user is new, I would give the top 3 most popular offers across all users.

Implementation

I still used the ‘offer-or-not’ data frame to find out which users don’t like to be given offers, so I didn’t create a new data frame for this. The first thing I did was to create the user-promotion matrix. I used the user demographic data frame with the user id as the index and the discount and buy one get one offers as variables.

Fig 7. User-promotion Matrix

I used this matrix to calculate the similarity between users. The method I used was to calculate the dot product between the row of the user to be given recommendations and the transpose of the user-promotion matrix. The most similar users are obtained by applying argsort to the negated similarity values and adding 1 to the resulting indices. The last thing to do was to remove the user's own id from the list.

To create the recommendation engine, I created a function which does the following:

1. Take a user id as input.
2. If the user id is in the ‘offer-or-not’ data frame, do not recommend anything.
3. If the user id is not in the ‘offer-or-not’ data frame, find the 10 users most similar to this user id.
4. Using the user-promotion matrix, recommend the top 3 offers that the 10 most similar users like the most.
5. Convert the offer alias into the original offer id.

Fig 8. User-based Recommendation Result

Compared to the previous recommendation engine, the user-based recommendation engine created slightly different recommendations for user id 293; 2 of the recommendations are the same. For user id 3, the engine did not recommend anything since this user is in the ‘offer-or-not’ data frame. As for the new user, the engine produced the same result as before, since the logic for new users is the same.

Evaluation and Validation

Both of the recommendation engines have successfully achieved what I wanted:

1. They recognize which users should not be given offers.
2. They recommend offers based on the preferences of similar users or clusters.
3. They recommend the most popular offers for new users.

Evaluating the recommendation engines is not an easy task. Starbucks would have to spend a certain amount of time A/B testing these systems. The metrics have already been mentioned in the introduction part of this article (Part 1). However, there is a slight change in the number of groups.
Starbucks should have 3 groups, 1 control group and 2 experimental groups, to test these recommendation engines.

Justification

Personally, I think that the user-based recommendation engine is better than the cluster-based one. It is more personalized and more versatile, as we can decide how many similar users we’d like to use. Also, as I have mentioned above, the number of users in each cluster differs considerably. The maximum is above 4000, and the minimum is only 8. This will affect the cluster-based recommendation engine, since the more users there are in a cluster, the higher the variance in their preferences.

Conclusion

Reflection

The data sets provided are really tough to work with, in the sense that they contain only basic features. Most of the time, businesses in the hospitality industry rely on ratios as metrics, so I had to derive those ratios from what these data sets offer. However, that wasn’t really the toughest part.

The toughest part of this project was cleaning the data, especially imputing age, gender, and income. I could have imputed these variables using statistics such as the mean for age, the mode for gender, and the mean or median for income. However, the result would be less accurate. When I decided to create a recommendation engine for this project, I realized that getting the most accurate data possible is vital, especially for the clustering part. Therefore, I decided to impute the missing or wrong values using machine learning. It was not easy for me, especially because I had to determine the order of imputation. I decided to start with the variable that had the higher R2 score.

After imputing the data, the process of creating the recommendation engines was not that difficult. However, the problem with a recommendation engine is that it has to be tested in real life to find out whether it works well or not.
I will need to study further to deepen my knowledge of machine learning and recommendation engines to make sure that I can build well-performing models.

Improvement

Several improvements can be made to build a better recommendation engine. Some of those are:

1. Build a better random forest model to predict age, gender, and income. The R2 score is not high enough, and the MSE is not low enough. I could tune the hyperparameters more if I had more time, or perhaps try other algorithms such as AdaBoost or even neural networks.
2. Build a better recommendation engine by classifying promotion offers into star, workhorse, puzzle, and dog. I would take the menu engineering concept and apply it to promotions. The star promotions are those with a low cost and a high number of transactions generated. The workhorses are those with a high cost and a high number of transactions generated. The puzzles are those with a low cost but a low number of transactions generated. The dogs are those with a high cost and a low number of transactions generated. By doing this, we can always promote the star or puzzle promotions, tune the workhorse promotions, and get rid of the dog promotions. Of course, I would need additional data to do that.

If you have suggestions or criticism, please do not hesitate to comment. I am still, and will always be, a student in this field. If you have any questions or want my opinion on your project, do not hesitate to contact me.
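For readers who want something concrete to experiment with, the user-based engine described earlier might be sketched roughly as follows. This is an illustration on toy data, not the project's actual code: the matrix values, the `no_offer_users` set, the `offer_ids` mapping, and the function names are all hypothetical. Similar users are looked up through the data frame's index, which I assume plays the same role as the "+ 1" adjustment mentioned in the article.

```python
import numpy as np
import pandas as pd

# Toy user-promotion matrix (hypothetical values): rows are user ids,
# columns are offer aliases, cells say how much a user liked an offer.
user_promo = pd.DataFrame(
    [[3, 0, 1, 0],
     [2, 0, 3, 0],
     [0, 6, 0, 1],
     [0, 3, 0, 2],
     [4, 1, 0, 0]],
    index=[1, 2, 3, 4, 5],
    columns=["bogo1", "bogo2", "discount1", "discount2"],
)

# Users who mostly buy without offers (the 'offer-or-not' idea).
no_offer_users = {4}

# Hypothetical mapping from offer alias back to the original offer id.
offer_ids = {"bogo1": "id-a", "bogo2": "id-b",
             "discount1": "id-c", "discount2": "id-d"}


def find_similar_users(user_id, matrix, n_similar=10):
    """Rank the other users by dot-product similarity to `user_id`."""
    similarity = matrix.to_numpy() @ matrix.loc[user_id].to_numpy()
    order = np.argsort(-similarity, kind="stable")  # most similar first
    ranked = matrix.index[order]
    # Drop the user's own id, then keep the closest neighbours.
    return [u for u in ranked if u != user_id][:n_similar]


def recommend(user_id, matrix, n_similar=10, top_n=3):
    """Return up to `top_n` offer ids, or [] for users who dislike offers."""
    if user_id in no_offer_users:
        return []
    if user_id not in matrix.index:
        # New user: fall back to the most popular offers overall.
        scores = matrix.sum(axis=0)
    else:
        neighbours = find_similar_users(user_id, matrix, n_similar)
        scores = matrix.loc[neighbours].sum(axis=0)
    top_aliases = scores.nlargest(top_n).index
    return [offer_ids[alias] for alias in top_aliases]


print(recommend(1, user_promo, n_similar=2))  # existing user
print(recommend(4, user_promo))               # user who dislikes offers
print(recommend(99, user_promo))              # new user
```

The cluster-based engine follows the same skeleton: replace the neighbour lookup with a lookup of the user's cluster and read the scores from the corresponding row of the cluster-promotion matrix.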