In this post I will walk you through the prototype I built that provides tailored savings recommendations based on a user’s spending habits.
The code for this project can be found on my GitHub.
Let’s take Equator Coffees as an example because it is somewhere that I spend a lot of money.
Equator has a few locations in San Francisco which are conveniently located near both my work and my home.
For the purposes of this example, we will say I am at work and ready for a coffee break.
If you look up Equator on Yelp, it has a $$ price level, a 4.5 star rating, and is about a block and a half from my work.
Normally I would head to Equator to grab my afternoon latte, but maybe if there was another coffee shop, similarly close in distance and quality and cheaper in price, I could head there instead and save some money.
This system was built to help me find places like this, where a slight change in habit could add up to a lot of money saved over time.
In this example, my system recommended Enough Tea & Coffee, which is about 2 blocks away from my work, has a 4.5 star rating, and is only a $ price tier as shown below.
The Goal:

My goal for this project was to use my personal spending habits to create tailored savings recommendations that would save me money with minimum lifestyle impact.
I still want to be able to take that coffee break, but hopefully do it at a cheaper price.
I imagine this recommendation system being used by a bank.
They already have access to all of your credit card transactions, which are the most accurate depiction of your spending habits.
A bank could recommend cheaper businesses similar to those you frequently spend money at and then transfer the money saved to a savings account with them.
Building a Recommender:

Step 1: Gather + Clean

The first step to building my recommender was to gather San Francisco business data.
I used a subset of the Registered Business Location dataset provided by the City and County of San Francisco for a list of businesses.
From this business list, I made calls to the Google Places and Yelp Fusion APIs to get the business info.
I mainly used the Yelp Fusion API data and filled in null values using the Google Places API.
After combining the business info from both APIs, I still had a handful of null price tier values.
I used KNN imputation with k=5 nearest neighbors to fill in these missing price values.
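
As a rough illustration of this imputation step, here is a minimal sketch using scikit-learn's `KNNImputer`. The feature columns, the toy values, and the rounding to a 1 to 4 price tier are my assumptions for the example, not necessarily what the project did.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical feature matrix: one row per business.
# Columns: [latitude, longitude, rating, price_tier]; np.nan marks a missing price tier.
features = np.array([
    [37.7825, -122.4070, 4.5, 2.0],
    [37.7830, -122.4075, 4.0, 2.0],
    [37.7810, -122.4060, 4.5, 1.0],
    [37.7900, -122.4000, 3.5, 1.0],
    [37.7750, -122.4180, 4.0, 3.0],
    [37.7828, -122.4068, 4.5, np.nan],
])

# Each missing value is replaced by the mean of that column over the
# 5 nearest rows (distance is computed on the columns that are present).
imputer = KNNImputer(n_neighbors=5)
imputed = imputer.fit_transform(features)

# Assumption: price tiers are discrete (1 to 4), so round the averaged
# value to the nearest tier.
price_tier = np.clip(np.rint(imputed[:, 3]), 1, 4).astype(int)
print(price_tier)
```

In practice the columns would likely need to be scaled first, since unscaled features with large ranges dominate the neighbor distance.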
I also gathered my personal bank statements containing 4 months of credit card transaction history.
These would be fed into my prototype as a basis for the recommendations.
Step 2: Reviews + Topics

Next I did some Natural Language Processing on the businesses' Yelp reviews.
Each of the businesses in my dataset had a few reviews which I combined into one paragraph per business.
I first lemmatized the words in the paragraphs and then used Term Frequency-Inverse Document Frequency (TF-IDF) to vectorize them.
TF-IDF is a measure to determine how important a word is to a document in a corpus.
In other words, for each word in a business’s paragraph of reviews, I wanted to see how important that word is in explaining that business compared to the other businesses in the dataset.
The word “food”, for example, might not be a heavily weighted word using TF-IDF.
While “food” could appear often in the reviews for a certain restaurant, this does not provide much value in differentiating this business when the dataset mainly consists of restaurants.
The bigram “asian_food” on the other hand would have a higher weight since it is a better differentiator of the business from the whole set of businesses.
Comparing two “asian_food” businesses would provide much more value than comparing two “food” businesses that could end up being drastically different like a fine dining restaurant and a cheap cafe.
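
To make the “food” versus “asian_food” point concrete, here is a small sketch using scikit-learn's `TfidfVectorizer`. The three review paragraphs are made up, and the bigram shows up as "asian food" (with a space) because that is how this vectorizer names n-grams; the underscore form in the text would come from a different bigram tool.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-lemmatized review paragraphs (one per business).
docs = [
    "great asian food friendly staff spicy noodle",
    "good food fresh pastry strong espresso latte",
    "tasty food craft cocktail late night",
]

# ngram_range=(1, 2) keeps single words and adds bigrams such as "asian food".
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)

vocab = vectorizer.vocabulary_
first_business = tfidf[0].toarray().ravel()

# "food" appears in every document, so it is weighted lower than
# "asian food", which appears only in the first one.
food_weight = first_business[vocab["food"]]
asian_food_weight = first_business[vocab["asian food"]]
print(food_weight, asian_food_weight)
```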
Once I had my TF-IDF vectors for each of the businesses in my dataset, I used Non-Negative Matrix Factorization (NMF) for topic modeling.
NMF is a method that factorizes a matrix into two matrices that contain no negative elements and it works pretty well on shorter text like reviews and tweets.
I used a scree plot to settle on the number of topics (15) to be generated using NMF.
Finally, I made topic vectors for each of the businesses in my dataset that show the relevance of each of the topics to that business.
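
The topic modeling step can be sketched with scikit-learn's `NMF` as follows. The corpus is a toy one and I use 2 topics so it fits the tiny example (the project used 15); the `init` and `max_iter` settings are my own choices.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical review paragraphs, one per business.
docs = [
    "espresso latte coffee pastry barista",
    "coffee latte cappuccino barista roast",
    "cocktail beer bar whiskey bartender",
    "beer wine bar cocktail happy hour",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Factorize the TF-IDF matrix into (businesses x topics) and (topics x terms),
# both with non-negative entries.
nmf = NMF(n_components=2, init="nndsvd", random_state=0, max_iter=500)
topic_vectors = nmf.fit_transform(tfidf)   # relevance of each topic to each business
topic_terms = nmf.components_              # weight of each term in each topic

# Show the three highest-weighted terms per topic.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
for topic_id, weights in enumerate(topic_terms):
    top = weights.argsort()[::-1][:3]
    print(topic_id, [terms[i] for i in top])
```

Each row of `topic_vectors` is the kind of per-business topic vector described above.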
Step 3: Compare + Match

I now had a dataframe containing topic vectors and other attributes for thousands of San Francisco businesses.
My next step was to compare these businesses and look for similarities.
I used cosine similarity for comparison, using all of the numerical features for each business.
Cosine similarity looks at the vectors for 2 businesses and calculates the cosine of the angle between these vectors, where similar businesses will have a cosine similarity closer to 1.
This gave me a matrix of cosine similarity scores for each business compared to all other businesses in the dataframe.
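
For reference, the cosine similarity of two vectors a and b is their dot product divided by the product of their lengths. Here is a minimal NumPy sketch of the similarity matrix; the three feature rows are invented for illustration.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Return the matrix of pairwise cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid dividing by zero for all-zero rows
    unit = X / norms                 # scale every row to length 1
    return unit @ unit.T             # dot products of unit vectors = cosines

# Hypothetical numerical features (e.g. topic weights) for three businesses.
features = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
])

sim = cosine_similarity_matrix(features)

# Rank the other businesses by similarity to business 0, most similar first.
ranked = np.argsort(-sim[0])
ranked = ranked[ranked != 0]
print(ranked)
```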
To find recommendations for a certain business, I would first take the 25 businesses with the highest cosine similarity and then weight these based on number of matching categories and their haversine distance.
My dataframe included categories for each business taken from Yelp, such as “coffee”, “cafe”, “restaurant”, etc.
Businesses that have more categories matching the business visited are more likely to be recommended.
Similarly, businesses that are located closer to the business visited are also more likely to be recommended.
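
Here is one way the reweighting could look. The haversine formula itself is standard, but the scoring rule in `rerank`, the field names, and the sample businesses are all hypothetical; the exact weights used in the project are not shown here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def rerank(visited, candidates):
    """Hypothetical rule: boost shared categories, penalize distance."""
    scored = []
    for c in candidates:
        shared = len(set(visited["categories"]) & set(c["categories"]))
        dist = haversine_km(visited["lat"], visited["lon"], c["lat"], c["lon"])
        score = c["similarity"] * (1 + shared) / (1 + dist)
        scored.append((score, c["name"]))
    return [name for _, name in sorted(scored, reverse=True)]

# Made-up example data.
visited = {"lat": 37.7825, "lon": -122.4070, "categories": ["coffee", "cafe"]}
candidates = [
    {"name": "Cafe A", "similarity": 0.90, "lat": 37.7830, "lon": -122.4080,
     "categories": ["coffee", "cafe"]},
    {"name": "Cafe B", "similarity": 0.95, "lat": 37.7600, "lon": -122.4350,
     "categories": ["restaurant"]},
]

ranking = rerank(visited, candidates)
print(ranking)
```

With this rule, a nearby business sharing both categories can outrank a slightly more similar one that is farther away and in a different category.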
Step 4: Refine + Recommend

I used a content-based recommendation system for this prototype.
This means the recommendations are strictly based on personal spending habits and the attributes of the business that appeared in the credit card transaction history.
Once I had the list of matches from the previous step, I refined these matches by comparing price levels and rating.
My recommendations would only be for places with an equal or higher Yelp rating and a lower Yelp price tier.
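
This filter is simple enough to sketch in a few lines. The function name and the dictionary fields are hypothetical; the Equator and Enough Tea & Coffee values come from the example at the top of the post, while "Pricey Cafe" and "Low Rated Cafe" are invented to show what gets filtered out.

```python
def cheaper_alternatives(visited, matches):
    """Keep matches rated at least as high as the visited business and priced lower."""
    return [
        m for m in matches
        if m["rating"] >= visited["rating"] and m["price_tier"] < visited["price_tier"]
    ]

# price_tier counts the dollar signs on Yelp ($ = 1, $$ = 2, ...).
equator = {"name": "Equator Coffees", "rating": 4.5, "price_tier": 2}
matches = [
    {"name": "Enough Tea & Coffee", "rating": 4.5, "price_tier": 1},
    {"name": "Pricey Cafe", "rating": 5.0, "price_tier": 2},      # not cheaper
    {"name": "Low Rated Cafe", "rating": 3.5, "price_tier": 1},   # rated lower
]

recommended = cheaper_alternatives(equator, matches)
print([m["name"] for m in recommended])
```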
Final Thoughts:

To bring everything together, I built a Flask app that reads in my past 4 months of credit card transaction history and provides a summary of my spending as well as recommendations for savings.
It first provides a spending breakdown across 5 categories (Food, Coffee, Retail, Services, and Bars/Entertainment) for each of the months in the uploaded bank statement.
The graph on the right makes it easy to see how spending in each category fluctuates from month to month.
The next section in the app shows a breakdown of the businesses in each category and how much was spent at each business over the entire span of the uploaded bank statements.
Finally, the app provides recommendations based on the businesses I have spent money at and shows how much money I could save on a monthly and yearly basis if I switched to these recommended businesses.
I can click on any of the recommendations and be taken directly to the business's Yelp page for more info.
You can watch a demo of my app at the link below.
Savings App Demo
drive.com

I really enjoyed this project and familiarizing myself with recommendation systems.
Thanks for reading!