How can Data Science help a bakery?

What is our growth opportunity?

Business Performance

Since we only know the number of transactions in this dataset but not the revenue, we can only look at how many times each product was purchased during the six months.

Top 12 Selling Products in Bakery

Overall, coffee was the best-selling product over the past six months.

[…]4k cups of coffee were purchased, which accounts for 26.68% of all transactions.

Bread comes in second place (3.3k, 16.21%) and Tea was the third most purchased ([…]4k, 7%).

The rest of the products form a long tail.
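As a rough sketch of how counts and shares like the ones above could be computed: the row layout (one row per purchased item, tagged with a transaction id) and the sample rows below are assumptions for illustration, not the actual bakery data, and the share is taken over all purchased-item rows.

```python
from collections import Counter

# Hypothetical sample rows: (transaction_id, item). Made up for illustration only.
rows = [
    (1, "Coffee"), (1, "Bread"),
    (2, "Coffee"),
    (3, "Tea"), (3, "Coffee"),
    (4, "Bread"),
]

def product_share(rows):
    """Return [(item, count, share of all item rows)], most purchased first."""
    counts = Counter(item for _, item in rows)
    total = sum(counts.values())
    return [(item, n, n / total) for item, n in counts.most_common()]

for item, n, share in product_share(rows):
    print(f"{item:<8} {n:>3} {share:7.2%}")
```

Sorting by count also makes the long tail visible: everything after the first few rows contributes only a small share each.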

Top 6 products in bakery

When we look at the sales of our core products by month, we find that most of them stay relatively stable. Most dipped slightly between December and January and then climbed back up a little.

It could be that this is a festive period, or that there is less foot traffic due to the weather.

We need more information to determine the cause.

However, when we look at our second tier of core products (top 7–12), we find that some products perform quite differently.

For example, Farm House (which I assume is a type of bread) and Medialuna (croissant) experienced a consecutive five-month decline.

It is very hard to explain the decline without more information and context. We could consult the bakery and ask about recipe changes, promotions that ended, and so on, to determine the cause.
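A minimal sketch of how a month-by-month trend like this could be checked, assuming each row carries a purchase date and an item name. The dates and counts below are invented, and `monthly_series` and `longest_decline` are hypothetical helper names, not part of the original script.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (purchase_date, item). Dates and counts are made up.
rows = [
    (date(2020, 1, 5), "Medialuna"),
    (date(2020, 1, 9), "Medialuna"),
    (date(2020, 1, 20), "Medialuna"),
    (date(2020, 2, 2), "Medialuna"),
    (date(2020, 2, 14), "Medialuna"),
    (date(2020, 3, 7), "Medialuna"),
    (date(2020, 1, 5), "Coffee"),
    (date(2020, 2, 2), "Coffee"),
    (date(2020, 3, 7), "Coffee"),
]

def monthly_series(rows, item):
    """Return [((year, month), count)] for one item, in chronological order."""
    counts = Counter((d.year, d.month) for d, name in rows if name == item)
    return sorted(counts.items())

def longest_decline(counts):
    """Length of the longest run of consecutive month-over-month drops."""
    longest = run = 0
    for prev, cur in zip(counts, counts[1:]):
        run = run + 1 if cur < prev else 0
        longest = max(longest, run)
    return longest

medialuna = [n for _, n in monthly_series(rows, "Medialuna")]
print(medialuna, longest_decline(medialuna))
```

Note that a month with no sales at all is simply missing from the series rather than counted as zero, so a real analysis would need to fill those gaps first.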

Traffic Insight

Secondly, I calculated the average hourly sales for each top product to identify whether there are products that customers tend to buy at a specific time, so that we could set up targeted in-store campaigns to boost sales, or create a loyalty scheme for customers who purchase certain items at those times.

Hourly transactions by product

Based on our visualisation, we find that:

Bread and Coffee seem to share a similar pattern. Both products peaked around 10 am (before work) and had a smaller peak around 2 pm (after the lunch break).

Pastry also showed a peak at 10 am, plus a small peak around 5 pm (after work, before dinner).

Sandwich seems to be a lunch choice for many people. It reached its peak around 1 pm and then fell gradually.

Tea was chosen more often in the afternoon, after lunch.

Hourly transactions by product

Looking at our second-tier products (7–12), we find that the morning peak (around 10 am) and the afternoon peak (2–3 pm) were quite prevalent across products.

Besides that, there is one interesting point worth noting:

Hot chocolate showed a peak around 6 pm, which was quite different from other products. One hypothesis is that people want something to warm them up but don’t want a drink with caffeine.
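The hourly averages behind these charts could be computed along the following lines. This is only a sketch: it assumes each row has a timestamp and an item, it averages over the number of distinct days present in the data, and the sample rows and the helper names `hourly_average` and `peak_hour` are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: (timestamp, item). Values are made up for illustration.
rows = [
    (datetime(2020, 1, 6, 10, 5), "Coffee"),
    (datetime(2020, 1, 6, 10, 40), "Coffee"),
    (datetime(2020, 1, 6, 14, 15), "Coffee"),
    (datetime(2020, 1, 6, 13, 0), "Sandwich"),
    (datetime(2020, 1, 7, 10, 20), "Coffee"),
    (datetime(2020, 1, 7, 12, 30), "Sandwich"),
    (datetime(2020, 1, 7, 13, 10), "Sandwich"),
]

def hourly_average(rows):
    """Average number of items sold per day for each (item, hour) pair."""
    days = {ts.date() for ts, _ in rows}
    counts = Counter((item, ts.hour) for ts, item in rows)
    return {key: n / len(days) for key, n in counts.items()}

def peak_hour(averages, item):
    """Hour of the day with the highest average sales for one item."""
    by_hour = {hour: avg for (name, hour), avg in averages.items() if name == item}
    return max(by_hour, key=by_hour.get)

averages = hourly_average(rows)
print(peak_hour(averages, "Coffee"), peak_hour(averages, "Sandwich"))
```

Dividing by the number of days keeps products comparable even if the observation window changes. Days on which the shop was closed would need to be excluded separately.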

Market Basket Analysis

So far, we have analysed how many items were sold for each product and when they were sold.

Finally, we come to the question of which items are more likely to be purchased together.

I won’t cover the specific methodology of market basket analysis. For those who are interested in this technique, feel free to check out a Kaggle kernel created by Xavier.

Basically, there are three metrics for evaluating a market basket:

Support: how frequently the item set appears in the data set.

Confidence: the percentage of purchases of X in which Y is also bought.

Lift: how much more likely X and Y are to be bought together than if they were independent of each other.

Market basket analysis against coffee

After calculating these three metrics for all combinations, we selected the top 10 item sets ordered by lift, subject to minimum thresholds for support and confidence.
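To make the three metrics concrete, here is a small sketch using the standard definitions: support(X) is the fraction of baskets containing X, confidence(X → Y) = support(X, Y) / support(X), and lift(X → Y) = confidence(X → Y) / support(Y). The baskets are invented, and this is plain Python rather than whatever library the original analysis used.

```python
# Hypothetical baskets: one set of items per transaction. Made up for illustration.
baskets = [
    {"Coffee", "Toast"},
    {"Coffee", "Cake"},
    {"Bread"},
    {"Coffee", "Bread"},
]

def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, x, y):
    """Of the baskets containing x, the fraction that also contain y."""
    return support(baskets, {x, y}) / support(baskets, {x})

def lift(baskets, x, y):
    """How much more often x and y co-occur than if they were independent."""
    return confidence(baskets, x, y) / support(baskets, {y})

print(support(baskets, {"Toast", "Coffee"}))
print(confidence(baskets, "Toast", "Coffee"))
print(lift(baskets, "Toast", "Coffee"))
```

A lift above 1 means the pair is bought together more often than independence would predict, and a lift below 1 means less often. The functions assume that x and y each appear in at least one basket, otherwise the division fails.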

We find that in all of these association rules, the items are connected with coffee.

The redder the circle, the more likely the two items are purchased together, which indicates that toast and coffee are most likely to be bought together (lift = 1.[…]).

The bigger the circle, the more frequently the item set occurred. Here, cake and coffee were most frequently bought together (support = 0.054).

Photo by Tyler Nix on Unsplash

That coffee sits at the centre of the association network is what we expected, since it accounts for 26.7% of transactions.

But besides that, if we exclude coffee from our analysis, will we find any interesting co-consumption patterns between other pairs of products? Even though there are not many transactions for them right now, could we turn them into a growth opportunity?

After I excluded all of the coffee records, the association rules network looks more diverse, even though we have to lower the support threshold.

I found some interesting connections that may be worth researching further:

Salad + Extra Salami or Feta: People usually like to personalise their salad. By offering extra add-ons, we could potentially increase the average price of a salad.

Cookies + Alfajores + Juice: People who eat cookies or alfajores choose juice as their second choice after coffee.

Coke + Juice + Sandwiches: People who eat sandwiches often choose juice or Coke. Here we see a strong connection between food and beverages, so a promotion could be offered as a meal deal.
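The coffee-free analysis can be sketched as follows. This is an assumption-laden illustration rather than the original script: it only looks at pairs of items, it drops any basket left empty once the excluded item is removed, and the baskets and the helper name `pair_rules` are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: one set of items per transaction. Made up for illustration.
baskets = [
    {"Coffee", "Sandwich", "Juice"},
    {"Coffee", "Sandwich", "Coke"},
    {"Coffee", "Cookies", "Juice"},
    {"Coffee"},
    {"Sandwich", "Juice"},
]

def pair_rules(baskets, exclude=frozenset(), min_support=0.0):
    """Return [(x, y, support, lift)] for item pairs, highest lift first."""
    # Remove excluded items, then drop baskets that became empty.
    filtered = [basket - exclude for basket in baskets]
    filtered = [basket for basket in filtered if basket]
    n = len(filtered)
    item_counts = Counter(item for basket in filtered for item in basket)
    pair_counts = Counter(
        pair for basket in filtered for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (x, y), count in pair_counts.items():
        pair_support = count / n
        if pair_support >= min_support:
            pair_lift = pair_support / ((item_counts[x] / n) * (item_counts[y] / n))
            rules.append((x, y, pair_support, pair_lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

for x, y, s, l in pair_rules(baskets, exclude={"Coffee"}, min_support=0.25):
    print(f"{x} + {y}: support={s:.2f}, lift={l:.2f}")
```

Because the dominant item is gone, the remaining pairs have much lower support, which is why the minimum support threshold has to come down before any rules survive.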

Final Thoughts

There are obviously more topics we could explore in this dataset.

And if we look at this bakery business from a higher level, I think more information needs to be included:

Price: the price of each SKU. By joining the transaction data with price data, we could analyse the revenue of each product and think about expanding our premium products.

Cost: from an operational and financial perspective, we also need to analyse the inventory and profitability of our product list. There are probably some products we could cut, or new SKUs we could add.

Customer: to understand who our target and most profitable customers are, it would be ideal to know who bought which products.

Stock: to understand whether any product repeatedly runs out of stock, and to try to predict when it will before it does.

Categories: hot, cold, salty, sweet, etc. Categories and segments can help us understand trends.

This is just a simple exercise showing what can be achieved with very simple data.

A next step could be to create a data stream and a dashboard that shows insights from the transactions in real time.

To see the original post with the script, please see: Uni Data.

