Can we infer, at a high level, where Starbucks should or will explore further store expansion? By analyzing store data and how it maps to Census demographics, we can create a high-level “profile” of what Starbucks has deemed a valuable location, and see where there may be opportunity to further expand its footprint in and around these areas.
For full project code in R, click here.
We’ll begin the analysis by grounding ourselves in an overall view of stores in the U.S.
Note: Each dot represents one Starbucks location, and darker regions mean more stores.
Data is as of 2017 and will not include more recent openings.
Visually, we can see a bias toward the coasts and major metropolitan areas, which makes sense for an upscale mass-market brand seeking consumers who can pay slightly premium prices.
Indeed, the data confirms this. We pulled results from the American Community Survey (ACS) at the zip-code level via tidycensus and compared the demographics of locations with and without a Starbucks:

Demographics of U.S. locations without a Starbucks (red) versus Starbucks locations (blue).
The data makes clear that Starbucks targets locations with wealthier households (median household income of ~$65K vs. $50K), more people (median population of ~31K vs. ~3K), a younger population (median age of 37.6 years, versus an older median in zip codes without a Starbucks), and more residents who are educated and in the workforce.
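As a rough illustration of the with-versus-without comparison, here is a minimal Python sketch (the original project was written in R). It assumes the ACS variables have already been joined to a zip-code table carrying a hypothetical `has_starbucks` flag; the field names and values below are made up and are not the author's data. It simply takes the median of each variable within each group.

```python
from statistics import median

# Hypothetical, already-joined zip-code records (illustrative values only).
ZIPS = [
    {"zip": "zip_a", "has_starbucks": True, "median_income": 60_000, "population": 20_000, "median_age": 34.0},
    {"zip": "zip_b", "has_starbucks": True, "median_income": 70_000, "population": 30_000, "median_age": 36.0},
    {"zip": "zip_c", "has_starbucks": True, "median_income": 80_000, "population": 40_000, "median_age": 40.0},
    {"zip": "zip_d", "has_starbucks": False, "median_income": 40_000, "population": 1_000, "median_age": 40.0},
    {"zip": "zip_e", "has_starbucks": False, "median_income": 50_000, "population": 2_000, "median_age": 44.0},
    {"zip": "zip_f", "has_starbucks": False, "median_income": 60_000, "population": 5_000, "median_age": 46.0},
]


def compare_medians(rows, variables):
    """Return {variable: (median with a Starbucks, median without a Starbucks)}."""
    with_sb = [r for r in rows if r["has_starbucks"]]
    without_sb = [r for r in rows if not r["has_starbucks"]]
    return {
        v: (median([r[v] for r in with_sb]), median([r[v] for r in without_sb]))
        for v in variables
    }


# Print one line per variable: Starbucks median vs. non-Starbucks median.
for var, (sb, non_sb) in compare_medians(ZIPS, ["median_income", "population", "median_age"]).items():
    print(f"{var}: {sb} with vs. {non_sb} without")
```

The chart in the post shows full distributions as boxplots; the medians here correspond to the middle line of each box.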
None of this is surprising if you know the brand, but backfilling intuition with data is an important way to build conviction at each step of data exploration and strategy.
Now, let’s look deeper into location.
Below is a map showing, at the state level, the number of Starbucks locations, with the size of each dot representing the ratio of locations to population.
For example, California has the most Starbucks locations (~2,750 as of 2017; darkest red) while Washington has the most locations per person (largest bubble).
What we can take from the above is not only that Starbucks has more locations in proportion to population (i.e., higher-population states have more stores), but also that it overindexes in states with major metropolitan areas (i.e., these states have more stores overall and more stores per person).
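The per-capita ratio behind the bubble sizes is simple division, but it can rank states differently from raw store counts. In this Python sketch the state labels, counts, and populations are hypothetical, and “stores per 100,000 residents” is an assumed unit; the original map only describes a ratio of locations to population.

```python
# Hypothetical store counts and populations (illustrative, not 2017 figures).
STORE_COUNTS = {"State A": 100, "State B": 40, "State C": 10}
POPULATIONS = {"State A": 10_000_000, "State B": 2_000_000, "State C": 1_000_000}


def stores_per_100k(counts, populations):
    """Stores per 100,000 residents for each state found in both mappings."""
    return {
        state: counts[state] * 100_000 / populations[state]
        for state in counts
        if state in populations
    }


rates = stores_per_100k(STORE_COUNTS, POPULATIONS)
most_stores = max(STORE_COUNTS, key=STORE_COUNTS.get)  # leader by raw count
most_per_capita = max(rates, key=rates.get)            # leader by ratio
print(most_stores, most_per_capita)
```

Here `State A` leads on raw count while `State B` leads per capita, the same kind of split as California versus Washington in the map.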
So, the next step is to look for which locations “look like” Starbucks locations today — using demographic data of Starbucks locations — but don’t yet have a Starbucks.
For the sake of brevity, we’ll only test net-new zip codes rather than expansion within current zip codes.
To accomplish this, we’ll use a logistic regression model to predict the likelihood that a zip code has a Starbucks location and look at the highest predicted probabilities of those that don’t.
Next, we’ll take the “typical” characteristics of a Starbucks location and look for the zip codes most similar on these factors.
Logistic Regression Prediction

A basic logistic regression using demographic variables can correctly predict about 60% of zip codes that have a Starbucks and 90% of those that don’t.
Given the imbalanced data set (31K observations, only ~5,500 of which have a Starbucks), correctly identifying 60% of Starbucks zip codes should be sufficient for the purposes of this exercise.
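The original model was fit in R. As a hedged, dependency-free illustration, the Python sketch below fits a logistic regression by batch gradient descent on made-up standardized features and reports the two rates quoted above: the share of Starbucks zip codes predicted correctly (sensitivity) and the share of non-Starbucks zip codes predicted correctly (specificity). The 0.5 threshold, learning rate, epoch count, and toy data are assumptions, not the author’s settings.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def class_rates(y_true, probs, threshold=0.5):
    """(sensitivity, specificity): share of 1s predicted 1 and share of 0s predicted 0."""
    tp = sum(1 for t, p in zip(y_true, probs) if t == 1 and p >= threshold)
    tn = sum(1 for t, p in zip(y_true, probs) if t == 0 and p < threshold)
    positives = sum(y_true)
    return tp / positives, tn / (len(y_true) - positives)


# Toy data: one standardized feature; 1 = zip code has a Starbucks.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
probs = [predict_proba(w, b, x) for x in X]
print(class_rates(y, probs))
```

Why look at the two rates separately: with ~5,500 of 31K zip codes having a Starbucks (about 18%), a model that predicted “no Starbucks” everywhere would be roughly 82% accurate overall while finding none of the Starbucks zip codes.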
Our top result (highest predicted probability without having a Starbucks) is zip code 60629, located in Chicago.
A little investigation shows the following:

Left: Zip code area of 60629. Right: Starbucks locations near this area; note how they fall just outside the boundaries. Maybe the area is residential?

The zip code has two Starbucks locations just outside its border, and the model predicted a 99% chance of the zip code having a Starbucks.
This should give us confidence that the model is working.
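Once every zip code has a predicted probability, finding candidates like 60629 comes down to a filter and a sort. A minimal Python sketch, assuming a hypothetical dict of predicted probabilities and a parallel `has_starbucks` lookup; the zip labels and values are placeholders.

```python
def top_candidates(probabilities, has_starbucks, k=10):
    """Zip codes without a Starbucks, ordered by descending predicted probability."""
    candidates = [
        (zip_code, p)
        for zip_code, p in probabilities.items()
        if not has_starbucks[zip_code]
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:k]


# Placeholder zip labels and probabilities (illustrative only).
PROBS = {"zip_a": 0.95, "zip_b": 0.40, "zip_c": 0.99, "zip_d": 0.70, "zip_e": 0.10}
HAS_SB = {"zip_a": False, "zip_b": False, "zip_c": True, "zip_d": False, "zip_e": False}

print(top_candidates(PROBS, HAS_SB, k=2))
```

`zip_c` has the highest probability but is excluded because it already has a store; that filter is what restricts the search to net-new zip codes.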
The map below shows, at the county level, the summed predicted probabilities of zip codes that don’t have a Starbucks today.
For example, if there are 20 zip codes in a county without a Starbucks, this map would show the sum of the predicted probabilities of all these geographic areas.
While this is biased toward counties with more zip codes — and therefore more potential zip codes to sum — I’d argue that doesn’t matter because each county shouldn’t be weighted equally; we’re only interested in those with the biggest opportunity.
Darker green = higher total probability of attractive locations; Rolled up from a zip-code level to county.
The color scale is set to the median probability as a baseline and darker green indicates a probability increasingly above that level.
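The county roll-up and the median baseline described above can be sketched in Python as follows. The input layout (county, probability, has-Starbucks tuples) and expressing each county as its distance from the median county total are assumptions about how the map was built, not the author’s code.

```python
from collections import defaultdict
from statistics import median


def county_opportunity(rows):
    """Sum predicted probabilities of zip codes WITHOUT a Starbucks, per county."""
    totals = defaultdict(float)
    for county, probability, has_starbucks in rows:
        if not has_starbucks:
            totals[county] += probability
    return dict(totals)


def relative_to_median(totals):
    """Each county's total minus the median county total (positive = above baseline)."""
    baseline = median(totals.values())
    return {county: total - baseline for county, total in totals.items()}


# Hypothetical (county, predicted probability, has_starbucks) rows.
COUNTY_ROWS = [
    ("County X", 0.50, False), ("County X", 0.25, False), ("County X", 0.25, False),
    ("County X", 0.90, True),  # already has a store, so excluded from the sum
    ("County Y", 0.50, False),
    ("County Z", 0.25, False), ("County Z", 0.50, False),
]

totals = county_opportunity(COUNTY_ROWS)
print(totals)
print(relative_to_median(totals))
```

County X sums three zip codes while County Y has only one, which is the bias toward counties with more zip codes discussed above.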
To me, this shows that there’s opportunity to further expand at a zip-code level in and around current metropolitan locations.
In particular, it appears that Chicago, Southern California, Greater NYC, and Greater Boston still have room for more locations.
Given what we know about Starbucks’ tendency to overindex in densely populated areas, we would expect further growth within these areas.
Similarity

Next, we turn to the similarity of zip codes.
The approach here is to calculate the median of each variable across all Starbucks zip codes and then, after standardizing the variables to offset the effect of their different scales (i.e., population values are far larger in magnitude than age values), calculate the difference between each non-Starbucks zip code and those median values.
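A minimal Python sketch of the similarity step. It z-scores each variable (one common way to standardize; the original R code may have scaled differently), takes the median of the standardized Starbucks rows as the target profile, and scores each non-Starbucks zip code by Euclidean distance to that profile. The source only says “difference,” so the Euclidean metric is an assumption, as are the field names and toy values. A smaller score means more similar, and `top_n` corresponds to picking the top 100 most similar zip codes.

```python
import math
from statistics import mean, median, pstdev


def standardize(rows, variables):
    """Return copies of rows with each variable converted to a z-score."""
    stats = {}
    for v in variables:
        values = [r[v] for r in rows]
        sd = pstdev(values)
        stats[v] = (mean(values), sd if sd > 0 else 1.0)  # guard against constant columns
    return [
        {**r, **{v: (r[v] - stats[v][0]) / stats[v][1] for v in variables}}
        for r in rows
    ]


def most_similar(rows, variables, top_n=100):
    """Rank non-Starbucks zip codes by Euclidean distance to the Starbucks median profile."""
    z_rows = standardize(rows, variables)
    starbucks = [r for r in z_rows if r["has_starbucks"]]
    profile = {v: median([r[v] for r in starbucks]) for v in variables}
    scored = [
        (r["zip"], math.sqrt(sum((r[v] - profile[v]) ** 2 for v in variables)))
        for r in z_rows
        if not r["has_starbucks"]
    ]
    return sorted(scored, key=lambda item: item[1])[:top_n]


# Hypothetical rows; field names and values are illustrative only.
ZIP_ROWS = [
    {"zip": "s1", "has_starbucks": True, "population": 30_000, "median_age": 35.0},
    {"zip": "s2", "has_starbucks": True, "population": 32_000, "median_age": 37.0},
    {"zip": "s3", "has_starbucks": True, "population": 28_000, "median_age": 36.0},
    {"zip": "n1", "has_starbucks": False, "population": 29_000, "median_age": 36.0},
    {"zip": "n2", "has_starbucks": False, "population": 3_000, "median_age": 45.0},
    {"zip": "n3", "has_starbucks": False, "population": 15_000, "median_age": 40.0},
]

for zip_code, distance in most_similar(ZIP_ROWS, ["population", "median_age"], top_n=3):
    print(zip_code, round(distance, 3))
```

Without standardization, population differences in the thousands would swamp age differences of a few years; after z-scoring, both contribute on comparable scales.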
The results of this approach are validated when we reconstruct the boxplot charts with basic demographic information.
Below, you can see how the 100 non-Starbucks zip codes most similar to the Starbucks median compare with Starbucks locations.
Top 100 Non-Starbucks locations by similarity compared to Starbucks locations.
Note how much more closely the two groups match than in the previous chart.
So where are these similar locations? Let’s look at the top 20% of zip codes by state below:

Top 20% of similar non-Starbucks zip codes, by state

Risks and Next Steps

Unconstrained expansion into new attractive locations may look good on paper, but there are other factors to consider.
Here are a few next steps that could make sense, along with some of the questions we would ask if a client approached us at Stax Inc., where I work:

Alignment to company strategy: What go-forward strategy has management set, and which locations align with it?
Evaluate competitive activity in the top regions: Which national chains are there, and in what quantity? Who are the regional chains that have a presence? Who should be considered part of our competitive set?
Understand local market nuances: What is the local nuance of each of these markets? How does further penetration of Boston, MA (values convenience and speed) compare with Portland, OR (values local and atmosphere)? What do we need to believe and do in order to win in each market?
Consider market saturation: At what point is a market oversaturated and at risk of cannibalization among stores? How many stores is “too many” in a given location?
Real estate and labor costs: What real estate is available in each market? Do the costs prevent opening a profitable store? What is the labor availability and cost?
Profile the target customer: Within each market, who is our target customer? We know the location should mirror the parameters above (high income, younger, dense), but within this group, do we want to reach households earning $75K–$90K or $100K–$120K? Ages 25–34 or 45–60? Grab-and-go customers or people who want space to work?