Network models for recommender systemsRoxana PamfilBlockedUnblockFollowFollowingMar 27Recommender systems have become ubiquitous in the Internet age; online users are presented with tailored choices not just for physical products, but also for movies, songs, news articles, hotels, and more.
Many of these systems involve the principle of collaborative filtering, in which users are recommended items based on the preferences of other similar users.
Popular techniques behind recommender systems include matrix factorisation, neighbourhood methods, and various hybrid approaches.
An intuitive way to model interactions between users and items is by using a bipartite network.
In the toy example below, customers are connected to previously-purchased grocery products.
By performing edge prediction in such a network, we can address two important business problems:1.
[Recommendations] What new items should we recommend to a given customer?2.
[Targeting] Which users should we contact in a promotional campaign for a specific product?(Left) Diagram of a customer — product network.
(Right) A larger bipartite network.
Networks and community structureA network is an abstract representation of a system in which objects called nodes interact with each other via edges, typically in a pairwise fashion.
They arise in many domains and can be used to solve practical problems such as detecting bots on Twitter, finding vulnerabilities in electrical grids, and discovering new drugs by predicting their effect on protein function.
A common feature of many networks is their modular structure, meaning that nodes can be clustered into tightly-knit groups called communities.
Examples of communities in empirical networks include left-leaning and right-leaning political blogs in the US, scientists who frequently co-author papers, and groups of friends in social networks.
In a shopping network, communities reveal groups of customers with similar preferences and the products they buy most.
Targeting/recommendations approachIntuitively, the products in a customer’s community that they haven’t yet purchased are potential recommendations.
Similarly, in a promotional campaign for a given product, the best customers to target are those in that product’s community.
Let’s focus on the second scenario and describe the process in more detail.
Key steps in a network-science approach to targeting.
Constructing a network from transactional data requires defining a set of unique customers, a set of products, and a time period over which to aggregate purchases.
An edge between a customer and a product indicates that the corresponding purchase occurred during the specified time period.
Because some of these purchases occur in higher volumes than others, it is useful to add weights to the edges.
Choices of weights include item count (how many bananas did this customer buy?) and a normalised variant called item penetration (what proportion of the items bought by this customer are bananas?).
Constructing a weighted network from transactional data.
Customer — product networks constructed in this way can be very large; we have analysed some that have millions of nodes and hundreds of millions of edges.
However, these networks are also very sparse: only a tiny fraction of the potential connections appear as edges, because a typical customer purchases no more than a few dozen distinct products over several months.
The next step in the process is to use a community-detection algorithm to cluster the customers and products into cohesive groups.
There are many ways to do this, but we settled on a popular method called modularity maximisation.
This method ascribes a quantity, ????, called the modularity to any partition of a network into communities.
Community detection then becomes the task of optimising this function ????.
This is an NP-hard problem, so researchers have developed various heuristics, such as the Louvain algorithm, to find a good solution that balances accuracy and speed.
(And speed is very important when your network has half a billion edges!)Community detection as an optimisation problem.
Stochastic block model.
We now have a network in which each customer and product belongs to a community or block.
The next step is to estimate a set of edge-propensity parameters θrs that describe how nodes in different blocks connect with each other.
More precisely, in our customer–product networks, the observed probability that a customer in block r connects to a product in block s is proportional to θrs.
Because communities have large internal edge densities (by construction), the largest of these probabilities correspond to r = s.
Edge-propensity parameters for a network with three communities.
It is possible to detect communities and compute the parameters θrs simultaneously using statistical inference (rather than in two separate steps), but the resulting algorithms are significantly slower.
We now have a statistical network model of our data that makes it possible to calculate, for any customer and product, the probability of an edge existing between them.
For a customer c in community r and a product p in community s, this probability isThis expression depends on three important factors:1.
The community-specific purchase probability θrs, i.
the probability of a customer in community r purchasing a product in community s.
The degree of customer node c, which is equal to the number of edges from that node.
Thus, the model assigns larger probabilities to customers who buy more distinct products.
The degree of product node p.
Thus, the model assigns larger probabilities to products that are more popular (i.
, that are bought by a larger number of customers).
The scaling factor is a constant that ensures that the final result is a well-defined probability.
Calculating purchase probabilities.
Lastly, sorting the purchase probabilities from the previous step from largest to smallest produces a customer ranking that reflects their affinity to the target product.
The top customers can then be sent coupons to elicit an initial purchase and introduce new customers to the brand.
Validation: retrospective analysis of a promotional campaignTo test how this method compares to simpler targeting approaches, we analysed data from a campaign in which a promotional offer for yoghurts was mailed to 100,000 customers.
These customers were selected based on previous engagement with the products on promotion and possible engagement with competitor products.
Using our network model, we rank the 100,000 customers according to their affinities to the promotional yoghurts.
If higher-ranked customers redeem more coupons than lower-ranked customers, then the corresponding ranking has predictive power, which suggests that the same approach can be useful to identify relevant customers for future campaigns.
We use a ranking based on category spend as a baseline for comparison.
We thus have two ways of ranking customers based on their affinity to the promotional yoghurts, and we compare these rankings using gains charts.
In a gains chart, one plots the percentage of positive responses (in our case, the percentage of coupons redeemed) as a function of the population size.
A ranking is predictive if there are more positive responses among the higher-ranked customers, and this corresponds to a curve that lies above the diagonal line in the chart.
Comparison of two ways of ranking customers that were contacted as part of a promotional campaign on yoghurts.
The results show that the network model outperforms category spend and is generally better at identifying customers who are likely to redeem the coupon.
It is particularly interesting that the performance gap is more pronounced among low-spend customers, who are less engaged with the brand.
This suggests that we can capture a customer’s affinity for a product based on their similarity to customers within their community, rather than just their historical purchases.
More testing is required, but the initial results are encouraging.
ConclusionsBipartite networks are a natural representation of purchasing data.
An important task is the prediction of new edges, which can feed into recommender systems and targeted promotional campaigns.
In terms of computational performance, community detection is the most time-consuming part of the process that we described.
That said, networks with billions of edges are currently within reach, and the bottleneck is often the amount of memory available rather than the computational budget.
A key advantage of the network approach is that it is unsupervised.
It does not require generating user or item features (which can be time-consuming), and it does not need labelled data for training.
Another advantage is that networks work well in sparse settings.
For instance, if a customer has only bought one or two products, supervised methods might struggle for lack of sufficient training data.
In contrast, assigning such a customer to a community and computing the corresponding edge probabilities poses no issues.
This points to opportunities to use community-based recommendations beyond grocery, for slow-moving goods.
Finally, network models can be extended relatively easily to more complicated scenarios that include, for example, temporal information, node metadata, or information about product hierarchies.
Further resourcesFor more details about community detection and stochastic block models, see this user guide.
If you would like to try out this approach for yourself, you can download The Complete Journey, a data set that provides anonymised customer-level shopping data.
Roxana Pamfil worked in collaboration with dunnhumby for her PhD at the University of Oxford.
She is currently implementing some of her research findings within dunnhumby as part of a knowledge transfer secondment.
.. More details