Solving the Cold User Problem for Recommendation Systems Using Multi-Armed Bandits

The term derives from cars.

When it’s really cold, the engine has trouble starting up, but once it reaches its optimal operating temperature, it runs smoothly.

With recommendation engines, the “cold start” simply means that the circumstances are not yet optimal for the engine to provide the best possible results.

Our aim is to minimize the time it takes to warm up the engine.

There are two distinct categories of cold start: product cold start and user cold start.

In this blog, we are concentrating on the user cold start problem.

To solve this problem, we introduce the concept of the Multi-Armed Bandit.

Multi-Armed Bandit (MAB)

Multi-Armed Bandit Problem

The multi-armed bandit problem is a classical problem that models an agent (or planner, or center) who wants to maximize its total reward while it simultaneously desires to acquire new knowledge (“exploration”) and to optimize its decisions based on existing knowledge (“exploitation”).

The MAB problem captures the scenario of a gambler faced with a trade-off between exploration (optimistically pulling less-explored arms in search of one with a better reward) and exploitation (pulling the arm that has yielded the maximum reward so far).

MAB for Cold Users

Our goal is to use different bandit algorithms to explore and exploit optimal recommendations for the user.

There are several MAB algorithms, each balancing exploration and exploitation to a different degree.

Three of the most popular are Epsilon Greedy, Thompson Sampling, and Upper Confidence Bound 1 (UCB-1).

Epsilon Greedy Approach

Epsilon Greedy, as the name suggests, is the greediest of the three MAB algorithms.

In Epsilon Greedy experiments, the constant ε (valued between 0 and 1) is selected by the user before the experiment starts.

When allocating contacts to different variants of the campaign, a randomly chosen variant is picked a fraction ε of the time.

The other 1-ε of the time, the variant with the highest known payoff is chosen.

The higher ε is, the more this algorithm favors exploration.

For all our examples, ε is set to 0.
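As a concrete sketch (not the exact code from the post), the selection rule fits in a few lines of Python. The toy run below uses ε = 0.1 and three hypothetical variants with assumed success rates; both are illustrative choices, not values from the post.

```python
import random

def epsilon_greedy_choice(counts, rewards, epsilon):
    """Explore with probability epsilon, otherwise exploit the best mean payoff."""
    if random.random() < epsilon:
        return random.randrange(len(counts))                    # exploration
    means = [rewards[i] / counts[i] if counts[i] else 0.0 for i in range(len(counts))]
    return max(range(len(counts)), key=lambda i: means[i])      # exploitation

# toy run: three hypothetical variants with true success rates 0.2, 0.5, 0.8
random.seed(0)
true_rates = [0.2, 0.5, 0.8]
counts, rewards = [0, 0, 0], [0.0, 0.0, 0.0]
for _ in range(2000):
    i = epsilon_greedy_choice(counts, rewards, epsilon=0.1)
    counts[i] += 1
    rewards[i] += 1.0 if random.random() < true_rates[i] else 0.0
```

After enough rounds the best variant accumulates most of the allocations, while roughly an ε share of the traffic keeps exploring.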


Upper Confidence Bound (UCB) Approach

For each variant of the campaign, we identify an upper confidence bound (UCB) that represents our highest guess at the possible payoff for that variant.

The algorithm will assign contacts to the variant with the highest UCB.
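A minimal sketch of this allocation rule, assuming the standard UCB-1 exploration bonus sqrt(2 ln n / n_i) and the same hypothetical variants as before:

```python
import math
import random

def ucb1_choice(counts, rewards, t):
    """Pick the variant with the highest upper confidence bound (UCB-1)."""
    for i, c in enumerate(counts):
        if c == 0:
            return i                                   # try every variant once first
    def ucb(i):
        # mean payoff plus an exploration bonus that shrinks as the variant is played
        return rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
    return max(range(len(counts)), key=ucb)

# toy run, mirroring the epsilon-greedy example
random.seed(1)
true_rates = [0.2, 0.5, 0.8]
counts, rewards = [0, 0, 0], [0.0, 0.0, 0.0]
for t in range(1, 2001):
    i = ucb1_choice(counts, rewards, t)
    counts[i] += 1
    rewards[i] += 1.0 if random.random() < true_rates[i] else 0.0
```

Because the bonus grows with the total round count t but shrinks with each variant's own count, under-explored variants periodically get another look.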

The UCB for each variant is calculated from both the variant's mean payoff and the number of contacts allocated to it, with the UCB-1 equation:

UCB_i = x̄_i + sqrt(2 · ln(n) / n_i)

where x̄_i is the mean payoff of variant i, n_i is the number of contacts allocated to variant i, and n is the total number of contacts so far.

Thompson Sampling Approach

Thompson Sampling, by contrast, is a more principled approach, which can yield more balanced results in marginal cases.

For each variant, we build a probability distribution (most commonly a beta distribution, for computational reasons) of the true success rate, using observed results.

For each new contact, we sample one possible success rate from the beta distribution corresponding to each variant and assign the contact to the variant with the largest sampled success rate.

The more data points we have observed, the more confident we will be about the true success rate, and so as we gather more data, the sampled success rates will be more and more likely to be close to the true rate.
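This sampling loop is short to sketch. The Beta(successes + 1, failures + 1) posterior below assumes a uniform prior, and the variants and their success rates are hypothetical:

```python
import random

def thompson_choice(successes, failures):
    """Sample a success rate from each variant's Beta posterior; pick the largest sample."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# toy run: three hypothetical variants with true success rates 0.2, 0.5, 0.8
random.seed(2)
true_rates = [0.2, 0.5, 0.8]
succ, fail = [0, 0, 0], [0, 0, 0]
for _ in range(2000):
    i = thompson_choice(succ, fail)
    if random.random() < true_rates[i]:
        succ[i] += 1
    else:
        fail[i] += 1
```

As a variant accumulates observations its Beta posterior narrows, so its sampled success rates cluster ever closer to the true rate, which is exactly the behavior described above.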

Methodology

Infrastructure for Solving the MAB Cold Start Problem

We used the MovieLens dataset for the MAB problem; it contains four files, which were merged and used for collaborative filtering.
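As a hedged sketch of this preprocessing step (the post does not name the four files, so the column names below follow the ml-latest-small release of MovieLens, with tiny inline frames standing in for the real CSVs):

```python
import pandas as pd

# Hypothetical stand-ins for two of the MovieLens files
# (ratings.csv and movies.csv in the ml-latest-small layout).
ratings = pd.DataFrame({"userId": [1, 1, 2], "movieId": [10, 20, 10], "rating": [4.0, 3.5, 5.0]})
movies = pd.DataFrame({"movieId": [10, 20], "title": ["Toy Story", "Heat"]})

# merge the files on their shared key, then pivot into the user-item ratings matrix
merged = ratings.merge(movies, on="movieId", how="left")
matrix = merged.pivot_table(index="userId", columns="title", values="rating")
```

Ratings a user never gave show up as missing entries in the pivoted matrix, which is exactly the sparsity that collaborative filtering (and later the embedding network) has to fill in.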

[Figure: Sparse matrix from user ratings]

[Figure: Matrix after application of collaborative filtering and clustering]

Metric for Evaluation

We will be using NDCG (Normalized Discounted Cumulative Gain), which values accurate recommendations in earlier rounds more than in later ones.

Since we focus on recommendations for cold users, we value accurate results in earlier rounds.
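A minimal sketch of this metric, assuming the standard DCG discount (the first item is undiscounted; the item recommended in turn t is divided by log₂ t):

```python
import math

def dcg(ratings):
    """Discounted cumulative gain: feedback from later turns counts less."""
    if not ratings:
        return 0.0
    return ratings[0] + sum(r / math.log2(t) for t, r in enumerate(ratings[1:], start=2))

def ndcg(predicted, ideal):
    """Normalized DCG: 1.0 means the predicted ranking matches the ideal one."""
    return dcg(predicted) / dcg(ideal)
```

For example, ndcg([1, 2, 3], [3, 2, 1]) is below 1 because the highest-rated item was recommended last, which is precisely the early-round accuracy that matters for cold users.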

The score averages, over all users, the ratio of the achieved DCG to the ideal DCG:

NDCG = (1/N) · Σ_u DCG(u) / DCG*(u),   with DCG(u) = r(u, 1) + Σ_{t=2..T} r(u, t) / log₂(t)

where
DCG(u): discounted cumulative gain of the predicted ranking for a target user u
DCG*(u): ground truth (ideal) DCG
N: number of users in the result set
r(u, 1): rating (according to user feedback) of the item first recommended for user u
r(u, t): user feedback for the item recommended in turn t
t: recommendation turn
T: size of the ranked list

Results

The table below provides the NDCG score for rank sizes of 5, 10, 15, 40, and 100.

Thompson consistently performed better than Greedy and UCB.

It took less time to warm up and was able to provide better results quickly.

UCB, on the other hand, performed the worst for different values of N and T.

[Table: NDCG results for varying rank size and number of users]

Embedding Networks

While embedding networks are typically used for a different problem, we wanted to see their effectiveness on the cold-start problem.

Embedding networks are neural networks that learn dense vector representations (embeddings) of features and the relations between them.

With this knowledge, it should be able to fill in empty values.

So for the Movie Lens dataset, it can embed the relations between users and movies, and can then fill in the missing ratings that a user did not provide for a movie.

While we just performed this task with collaborative filtering, using a neural network allows for the learning of complex, non-linear features.
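As an illustrative sketch of the idea (a bare-bones matrix-factorization flavour, not the network we actually trained), each user and movie gets a learned vector, the predicted rating is their dot product, and the vectors are fit by SGD on the observed ratings; all sizes and ratings below are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, dim = 5, 4, 3
U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
M = rng.normal(scale=0.1, size=(n_movies, dim))  # movie embeddings

# (user, movie, rating) triples standing in for the sparse ratings matrix
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 2, 2.0), (3, 3, 5.0)]

def mse():
    return sum((U[u] @ M[m] - r) ** 2 for u, m, r in ratings) / len(ratings)

loss_before = mse()
lr = 0.05
for _ in range(500):
    for u, m, r in ratings:
        err = U[u] @ M[m] - r
        # simultaneous SGD step on both embeddings
        U[u], M[m] = U[u] - lr * err * M[m], M[m] - lr * err * U[u]
loss_after = mse()

# the trained embeddings can now "fill in" a rating the user never gave
predicted = float(U[0] @ M[2])
```

A real embedding network would replace the dot product with learned non-linear layers, but the filling-in mechanism is the same.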

[Figure: Embedding network]

So, an embedding network like the one shown above can be used to fill in the missing values of an embedding like the one below:

[Figure: Embedding matrix with missing values]

Generally, embedding networks are used for recommendation systems with warm users.

This way, the network does not have to guess as much.

With warm users, the learned relations are hopefully well-established and not as subject to change.

With this setting, an embedding network can train once on a dense dataset, learn very complex relations, and give highly educated recommendations.

However, our problem does not have the luxury of warm users and the large dataset they would provide.

With the cold-start problem, we have to give good recommendations to users that have provided no feedback.

Essentially, their data is empty, and there aren’t any relations that can be inferred for this user.

The naive solution for using an embedding network for this problem could be to first give a random recommendation, and for every following recommendation: take feedback, retrain, sort the predicted ratings, and return the highest projected recommendation.

However, this would incur prohibitive computational costs and delayed recommendations, which could prove disastrous for the user experience.

We had the idea that even though an embedding network has no knowledge of a cold user, the network's knowledge of the dataset's relations among users and items could still provide some value.

So, we wanted to analyze performance if, for every user request, we considered multiple recommendations.

These recommendations could come from global recommenders such as random, global average, or most popular.

They could also come from MAB algorithms such as Thompson, Epsilon-greedy, and UCB.

Instead of directly feeding back a single algorithm's recommendation, we could first let the embedding network estimate which of these recommendations would score highest.

The highest projected recommendation would then be given to the user.

Depending on computation and timing constraints, the network can be retrained on the user’s feedback.
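The loop just described can be sketched as follows; here model_score is a deterministic stand-in for the trained embedding network's predicted rating, and the three candidate recommenders (with their movie IDs) are toy placeholders, not the recommenders from our experiments:

```python
import random

def model_score(user_id, movie_id):
    # deterministic stand-in for the embedding network's predicted rating
    return (user_id * 31 + movie_id * 17) % 100 / 100.0

def recommend(user_id, candidate_recommenders):
    """Ask every recommender for one candidate; keep the one the model scores highest."""
    candidates = [rec(user_id) for rec in candidate_recommenders]
    return max(candidates, key=lambda movie: model_score(user_id, movie))

# three toy recommenders standing in for random / global average / most popular
recommenders = [
    lambda u: random.choice([10, 11, 12]),  # random recommender
    lambda u: 42,                           # hypothetical "global average" pick
    lambda u: 7,                            # hypothetical "most popular" pick
]
random.seed(0)
choice = recommend(user_id=0, candidate_recommenders=recommenders)
```

The network is only asked to rank a handful of candidates per request, which is far cheaper than retraining or scoring the full catalogue on every interaction.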

This idea was tested in a toy example reusing the MAB evaluation framework described in previous sections.

Due to time constraints, the embedding network received hardly any training (5 epochs) and no hyperparameter tuning.

It was also not retrained on user feedback.

For each user request, it considered 3 random recommendations and gave to the user the one that it thought would score highest.

However, it gave better results than the UCB algorithm in this small test with 15 trials and 5 random users.

One must note that strictly using random recommendations yields better results in this limited test, shown below:

[Table: Results of the limited embedding-network test]

While the results are not ground-breaking, it should be noted that further experiments have the opportunity to show much greater performance.

More training, actually tuning the hyperparameters, retraining the network on the user feedback, and using other sampling recommenders (rather than just random) all offer the potential to greatly boost performance.

Conclusion and Future Work

Although cold-start users pose a unique problem for recommendation systems due to the lack of knowledge about the user, MAB has done a pretty good job of recommending movies and keeps improving as data comes in.

To sum up, our contributions are:

· A formalization of model selection as a multi-armed bandit problem
· Using UCB, Thompson Sampling, and Epsilon Greedy, which is an effective approach for recommending to users without prior side information
· An empirical evaluation using NDCG on the MovieLens dataset
· An evaluation of the effectiveness of an embedding network in a recommendation system

For future work, we can try:

· Having different bandit algorithms as the arms, since Epsilon Greedy performs better at the very start while Thompson performs better over more iterations
· Expanding this implementation to provide multiple recommendations per iteration; currently, the best movie is recommended over and over
· Checking metrics beyond NDCG. We relied on the NDCG score to assess our approach, mainly because we were investigating recommendation quality; it shows high accuracy levels overall, but other metrics would help ensure a fair evaluation
· Tuning the hyperparameters of the embedding network, retraining the network on user feedback, and using other sampling recommenders (rather than just random)

Thank you very much for reading, and we hope that you learned something from the project! The code for our project can be found in this repo.