Trail Secrets: An Intelligent Recommendation Engine for Finding Better Hikes
Perry Johnson
Jun 29
I recently went on a weekend camping trip in The Enchantments, which is just over a two-hour drive from where I live in Seattle, WA.
To plan for the trip, we relied on AllTrails, which is a fantastic application with over 75,000 hand-curated hiking trails along with photos, reviews, and in-depth trail information.
AllTrails has a community of more than 10 million hikers who leave trail reviews and provide up-to-date information on trail conditions.
AllTrails is really the go-to resource for helping folks plan multi-day camping trips, figure out where to go for a hike with their parents, or make sure they correctly traverse Aasgard Pass, a sketchy 2,300 feet of elevation gain in less than a mile (pictured below).
They closed a huge $75M round of funding in 2018 to take their product to the next level.
Aasgard Pass
The one thing I really wish AllTrails had was personalized hiking suggestions, either based on hikes I’ve enjoyed in the past (i.e., if I liked hike X, I’ll like hike Y because it has similar trail features) or based on how I’ve rated hikes in the past (e.g., Goodreads’ personalized book recommendations based on your reviews).
AllTrails lets users search for hikes by filtering on specific trail attributes (distance, elevation gain, difficulty, etc.), but it lacks any intelligent recommendation algorithms that would lower the search cost of finding hikes users really like.
I set out to answer the following questions:
- What types of intelligent recommendations would be useful for an AllTrails user/hiker?
- Can I build a personalized recommendation engine that leverages AllTrails user hiking reviews?
- Can I create a Power Rating that blends the total number of reviews and the average rating for a given hike?
To answer these questions, I built a full-stack machine learning web application, Trail Secrets, which provides users with intelligent hiking recommendations based on AllTrails user reviews and hiking trail attributes.
This application could help power the AllTrails web, Android, and iOS applications as the first set of intelligent algorithms to personalize the AllTrails user experience and help users find better trails.
Data Collection and Machine Learning Pipeline
The full data pipeline for building “Trail Secrets”
Data
I wrote a Python script that scraped AllTrails hike and user review data for all of Washington state.
This left me with data for ~2,700 hikes and ~110,000 user reviews, which I stored in a local MongoDB database.
An example of what can be scraped from a Hiking Trail Profile: Difficulty, Number of Reviews, Rating, Overview Paragraph, Distance, Elevation Gain, Route Type, and all of the additional hiking attributes that have been tagged.
Hiking Trail Features
AllTrails hand-curates all of its listed hiking trails, which means it extensively tags every hike with in-depth descriptive data (as shown in the Lake Serene Trail Profile above).
Once I scraped all of Washington state’s hiking data, I was armed with plenty of descriptive data to create features that would fuel these intelligent recommender systems.
These are the features I used to build a similar-hike algorithm and a personalization algorithm based on user ratings.
Full Feature Set
The numerical features are total distance (in miles), elevation gain (in feet), and elevation severity.
The remaining features are categorical, tagged with a value of 0 (“No”) or 1 (“Yes”) depending on whether the feature describes a given hike.
Most of these features I was able to create directly from cleaning the raw data stored in MongoDB.
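To make the feature set concrete, here is a minimal Python sketch of how one cleaned trail record could be turned into a row of numerical features plus 0/1 tag indicators. The `TrailRecord` class, the `to_feature_row` helper, the tag names in `TAG_VOCABULARY`, and the example trail's numbers are all hypothetical; only the feature definitions (distance, elevation gain, elevation severity as feet gained per mile, and Yes/No tags) come from this post.

```python
from dataclasses import dataclass, field

# Hypothetical tag vocabulary; the real AllTrails tag list is much longer.
TAG_VOCABULARY = ["dogs-on-leash", "forest", "lake", "views", "wildflowers"]


@dataclass
class TrailRecord:
    """A cleaned hike record; field names are illustrative, not AllTrails' schema."""

    name: str
    distance_miles: float
    elevation_gain_ft: float
    tags: set = field(default_factory=set)


def to_feature_row(trail: TrailRecord) -> dict:
    """Map one trail to its numerical features plus a 0/1 column per tag."""
    row = {
        "distance_miles": trail.distance_miles,
        "elevation_gain_ft": trail.elevation_gain_ft,
        # Elevation severity: feet gained per mile (0.0 if distance is zero).
        "elevation_severity": (
            trail.elevation_gain_ft / trail.distance_miles
            if trail.distance_miles > 0
            else 0.0
        ),
    }
    for tag in TAG_VOCABULARY:
        # 1 = "Yes" (the tag describes this hike), 0 = "No".
        row[tag] = 1 if tag in trail.tags else 0
    return row


# Hypothetical example trail: 8 miles with 2,000 feet of gain.
example = TrailRecord("Example Trail", 8.0, 2000.0, {"lake", "views"})
row = to_feature_row(example)
```

Keeping every tag as its own 0/1 column means hikes can be compared feature by feature, which is what the similarity recommender below relies on.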
I engineered a few additional features:
- Elevation Severity: elevation gained (in feet) per mile of hike.
- Foot Traffic: I parsed the Trail Overview paragraph for language describing the typical foot traffic on a given trail, and categorized each trail as Heavy, Moderate, Light, or Unknown.
Recommender Systems
To build the machine learning models, I leveraged Apple’s open-source machine learning library, Turi Create, as it’s incredibly flexible for developing custom recommendation models.
Item Content Similarity
This recommender only takes into account hiking trail attributes.
It looks at each distinct pair of hikes and calculates how similar they are.
This similarity score is calculated by first computing the similarity for each feature and then taking a weighted average of those to get the final similarity.
This is useful because you can specify hikes you know you like, and this recommender will return the hikes most similar to them.
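Turi Create handles this internally, so the following is only an illustrative stdlib sketch of the same idea: score each feature separately, then take a weighted average. It assumes a range-normalized absolute difference as the per-feature similarity (which reduces to match/no-match for 0/1 tags) and equal weights; Turi Create's actual per-feature measures and default weights may differ, and the hike data is made up.

```python
def feature_ranges(hikes):
    """Max minus min of every feature across all hikes (used to normalize differences)."""
    features = next(iter(hikes.values())).keys()
    return {
        f: max(h[f] for h in hikes.values()) - min(h[f] for h in hikes.values())
        for f in features
    }


def feature_similarity(a, b, value_range):
    """1.0 when values match, falling linearly to 0.0 when they are a full range apart.

    For 0/1 tag features (range 1) this is simply 1 if both agree, else 0.
    """
    if value_range == 0:
        return 1.0
    return 1.0 - min(abs(a - b) / value_range, 1.0)


def hike_similarity(h1, h2, ranges, weights):
    """Weighted average of the per-feature similarities."""
    total = sum(weights.values())
    return sum(
        w * feature_similarity(h1[f], h2[f], ranges[f]) for f, w in weights.items()
    ) / total


def most_similar(target, hikes, weights, k=15):
    """Return the k hikes most similar to `target`, best first."""
    ranges = feature_ranges(hikes)
    scored = [
        (name, hike_similarity(hikes[target], feats, ranges, weights))
        for name, feats in hikes.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical data: three hikes with two numerical features and one tag.
HIKES = {
    "Hike A": {"distance_miles": 8.0, "elevation_gain_ft": 2000.0, "lake": 1},
    "Hike B": {"distance_miles": 8.5, "elevation_gain_ft": 2300.0, "lake": 1},
    "Hike C": {"distance_miles": 1.0, "elevation_gain_ft": 50.0, "lake": 0},
}
WEIGHTS = {"distance_miles": 1.0, "elevation_gain_ft": 1.0, "lake": 1.0}
```

Calling `most_similar("Hike A", HIKES, WEIGHTS)` ranks Hike B (similar distance, similar gain, also has a lake) well above Hike C. Normalizing by each feature's range keeps elevation gain, measured in thousands of feet, from drowning out distance and the 0/1 tags.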
Example: “If you like the Mount Si Trail, here are the hikes with the most similar hiking attributes to the Mount Si Trail.”
Model Efficacy Heuristic
Before building the model, I wanted a couple of specific examples I could use to test the quality of the algorithm’s recommendations.
These are a couple of examples based on my own experience:
- Pike Place Market → should probably return other urban-like, short-distance, low-elevation-gain walks
- Lake Serene Trail → should probably return Colchuck Lake, as both are high-foot-traffic, challenging hikes of similar distance with an alpine lake
Both test cases check out: Pike Place Market returns other short urban walks, and Lake Serene Trail returns Colchuck Lake as its second most similar hike.
Feel free to search for similar hikes and check for yourself!
Ranking Factorization
We have the actual hike ratings given by AllTrails users, so choosing the optimal model depends on whether we want to predict the rating a user would give any particular hike, or want the model to recommend hikes it believes the user would rate highly.
We care about ranking performance: we want to recommend hikes that users would likely rate highly.
The RankingFactorizationRecommender recommends items that are both similar to the hikes in a user’s history and likely to be rated highly by that user.
The intuition behind this recommender is that there should be some latent features that determine how a user rates an item.
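To illustrate that latent-feature intuition, here is a small sketch of plain matrix factorization trained with stochastic gradient descent: each user and each hike gets a short vector, and their dot product (plus the global mean) approximates the star rating. This is a deliberately simplified stand-in, not Turi Create's `RankingFactorizationRecommender`, which also optimizes a ranking objective; the hyperparameters and `SAMPLE_RATINGS` are hypothetical.

```python
import random


def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Fit rating ~ global mean + user_vector . hike_vector with SGD.

    ratings: list of (user, hike, stars) tuples.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    hikes = sorted({h for _, h, _ in ratings})
    # Small random latent vectors for every user and every hike.
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {h: [rng.gauss(0, 0.1) for _ in range(n_factors)] for h in hikes}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, h, r in data:
            err = r - (mu + sum(p * q for p, q in zip(P[u], Q[h])))
            for f in range(n_factors):
                p, q = P[u][f], Q[h][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q - reg * p)
                Q[h][f] += lr * (err * p - reg * q)
    return mu, P, Q


def predict(model, user, hike):
    """Predicted stars for one user/hike pair."""
    mu, P, Q = model
    return mu + sum(p * q for p, q in zip(P[user], Q[hike]))


def recommend(model, user, already_rated, k=15):
    """Hikes the user has not rated, sorted by predicted rating, highest first."""
    _, _, Q = model
    candidates = [h for h in Q if h not in already_rated]
    return sorted(candidates, key=lambda h: predict(model, user, h), reverse=True)[:k]


# Hypothetical ratings: two "alpine" fans and two "urban walk" fans.
SAMPLE_RATINGS = [
    ("ana", "lake_hike", 5), ("ana", "ridge_hike", 5), ("ana", "city_walk", 1),
    ("ben", "lake_hike", 5), ("ben", "ridge_hike", 4), ("ben", "city_walk", 2),
    ("cal", "lake_hike", 1), ("cal", "city_walk", 5), ("cal", "park_loop", 5),
    ("dee", "ridge_hike", 2), ("dee", "city_walk", 5), ("dee", "park_loop", 4),
]
```

Once trained, `recommend(model, user, rated_hikes)` plays the role of the personalized list: it scores every hike the user has not rated yet and returns the ones with the highest predicted stars.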
Example: “For Perry, an AllTrails user who has rated some hikes, here are the hikes Perry would likely rate very highly.”
Building This Model
Users explicitly rate hikes with a number of stars (1 = strong dislike, 5 = strong like), and we have ~110,000 of these historical ratings (records saying that user A rated hike X with Y stars).
I used a technique called split validation: we train the model on only a subset (80%) of these ratings (the training set), then ask it to predict the ratings in the 20% we’ve hidden (the test set).
For example, a test user may have rated some hike with 4 stars while the model predicts 3.5, giving an error of 0.5 on that rating.
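We then aggregate the errors over the whole test set with the root mean squared error (RMSE): square each error, average the squares, and take the square root to get a single final number.
$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_i - \hat{r}_i\right)^2}$$
where $r_i$ is the actual rating, $\hat{r}_i$ is the predicted rating, and $N$ is the number of ratings in the test set.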
That’s how to quantify the performance of this recommender system.
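The split-and-score procedure above can be sketched in a few lines of standard-library Python. The helper names are hypothetical, and the 80/20 split here is a simple random shuffle; in practice a recommender split is often done per user (holding out some of each user's ratings) so that every test user also appears in the training set.

```python
import math
import random


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle the ratings and hold out `test_fraction` of them as the test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted star ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

For the single rating in the example above, `rmse([4.0], [3.5])` is 0.5. Because the errors are squared before averaging, one badly wrong prediction raises RMSE more than several slightly wrong ones.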
I iterated through a few hyper-parameter values for this model to minimize the RMSE on training set data before implementing it in the application.
Popularity
Popularity-based recommenders are not intelligent, but they are a useful data product and a potential solution to the cold-start problem if an AllTrails user hasn’t hiked in Washington before (or if someone has never hiked before!).
These are generally fun and a useful baseline when searching for hikes to go on.
Number of Reviews
This recommends the most popular hikes based on number of reviews.
Average Stars (Minimum 100 Reviews to Qualify)
This recommends the most popular hikes based on average star rating.
By analyzing the distribution of number of reviews, it’s clear that perfectly rated 5-star hikes are dominated by hikes with a low number of reviews.
I determined that a hike needed a minimum of 100 reviews to qualify for this recommender.
For example, I wanted to ensure that a hike with ~6 reviews and a 5-star score was not included.
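Both popularity lists are simple aggregations. Here is a minimal sketch, assuming ratings arrive as hypothetical `(hike, stars)` pairs; the 100-review threshold is the one chosen above, and `k=15` mirrors the app's fifteen-result lists.

```python
from collections import defaultdict


def popularity_lists(ratings, min_reviews=100, k=15):
    """Return (most-reviewed hikes, highest-average hikes with >= min_reviews)."""
    stars_by_hike = defaultdict(list)
    for hike, stars in ratings:
        stars_by_hike[hike].append(stars)

    # Number of Reviews: rank purely by review count.
    by_count = sorted(stars_by_hike, key=lambda h: len(stars_by_hike[h]), reverse=True)

    # Average Stars: only hikes with enough reviews qualify.
    qualified = {
        hike: sum(stars) / len(stars)
        for hike, stars in stars_by_hike.items()
        if len(stars) >= min_reviews
    }
    by_stars = sorted(qualified, key=qualified.get, reverse=True)
    return by_count[:k], by_stars[:k]
```

With the threshold in place, a hike with only a handful of 5-star reviews still shows up in the review-count list (near the bottom) but never reaches the average-stars list.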
5-Star Ratings Dominated by Hikes with a Low Number of Reviews
Power Ratings
I created a custom Power Ratings score, ranging from zero to 100, that blends the number of reviews and the average rating into a single score.
For example, a hike rated 4.9 stars with only 10 reviews should probably not be rated as highly as a hike rated 4.6 stars with 1,000 reviews.
Distribution of Hike Ratings before blending in the Number of Reviews
Step One: The Formula
$$\text{Power Rating} = \frac{v}{v + m}\,R + \frac{m}{v + m}\,C$$
where $v$ is the hike’s number of reviews, $m$ is the number of reviews at the 90% quantile across all hikes, $R$ is the hike’s rating, and $C$ is the average rating across all hikes.
Distribution of Hike Ratings after applying the formula to blend number of reviews and ratings score
Step Two: MinMaxScaler
The MinMaxScaler transforms data so that it lies in a range between zero and one, using the minimum and maximum values of the specified data: $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$.
I then multiplied each value by 100 to scale the Power Ratings Score to a range between zero and 100.
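Putting the two steps together, here is a sketch of the Power Ratings calculation. It assumes the 90% quantile uses linear interpolation (`statistics.quantiles` with `method="inclusive"`, the same convention as pandas' default `quantile`) and that the average rating across all hikes is an unweighted mean of hike ratings; the author's exact choices may differ, and the four hikes are made up to mirror the 4.9-star/10-review versus 4.6-star/1,000-review example.

```python
import statistics


def power_ratings(hikes):
    """hikes: dict of name -> (num_reviews, avg_rating). Returns name -> score in [0, 100]."""
    counts = [n for n, _ in hikes.values()]
    # m: number of reviews at the 90% quantile (last of the 9 decile cut points).
    m = statistics.quantiles(counts, n=10, method="inclusive")[-1]
    # C: average rating across all hikes.
    c = statistics.mean(r for _, r in hikes.values())
    # Step one: shrink each rating toward C; hikes with few reviews move the most.
    blended = {
        name: (n / (n + m)) * r + (m / (n + m)) * c for name, (n, r) in hikes.items()
    }
    # Step two: min-max scale to [0, 1], then multiply by 100.
    lo, hi = min(blended.values()), max(blended.values())
    span = hi - lo
    return {
        name: 100 * ((v - lo) / span) if span else 0.0 for name, v in blended.items()
    }


# Hypothetical hikes: (number of reviews, average stars).
HIKES = {
    "few_but_perfect": (10, 4.9),
    "many_and_strong": (1000, 4.6),
    "middling": (200, 4.0),
    "quiet": (40, 3.8),
}
scores = power_ratings(HIKES)
```

With these made-up inputs, the hike with 1,000 reviews at 4.6 stars ends up with the top score, while the 10-review, 4.9-star hike is pulled most of the way toward the overall average.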
Power Ratings Distribution once scaled to values between 0 and 100
Trail Secrets Application
Once I had working machine learning models for hiking recommendations, I started building out the application using Flask, a web framework written in Python.
Trail Secrets Web App
Find a Hike Similar to One You’ve Liked
To get hikes similar to ones you’ve enjoyed, enter a hike you liked; the app returns the fifteen most similar hikes, with trail information and an embedded link back to each hike’s AllTrails profile.
Find Personalized Hikes Based on Your AllTrails Ratings
Ideally, we’d have AllTrails user login and profile integration, but since that wasn’t possible, I created a unique User ID that maps to an AllTrails user’s full name.
To get personalized recommendations, a user submits the full name associated with their AllTrails account (in my case, Perry Johnson).
The app then returns the fifteen hikes I’m most likely to really like based on my reviews, along with trail information and an embedded link back to each hike’s AllTrails profile.
(Note: If you use a Chrome browser, you’ll get autofill on the Washington state hikes.)
Conclusion
Trail Secrets provides users with intelligent recommendations for hiking based on AllTrails user reviews and hiking trail attributes.
This application can help power AllTrails web, Android and iOS applications as the first set of algorithms that would personalize the AllTrails user experience to find better hiking trails.
Check out the Trail Secrets application to find better and more personalized hikes.
Code
The code for this project can be found on my GitHub.
Comments or Questions?
Please email me at: perryrjohnson7@gmail.com
You can check out some of my other work:
- Reverse Engineering the Walk Score Algorithm
- How Machine Learning Can Help You Charge Your E-Scooters