We would expect to be off by approximately 1900 riders per day by taking this easy approach.
A linear model with sklearn performs slightly better in RMSE, and is quite easy to implement.
The model is a series of weights for each variable in our data, in addition to an intercept.
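To make the comparison concrete, here is a minimal sketch of both approaches using only numpy. The data is synthetic and purely illustrative (it is not the ridership dataset), and ordinary least squares via `np.linalg.lstsq` stands in for the sklearn linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the ridership data (hypothetical, not the post's dataset).
n, k = 500, 3
X = rng.normal(size=(n, k))
true_beta = np.array([800.0, -300.0, 150.0])
y = 4500.0 + X @ true_beta + rng.normal(scale=1000.0, size=n)

X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Naive baseline: always predict the average of the training target.
baseline_pred = np.full_like(y_test, y_train.mean())
rmse_baseline = rmse(y_test, baseline_pred)

# Linear model: an intercept plus one weight per column, fit by least squares.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
intercept, weights = coef[0], coef[1:]
linear_pred = intercept + X_test @ weights
rmse_linear = rmse(y_test, linear_pred)

print(f"baseline RMSE: {rmse_baseline:.0f}, linear RMSE: {rmse_linear:.0f}")
```

The fitted `intercept` and `weights` are single point estimates, which is exactly the limitation the Bayesian version addresses.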
How do we interpret the confidence our model has in each of those individual parameters? This is where the Bayesian magic really shines.
It undoubtedly takes more lines of code, more thought, and longer training time than the sklearn example above.
I promise you, dear reader, that it can all be worth it.
First, PyMC3 runs on Theano under the hood.
We have to make some slight changes to our pandas/numpy data, the most significant of which is wrapping the training data in a Theano shared tensor.
When we look to make predictions for our model, we can swap out X_train for X_test and use the same variable name.
Now that we have our data set up, we need to build our model, which we initialize by calling pm.
Inside of this model context, we need to build our complete set of assumptions about our priors (parameters) and output.
Normal (Gaussian) distributions are fairly safe bets for your first model.
This constitutes our model specification.
Now we have to learn what the posterior distribution of our model weights could be.
Unlike sklearn, the coefficients are now a distribution of values, not a single point.
We sample a range of possible weights, and the coefficients that appear to fit our data well are retained in something called a trace.
The sampling functions (NUTS, Metropolis, et al.) are well beyond the scope of this post, but there are vast repositories of knowledge describing them.
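For intuition about what a sampler does, here is a rough sketch of a hand-rolled random-walk Metropolis sampler in plain numpy. This is not PyMC3's NUTS and not the code from this post: the synthetic data, the Normal(0, 10) priors, the Normal likelihood, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (hypothetical): y = 2 + 1.5*x1 - 0.5*x2 + noise.
n = 200
X = rng.normal(size=(n, 2))
y = 2.0 + X @ np.array([1.5, -0.5]) + rng.normal(scale=0.5, size=n)


def log_posterior(theta):
    """Unnormalized log posterior; theta = [alpha, beta_1, beta_2, log_sigma]."""
    alpha, betas, log_sigma = theta[0], theta[1:3], theta[3]
    sigma = np.exp(log_sigma)
    # Assumed priors: Normal(0, 10) on alpha, each beta, and log_sigma.
    log_prior = -0.5 * np.sum((theta / 10.0) ** 2)
    # Normal likelihood of the observed y around the linear predictor.
    resid = y - (alpha + X @ betas)
    log_lik = -n * log_sigma - 0.5 * np.sum((resid / sigma) ** 2)
    return log_prior + log_lik


def metropolis(n_samples, step=0.05):
    """Random-walk Metropolis: returns an (n_samples, 4) array of draws."""
    theta = np.zeros(4)
    current = log_posterior(theta)
    trace = np.empty((n_samples, 4))
    for i in range(n_samples):
        proposal = theta + rng.normal(scale=step, size=4)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        trace[i] = theta
    return trace


trace = metropolis(5000)
# Drop the early draws, which may not have reached the high-probability region.
posterior_mean = trace[1000:].mean(axis=0)
print("posterior mean [alpha, beta_1, beta_2, log_sigma]:", posterior_mean.round(2))
```

Each row of `trace` is one plausible set of weights. Samplers such as NUTS explore the same posterior far more efficiently, which is why PyMC3 uses them by default.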
Here we build our trace from our model:
The NUTS sampler complains that using find_MAP() is not a good idea, but frequently this is used in tutorials, and did not seem to hurt my performance.
We can also try a different sampler that tries to approximate the posterior distributions:
The trace can be plotted, and generally looks like this.
The beta parameters look fairly constrained in distribution (left plots) and seem to be reasonably consistent across the last 1000 sampled items in our trace (right plot).
The alpha parameter looks less certain.
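The same judgement can be made numerically. The sketch below summarizes a hypothetical trace (random draws standing in for real sampler output, with made-up parameter names) by its mean, standard deviation, and an equal-tailed 94% interval. Note that a percentile interval is an assumption here and is not the same thing as a highest-density interval.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trace: 2000 posterior draws for alpha and two betas.
trace = {
    "alpha": rng.normal(loc=4500.0, scale=400.0, size=2000),
    "beta_0": rng.normal(loc=800.0, scale=50.0, size=2000),
    "beta_1": rng.normal(loc=-300.0, scale=40.0, size=2000),
}


def summarize(samples, burn_in=1000):
    """Mean, sd and equal-tailed 94% interval of the draws kept after burn-in."""
    kept = samples[burn_in:]
    low, high = np.percentile(kept, [3, 97])
    return {"mean": kept.mean(), "sd": kept.std(ddof=1), "low": low, "high": high}


summary = {name: summarize(draws) for name, draws in trace.items()}
for name, s in summary.items():
    print(f"{name}: mean={s['mean']:.1f} sd={s['sd']:.1f} "
          f"94% interval=({s['low']:.1f}, {s['high']:.1f})")
```

A wide interval relative to the mean, as for `alpha` in this made-up trace, is the numerical counterpart of a parameter that "looks less certain" in the plot.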
Now that we have our posterior samples, we can make some predictions.
We generally observe a so-called ‘burn in’ period in PyMC3 where we discard the first thousand samples of our trace (trace[1000:]), as these values may not have converged.
We then draw 1000 sample weights from this trace, calculate what the predictions might be, and take the mean of that value as our most probable prediction for that data point.
From here, we simply calculate the RMSE.
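The prediction steps described above can be sketched in numpy as follows. The test data and the trace arrays are simulated stand-ins (all names and values are hypothetical), and the sketch averages the linear predictor only, without adding observation noise.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical test data and a hypothetical trace of 2000 draws.
n_test, k = 100, 2
X_test = rng.normal(size=(n_test, k))
true_alpha, true_betas = 10.0, np.array([3.0, -2.0])
y_test = true_alpha + X_test @ true_betas + rng.normal(scale=1.0, size=n_test)

alpha_trace = true_alpha + rng.normal(scale=0.2, size=2000)
betas_trace = true_betas + rng.normal(scale=0.2, size=(2000, k))

# Discard the first 1000 draws as burn-in.
burn_in = 1000
alpha_kept, betas_kept = alpha_trace[burn_in:], betas_trace[burn_in:]

# Draw 1000 sets of weights (with replacement) from the remaining trace.
idx = rng.integers(0, len(alpha_kept), size=1000)

# One row of predictions per sampled set of weights: shape (1000, n_test).
preds = alpha_kept[idx, None] + betas_kept[idx] @ X_test.T

# The mean over sampled weights is the point prediction for each test row.
y_pred = preds.mean(axis=0)
rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"RMSE on the simulated test set: {rmse:.2f}")
```

Because `preds` keeps every sampled prediction, the same array can also give a spread for each data point, for example with `preds.std(axis=0)`.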
If we want to test on our holdout dataset:
So this model that we built performs better than our naive approach (average ridership) but slightly worse than our sklearn model.
In an included example in the Github repo, I was able to build a similar model that beat the sklearn model by scaling the Y value, and modeling it as a Normally distributed variable.
Further tuning of the model parameters, using different scalings, and assuming a wider range of possible beta parameters can all be employed to lower the RMSE of this example.
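As one illustration of scaling, the sketch below standardizes a target and maps predictions back to the original units. The values are simulated and the helper name `unscale` is hypothetical. This is only one possible scaling, not necessarily the one used in the repo.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training target, roughly on the scale of daily ridership.
y_train = rng.normal(loc=4500.0, scale=1900.0, size=400)

# Standardize: zero mean and unit variance, so priors on a unit scale are plausible.
y_mean, y_std = y_train.mean(), y_train.std()
y_scaled = (y_train - y_mean) / y_std


def unscale(pred_scaled):
    """Map predictions made on the standardized scale back to riders per day."""
    return pred_scaled * y_std + y_mean


# Round trip: unscaling the scaled target recovers the original values.
recovered = unscale(y_scaled)
print(np.allclose(recovered, y_train))
```

Remember to compute the RMSE after unscaling, so that it stays comparable with the other models.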
The goal of this post is to introduce the basics of model building and provide an editable example that you can play around with and learn from! I encourage you to provide feedback in the comments section below.
To recap, there is a price to pay for Bayesian models.
It certainly takes longer to implement and write a model.
It requires some background knowledge on Bayesian statistics.
The training time is orders of magnitude longer than using sklearn.
However, tools like PyMC3 can offer greater control, understanding, and appreciation for your data and the model artifacts.
Although there are a number of good tutorials on PyMC3 (including its documentation page), the best resource I found was a video by Nicole Carlson.
It explores how a sklearn-familiar data scientist would build a PyMC3 model.
Careful readers will find numerous examples that I adopted from that video.
I also learned a lot from Probabilistic Programming and Bayesian Methods for Hackers, which is a free notebook-based tutorial on practical Bayesian models using PyMC3.
These two resources are absolutely amazing.
Duke also has an example website that has numerous data situations that I found informative.
Towards Data Science has also hosted a number of cool posts throughout the year that focused on Bayesian analysis and have helped inspire this post.
In the next blog post, I will illustrate how to build a Hierarchical Linear Model (HLM) that will greatly improve the performance of our initial approach.
Below are a Kaggle kernel that you can fork and a Github repo that you can clone to play around with the data and develop your own PyMC3 models with.
Thank you for reading!