This is the first part of a series that will document my stumblings through getting my models deployed.
Hopefully, I can save you some heartbreak along the way.
So, first things first.
You built a model in a notebook.
Unless you want to re-run the notebook every time in production, you need to get that model object out of the notebook so it can run in a standalone script. Generally, there are two approaches to what we call model persistence: the pickle approach, and the more data-oriented Joblib method.
The pickle method, or “pickling” as it’s often called, is the conversion of an object into a byte stream for saving and loading.
Since almost everything in Python is an object at its core, you can see how this is useful.
The Joblib library, however, was built with Python objects that wrap large numpy arrays in mind.
That focus yields appreciable time savings when saving and loading objects that are heavily numpy-array based.
You can see the value of Joblib in data science, and infer from that why it’s often the better path to model persistence.
You can do both.
But in general, larger datasets will benefit from using Joblib.
There is ongoing discussion of updating pickle to handle numpy-based objects better, but that’s a topic for later.
No worries for today, though; I’ll show you both methods, since each is relatively simple.
First, let me build a quick model from the beloved iris dataset.
Let’s score it.
Not the greatest score.
But that’s not the point.
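The original notebook code wasn’t preserved here, so the following is a minimal sketch of the kind of model in play. I’m assuming a KMeans clusterer (the object is called `km` later on) trained on the iris feature matrix, and using silhouette score as the “not the greatest” score:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data

# Three clusters to match the three iris species.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Silhouette score: 1.0 is perfect separation, values near 0 mean
# overlapping clusters. Iris lands somewhere in the middle.
print(silhouette_score(X, km.labels_))
```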
So now that we have the model and we know it works, let’s save and load with Pickle.
Model Persistence with Pickle
Pickle is native to Python 2 and 3, so no need to pip install.
We first choose a filename to save it under, then call pickle.dump() to write our model object (km, from above) into binary.
Note that the ‘wb’ (write binary) mode is important here; it will not work with plain ‘w’ (write text), because the object has to be serialized into binary form.
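In code, the save step looks like this (the filename is my own placeholder, and `km` is the fitted model from above, rebuilt here so the snippet runs on its own):

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Rebuild the fitted model so this snippet is self-contained.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(load_iris().data)

filename = "km_model.sav"  # the name and ".sav" extension are arbitrary

# 'wb' = write binary: pickle produces a byte stream, so text mode fails.
with open(filename, "wb") as f:
    pickle.dump(km, f)
```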
The filename is arbitrary, and so is the “.sav” extension I used.
It can be anything, and it doesn’t need an extension at all; name it however works for your purposes.
Huzzah, it’s off the notebook! Congrats.
You have successfully saved your first model for future use.
You can call it back in a future notebook with this load command, or for our demonstration purposes, we can do it in the same notebook.
We use pickle.load() with the file opened in ‘rb’ (read binary) mode, mirroring the ‘wb’ we used when saving.
Now, km_loaded_pickle is our km model object from before.
It should have the same attributes and methods as it had before.
It works! Another neat thing: the loaded model keeps all of its methods.
You will not need to import the library again for it to run its methods, such as running a prediction.
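A sketch of the round trip, under the same assumptions as before (KMeans on iris; the dump step is repeated so this runs on its own):

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumed setup: a fitted model saved as in the previous snippet.
X = load_iris().data
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
filename = "km_model.sav"
with open(filename, "wb") as f:
    pickle.dump(km, f)

# 'rb' = read binary, mirroring the 'wb' used when saving.
with open(filename, "rb") as f:
    km_loaded_pickle = pickle.load(f)

# The restored object carries its fitted state and its methods.
print(km_loaded_pickle.cluster_centers_.shape)  # (3, 4)
print(km_loaded_pickle.predict(X[:5]))          # predictions, no extra imports
```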
Neat!

Model Persistence with Joblib

The process for model persistence with Joblib is more or less the same, but slightly easier in my opinion.
However, you will need to import joblib itself: older versions of scikit-learn bundled it as sklearn.externals.joblib, but it is now a standalone package (pip install joblib).
We set the filename much as before and call joblib.dump() with the km model and our just-defined filename.
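Sketched out, the save step looks like this (filename again a placeholder of mine, model rebuilt so the snippet stands alone):

```python
import joblib  # older scikit-learn bundled this as sklearn.externals.joblib

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Rebuild the fitted model so this snippet is self-contained.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(load_iris().data)

joblib_file = "km_model.joblib"  # placeholder name; any filename works
joblib.dump(km, joblib_file)     # joblib opens and closes the file itself
```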
Boom, saved the same model using two methods already! We’re on a roll here.
Let’s close out strong.
Now load it back up with joblib.load(), using our filename from before.
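The load step, under the same assumptions (dump repeated so the snippet runs on its own):

```python
import joblib

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumed setup from the previous snippet.
X = load_iris().data
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
joblib.dump(km, "km_model.joblib")

# Load it straight back; no file handle or mode flag needed.
km_loaded_joblib = joblib.load("km_model.joblib")

print(km_loaded_joblib.cluster_centers_)  # attributes survive the round trip
print(km_loaded_joblib.predict(X[:5]))    # ...and so do the methods
```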
There we have it, loaded it back up and used its attributes.
Like before, we can also call the model’s methods, such as predict(), without re-importing scikit-learn ourselves (it still has to be installed in the environment, since loading the object re-creates the sklearn class under the hood).
It’s all built into the object.
WORD OF WARNING: Do not load any binaries with pickle or Joblib that you have not made yourself or absolutely trust.
Neither format is secured against malicious code; loading an untrusted file can execute arbitrary code on your machine.
There you have it.
The very first step on the road to final deployment.
I hope to get each successive part of the series done a week apart.
Next, I imagine we’ll be getting all of this into a Docker container.
If you have any questions, comments, or concerns, or ideas on how I can steer this series toward something helpful we can all use, please let me know.
Until the next part, happy coding!