Hyperparameter tuning is an optimization problem: the exact form of the objective function is unknown and expensive to evaluate, so it pays to go about the search in an informed way.
Bayesian optimization is an extremely powerful technique when the mathematical form of the function is unknown or expensive to compute.
The main idea behind it is to compute a posterior distribution over the objective function based on the data (using the famous Bayes theorem), and then select good points to try with respect to this distribution.
That means that as more trials run and the posterior distribution improves, the hyperparameter values which are most fruitful start to emerge.
So by making informed decisions for future hp values based on what the model has learned, we can speed up the process of finding the best fit.
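To make the loop concrete, here's a toy sketch of the idea (my own illustration, not the code AI Platform actually runs): fit a Gaussian-process posterior to the trials seen so far, then pick the next point with an upper-confidence-bound acquisition that balances exploring uncertain regions against exploiting good ones.

```python
import numpy as np

# Toy "expensive" objective with a known maximum at x = 2.
def objective(x):
    return -(x - 2.0) ** 2

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(x_seen, y_seen, x_grid, noise=1e-6):
    # Standard GP regression: posterior mean and std dev over x_grid.
    K = rbf(x_seen, x_seen) + noise * np.eye(len(x_seen))
    K_s = rbf(x_seen, x_grid)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_seen
    var = 1.0 - np.sum((K_s.T @ K_inv) * K_s.T, axis=1)  # diagonal of posterior cov
    return mu, np.sqrt(np.clip(var, 1e-12, None))

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 4.0, 201)
x_seen = rng.uniform(0.0, 4.0, 3)   # a few random initial trials
y_seen = objective(x_seen)

for _ in range(15):
    mu, sigma = gp_posterior(x_seen, y_seen, x_grid)
    ucb = mu + 2.0 * sigma          # high where uncertain OR predicted good
    x_next = x_grid[np.argmax(ucb)]
    x_seen = np.append(x_seen, x_next)
    y_seen = np.append(y_seen, objective(x_next))

best_x = x_seen[np.argmax(y_seen)]  # ends up near the true optimum, x = 2
```

Each new trial updates the posterior, and the acquisition function steers later trials toward the promising region, which is exactly what the tuning service does for you at scale.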
To learn more about Bayesian statistics check out this book.
There are 4 main parts to submitting a hyperparameter tuning job on GCP:

1. A training file, which includes the usual code with the model you want to use and the score you want to evaluate the model on. It also includes an argument parser to add in the hp values for the different parameters you want to tune for each trial.
2. A yaml file with the hyperparameter values you want to use in the training file.
3. A shell executable file to submit the training job.
4. A setup file to install the additional dependencies.

If you're not familiar with working in the cloud or on GCP specifically, there are a few extra steps, but I think they are worth it, and once you have the framework set up, it's easy to adapt it to different models.
The model I’m using as an example here is a multiclass text classification model from one of my previous articles.
The model isn't really important here, but it's a Stochastic Gradient Descent classifier, preceded by feature creation (tf-idf) and dimensionality reduction (LSA).
You can tune the hyperparameters for those as well if you want.
The training data has already been cleaned up. The classes are perfectly balanced with 1,890 examples each.
Let’s take a closer look at each file we’re going to need to run our training job.
Training File

At the top is the argument parser.
This will feed the hp values from the yaml file to the estimator for each trial.
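A minimal sketch of what that parser might look like (the flag names '--alpha' and '--n_components' are my assumptions here; they need to match whatever you tune in the yaml file):

```python
import argparse

# AI Platform passes each trial's hyperparameter values as command-line
# flags, along with --job-dir for the output location.
parser = argparse.ArgumentParser()
parser.add_argument('--job-dir', default='gs://training_jobs_bucket/models',
                    help='GCS location to write the models to')
parser.add_argument('--alpha', type=float, default=0.0001,
                    help='regularization strength for SGD (assumed name)')
parser.add_argument('--n_components', type=int, default=100,
                    help='output dimensions for the LSA step (assumed name)')

# In the real training file you would call parser.parse_args() on sys.argv;
# an explicit list is used here just to show the result.
args = parser.parse_args(['--alpha', '0.001', '--n_components', '200'])
```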
The hp values for the model are passed in by the argument parser from the yaml file. Then we have to download the training data from Google Cloud Storage.
From there the text goes through the tf-idf vectorizer, we define our target and features, and then we reduce the dimensionality of our features.
Then comes the usual train-test split, after which we fit the classifier on the training data and predict on the test data.
After that we define our score which is just going to be accuracy in this case since we have balanced classes.
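Those steps might look roughly like this. This is a hedged sketch on a tiny stand-in corpus, not the article's actual data or file; the real training file would load the cleaned csv from GCS and take the hyperparameters from the argument parser:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Tiny stand-in corpus; the real job loads the cleaned training data from GCS.
texts = ['cheap flights to paris', 'book a hotel room', 'flight delayed again',
         'hotel breakfast was great', 'airline lost my bag', 'late checkout today',
         'upgrade my plane seat', 'the hotel pool was cold']
labels = ['flight', 'hotel', 'flight', 'hotel', 'flight', 'hotel', 'flight', 'hotel']

# Feature creation (tf-idf) followed by dimensionality reduction (LSA).
features = TfidfVectorizer().fit_transform(texts)
features = TruncatedSVD(n_components=2, random_state=42).fit_transform(features)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42, stratify=labels)

# alpha here is one of the hyperparameters the tuning job varies per trial.
clf = SGDClassifier(alpha=0.0001, random_state=42).fit(X_train, y_train)
score = accuracy_score(y_test, clf.predict(X_test))
```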
The next bit is calling the hypertune library where we set our metric and the global step that the metric value is associated with.
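The hypertune call itself is small. Here's a sketch with the import guarded so the snippet also runs where cloudml-hypertune isn't installed (it's only needed on AI Platform):

```python
score = 0.86  # the accuracy computed earlier in the training file

try:
    import hypertune  # pip install cloudml-hypertune

    hpt = hypertune.HyperTune()
    # The tag must match hyperparameterMetricTag in the yaml config file.
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag='accuracy',
        metric_value=score,
        global_step=1)
    reported = True
except ImportError:
    reported = False  # fine locally; the setup file installs it on AI Platform
```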
After each trial, the model is saved in the GCS folder you specify.
Hyperparameter Configuration File

In the first part of this file, you specify the goal, which in our case is to maximize the accuracy.
If you were trying to optimize based on something like RMSE, then you would want to set this to minimize.
Then you set the number of trials you want to run.
The more trials the better but there definitely is a point of diminishing returns.
They recommend setting the number of trials to at least 10x the number of parameters you have, which would be 50 in our case.
Next you have to specify the number of concurrent trials you want to run.
Running trials in parallel reduces the time it takes to run but can also reduce the effectiveness when using Bayesian Optimization.
This is because it uses the results of previous trials to inform the hp values for subsequent trials.
Then you have to pass in the hp metric tag, and finally you can also enable early stopping, which will stop a trial when it's clearly not going to be fruitful and save time.
After that, you define the min/max values or the different types/categories for each hp, which is pretty straightforward and essentially what you would do in a grid search.
Then you have to define what you want to happen for each parameter.
When you have a range of values you want to explore, you can either pass in discrete values like I’m doing here for ‘n_components’ or give a min/max range and have it go through and scale the values to search through linearly.
They also have a log scale and a reverse log scale option you can use if the space you’re searching through is very large.
For categorical ones, you have to pass in the discrete values.
Optionally you can also specify a search algorithm.
If you don’t it defaults to Bayesian Optimization.
The other choices are grid search or random search so you can do those as well.
This is also where you can resume an earlier job if you think more trials could be worth it, by using the 'resumePreviousJobId' field and passing in the job id.
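Putting those pieces together, the yaml file might look something like this. The parameter names and ranges are illustrative guesses for the SGD/tf-idf/LSA pipeline, not the article's exact file:

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    maxTrials: 50
    maxParallelTrials: 1
    hyperparameterMetricTag: accuracy
    enableTrialEarlyStopping: True
    # algorithm: GRID_SEARCH       # omit to get the default Bayesian optimization
    # resumePreviousJobId: my_previous_job_id
    params:
      - parameterName: n_components
        type: DISCRETE
        discreteValues: [100, 200, 300, 400]
      - parameterName: alpha
        type: DOUBLE
        minValue: 0.00001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
      - parameterName: loss
        type: CATEGORICAL
        categoricalValues: ['hinge', 'log', 'modified_huber']
```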
Setup File

This simply downloads the hypertune library dependency we need.
If you need other dependencies that you cannot install directly on AI Platform, you can add them here.
Here the only thing we need is the hypertune library.
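The setup file can be as small as this; the package name 'trainer' is my assumption, following the standard setuptools pattern from Google's training samples:

```python
from setuptools import find_packages, setup

setup(
    name='trainer',   # assumed package name
    version='0.1',
    packages=find_packages(),
    include_package_data=True,
    install_requires=['cloudml-hypertune'],  # the hypertune dependency
    description='Training package for the hyperparameter tuning job.')
```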
Shell Script

This contains the variables and gcloud commands we need to submit the training job.
In terms of the variables, the 'bucket name' is the bucket you want the models saved to, and the 'job directory' is the folder within the bucket where you want the models saved.
It’s good practice to add in a timestamp to the ‘job name’ for future identification.
The ‘training package path’ is the folder you saved your files to on GCP and the ‘main trainer module’ is the file with the model and argument parsers.
You also need to set the runtime version for the AI Platform, the python version, the region, the scale tier as well as point to where the hp config yaml file is.
Then the actual gcloud commands to run the training job which use the above variables we defined.
The last part is an optional command if you want to stream the logs to the console but you can certainly close the shell at that point and do something else.
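A sketch of that script follows. The bucket name is the one used in this walkthrough; the module, config, and job names are my placeholders, and the runtime/python versions and region are examples you'd set for your own project:

```shell
#!/bin/bash
# Variables for the training job
BUCKET_NAME=training_jobs_bucket
JOB_NAME=hp_tuning_job_$(date +%Y%m%d_%H%M%S)   # timestamp for identification
JOB_DIR=gs://$BUCKET_NAME/models
TRAINING_PACKAGE_PATH=./training_job_folder/trainer/
MAIN_TRAINER_MODULE=trainer.train               # assumed module name
REGION=us-central1
RUNTIME_VERSION=2.1
PYTHON_VERSION=3.7
SCALE_TIER=BASIC
CONFIG=./training_job_folder/trainer/hptuning_config.yaml  # assumed file name

# Submit the training job with the hyperparameter config attached
gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir $JOB_DIR \
  --package-path $TRAINING_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --region $REGION \
  --runtime-version $RUNTIME_VERSION \
  --python-version $PYTHON_VERSION \
  --scale-tier $SCALE_TIER \
  --config $CONFIG

# Optional: stream the logs to the console
gcloud ai-platform jobs stream-logs $JOB_NAME
```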
Putting Everything Together

Open up the code editor, which can be accessed on GCP by opening the shell and clicking the 'pen' icon on the bottom right. GCP has a built-in code editor, and any code here can easily be pushed to a Cloud Source Repository for version control.
When you open it up, you’ll see that you already have a folder with your username.
Create a training job folder and name it whatever you want.
Within the main training job folder create a sub folder for the training file and the yaml config file.
You also need to add an empty ‘__init__’ file and this is also where I put the shell script.
In the main training folder is where you put the setup file.
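The resulting layout looks something like this (the individual file names are my placeholders):

```
training_job_folder/
    setup.py
    trainer/
        __init__.py
        train.py
        hptuning_config.yaml
        submit_hp_job.sh
```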
Next, upload the training data to the Cloud Storage bucket.
You can just do this through the UI.
The bucket name I’m using is ‘training_jobs_bucket’ but you can use any name you like.
Now we’re ready to run the training job.
Assuming you have everything set up to run a training job on AI Platform, you can just open the shell and enter the following commands to run the shell script:

chmod +x training_job_folder/trainer/submit_hp_job.
Your job will now run and you can check the output in the AI Platform section of GCP.
It will show each trial sorted highest to lowest by accuracy, along with the hp values for that trial.
AI Platform also has an API, so you can get the results from there as well.
That way you can pass the best parameters directly into your production model or you can analyze the results of experiments easily.
For the credentials, I'm using a service account and providing the path to the credentials json file.
Then just provide the project id and the job name.
Then make the request.
The first object returned will be the best hyperparameters.
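A sketch of that request follows; the project id, job name, and key path are placeholders, and it assumes the google-api-python-client 'ml' v1 API. The parsing step is demonstrated on a stubbed response so you can see the shape of what comes back:

```python
def fetch_job(project_id, job_name, key_path):
    """Fetch a finished AI Platform job; needs google-api-python-client
    and real service-account credentials to actually run."""
    from google.oauth2 import service_account
    from googleapiclient import discovery

    credentials = service_account.Credentials.from_service_account_file(key_path)
    ml = discovery.build('ml', 'v1', credentials=credentials)
    name = 'projects/{}/jobs/{}'.format(project_id, job_name)
    return ml.projects().jobs().get(name=name).execute()

def best_hyperparameters(response):
    # AI Platform returns trials sorted best-first for the job's goal,
    # so the first trial holds the winning values.
    return response['trainingOutput']['trials'][0]['hyperparameters']

# Stubbed response standing in for a real fetch_job(...) call:
stub = {'trainingOutput': {'trials': [
    {'trialId': '7', 'finalMetric': {'objectiveValue': 0.86},
     'hyperparameters': {'alpha': '0.0001', 'n_components': '300'}},
    {'trialId': '2', 'finalMetric': {'objectiveValue': 0.84},
     'hyperparameters': {'alpha': '0.001', 'n_components': '200'}}]}}
best = best_hyperparameters(stub)
```

Note that the API returns hyperparameter values as strings, so cast them before feeding them into a production model.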
I ran the job three times with three different predefined machine tiers to compare (basic, standard and premium).
Besides these three predefined tiers, they also have machines with GPUs and TPUs and you can always create a custom machine.
50 trials were run with the same hp values, early stopping was enabled and I didn’t run any concurrent trials.
Running concurrent trials, as mentioned above, would have sped up the process, but that's not really what I cared about here.
In the end, not surprisingly all three runs yielded a top accuracy score of around 86%.
The formula to calculate the cost of one of these jobs is:

(price per hour / 60) * job duration in minutes

Basic: (0.19 / 60) * 432 = $1.37
Standard: (1.988 / 60) * 194 = $6.43
Premium: (16.5536 / 60) * 222 = $61.25

It's interesting that the premium tier actually took longer than the standard one and cost almost 10x as much.
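The tier arithmetic above is easy to sanity-check in a couple of lines, using the per-hour prices and durations from the figures above:

```python
def job_cost(price_per_hour, duration_minutes):
    # (price per hour / 60) * job duration in minutes
    return round(price_per_hour / 60 * duration_minutes, 2)

basic = job_cost(0.19, 432)       # 432-minute run on the basic tier
standard = job_cost(1.988, 194)   # 194-minute run on the standard tier
premium = job_cost(16.5536, 222)  # 222-minute run on the premium tier
```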
There could be a couple of reasons for this. I had early stopping enabled, so there could have been some trials in the standard run that were stopped early but not in the premium run.
I’m also not running concurrent trials so I don’t know how much of an advantage standard would give over premium in that case.
Premium has a larger master machine compared to standard (16 vCPUs vs 8 vCPUs), but the real difference is in the number of workers (19 vs 4) and parameter servers (11 vs 3).
So you should explore this for yourself and you shouldn’t just assume that premium will give you the fastest results, it really depends on what you are trying to achieve.
Conclusion

There's a little more work you need to do and a bit of a learning curve to use the hyperparameter tuning service, but overall I think it's worth it, and I will be using it myself going forward.
A lot of the stuff like implementing Bayesian Optimization and logging results is being handled for you.
I really like the Bayesian approach to tuning, and if this saves me time while training on lots of data or running lots of experiments, then the extra effort here is definitely worth it to me.