Well, it is… and it’s not.
The obvious approach is to iterate over all possible combinations of hyperparameters, in other words, to run a grid search.
The problem with that is that usually there is not enough time in the universe to run it all.
A better idea is to sample a limited number, say 100, of randomly selected hyperparameter sets and try them all.
In practice, this is usually good enough.
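To get a feel for the difference, here is a minimal sketch of both strategies in plain Python. The search space, its candidate values, and the helper names (grid, random_sets) are made up for illustration; only the LightGBM parameter names are real. Even this tiny four-parameter space has 256 grid points, while random search caps the budget at whatever number you pick.

```python
import itertools
import random

# Hypothetical, deliberately tiny search space (4 values per hyperparameter).
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "num_leaves": [15, 31, 63, 127],
    "max_depth": [3, 5, 7, 9],
    "feature_fraction": [0.4, 0.6, 0.8, 1.0],
}


def grid(space):
    """Yield every combination of hyperparameter values (grid search)."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_sets(space, n_sets, seed=0):
    """Draw n_sets hyperparameter sets independently at random (random search)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_sets)
    ]


n_grid = sum(1 for _ in grid(SEARCH_SPACE))  # 4 * 4 * 4 * 4 = 256 combinations
sampled = random_sets(SEARCH_SPACE, n_sets=100)  # fixed budget of 100 runs
```

Add a few more hyperparameters or finer value lists and the grid count multiplies quickly, while the random budget stays at 100.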
But if you think about it, it is far from perfect.
Some parts of the hyperparameter search space are just not worth your (and your GPU's) time.
Instead of selecting all 100 hyperparameter sets at the beginning, you could make a more informed decision and select each new set of hyperparameters based on the results of the runs you have already completed.
This is what people usually refer to as “Bayesian hyperparameter optimization” and it will be the primary focus of this series.
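To make the idea concrete, here is a toy version of such a loop. It is not a real Bayesian optimizer: instead of a Gaussian process or another probabilistic model, it uses a crude nearest-neighbor guess plus a bonus for unexplored regions, and the objective is a made-up one-dimensional stand-in for an expensive training run. All names are assumptions made for this sketch. The point is only the shape of the loop: every new value is picked using the results of the previous runs.

```python
import random


def objective(x):
    """Toy stand-in for an expensive training run; higher is better, peak at x = 0.7."""
    return 1.0 - (x - 0.7) ** 2


def acquisition(x, history, explore_weight=0.5):
    """Score a candidate: result of the nearest evaluated point plus a distance bonus."""
    nearest_x, nearest_score = min(history, key=lambda pair: abs(pair[0] - x))
    return nearest_score + explore_weight * abs(nearest_x - x)


def sequential_search(n_calls=30, n_initial=5, n_candidates=200, seed=0):
    """Pick each new point based on everything evaluated so far."""
    rng = random.Random(seed)
    # A few random runs first, so there is some history to learn from.
    history = [(x, objective(x)) for x in (rng.random() for _ in range(n_initial))]
    for _ in range(n_calls - n_initial):
        # Propose random candidates, then evaluate only the most promising one.
        candidates = [rng.random() for _ in range(n_candidates)]
        next_x = max(candidates, key=lambda x: acquisition(x, history))
        history.append((next_x, objective(next_x)))
    # Return the best (x, score) pair seen during the search.
    return max(history, key=lambda pair: pair[1])


best_x, best_score = sequential_search()
```

Real HPO libraries replace the nearest-neighbor guess with a proper surrogate model and a principled acquisition function, but the propose, evaluate, update cycle is the same.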
In this series, we will look at the major Python HPO libraries and learn what they offer, how to use them, and which one can get you the best results.

About this study

Evaluation criteria

We need some dimensions to compare those libraries and I decided to go with:
- ease of use/setup and API
- documentation and examples
- options/methods/(hyper)hyperparameters
- persistence/restarting
- speed and parallelization
- visualizations
- experimental results*

*Those should not be treated as a machine learning benchmark.
I wanted to make sure that I can run those HPO experiments for different libraries on a simple problem and get some sort of intuition for the results.
I will try to subjectively score each category on a scale of 0–10.
For the experimental results, I will calculate the gain over a random search strategy and use it as a score.
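One simple way to turn this into a number, assuming "gain" means the relative improvement of a library's best AUC over the best AUC found by random search with the same budget, could look like the sketch below. The function name, the exact definition, and the example scores are placeholders, not results from this study.

```python
def gain_over_random(library_best_auc, random_search_best_auc):
    """Relative improvement over random search, in percent (assumed definition)."""
    return 100.0 * (library_best_auc - random_search_best_auc) / random_search_best_auc


# Placeholder scores for illustration only, not measured results.
example_gain = gain_over_random(library_best_auc=0.88, random_search_best_auc=0.80)
```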
Please comment if you agree or disagree with my judgment. We can definitely get a better guesstimate together!

Libraries

Going by popularity, I took the following libraries for a spin:
- Scikit-Optimize
- Hyperopt
- BayesianOptimization
- Optuna
- hpbandster
- Sherpa
- SMAC3

If I am missing your favorite library, let me know and I will add it later!

Example problem

As an example, let's tweak the hyperparameters of a LightGBM model on a simple tabular dataset with a binary target.
To keep training quick, I fixed the number of boosting rounds at 300, with early stopping after 30 rounds without improvement.
All the training and evaluation logic is put inside the train_evaluate function.
We can treat it as a black box that takes the data and hyperparameter set and produces the AUC evaluation score.
Moreover, it makes it easy for you to follow this blog post with your own tabular data.
You can change the evaluation metric too if you want to.
I took a dataset from a recently finished Kaggle competition and used only its first 10,000 rows.
To train a model on a set of MODEL_PARAMS, you pass the data and that parameter set to train_evaluate and get the AUC score back.

Tracking experiments and results

I like to keep track of the experiments I run.
Being able to come back to them at any time and share them with anyone makes me feel that this work is not lost.
For every experiment, I used Neptune to track scripts, hyperparameters, diagnostic charts, results objects (as pickles), best hyperparameters, and AUC metrics.
You can simply go to the blog-hpo project, explore all this information for every experiment and even download the results.

Other notes

A few other things that I feel you should know:
- I ran all the experiments on Linux 16.04,
- I worked on 12 cores,
- the training budget was 100 iterations in total.

Now that we have the prerequisites out of the way, we can finally start tweaking those hyperparameters.

What’s next?

If you want to take a look at the first library, check out Part 1: Scikit-Optimize.