Can automated machine learning outperform handcrafted models?

Testing Auto-Keras models on a real-world dataset

Nils Schlüter
Jan 5

Can Auto-Keras really find better models than you? (Photo by Annie Theby on Unsplash)

Automated machine learning (AutoML) can be used to automatically find and train machine learning models.
You no longer need to create the model yourself: the AutoML algorithms analyze your data and pick the best model automatically.
But how good are those models really? Can they be compared to custom models, or are they even better? Do we never need to handpick another model again? Let’s find out!

Introduction: Auto-Keras

Companies like Google already offer AutoML products, but with Auto-Keras there is also an open-source solution for this.
In the official getting started example, Auto-Keras is used to find the best neural architecture for the MNIST Dataset.
When I tried the example, the resulting model reached a score of ~98%.
That’s pretty impressive, so I decided to use Auto-Keras to try and beat myself on the Kaggle Titanic Dataset.
My best score for this competition is about 80% accuracy, which currently puts me in the top 900 of all competitors.
That’s a decent enough score, so let’s see if Auto-Keras can outperform me!

You can find the notebook with the complete code here.

Setup

I used Google Colab for this project.
To install Auto-Keras in Google Colab, simply run:

    !pip install autokeras

If you want to run this locally, you can install Auto-Keras from the command line using pip.
Data

For this example, I used the dataset you can download from the Kaggle competition.
To use the data in the Auto-Keras Model, you need to import it as a numpy array.
Because the Titanic data contains text data, we need to do some preprocessing first.
I used the same preprocessing and feature engineering as in my own Titanic solution.
The preprocessing is, of course, different for each project, but if you want to use Auto-Keras, you will need to provide numpy arrays.
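
As a rough illustration of turning text columns into numeric NumPy arrays, here is a minimal sketch. It is not the preprocessing from the notebook: the chosen features, the numeric codes for `Sex` and `Embarked`, and the median fill for missing ages are all assumptions made for this example. Only the column names are taken from the Kaggle Titanic training file.

```python
import csv
import io
import statistics

import numpy as np

# Assumed numeric codes for the text columns (any consistent mapping would do).
SEX_CODES = {"male": 0.0, "female": 1.0}
PORT_CODES = {"S": 0.0, "C": 1.0, "Q": 2.0}


def encode_rows(rows):
    """Turn csv.DictReader rows into a float feature matrix and a label vector."""
    rows = list(rows)
    # Fill missing ages with the median of the known ages (one simple choice).
    known_ages = [float(r["Age"]) for r in rows if r["Age"]]
    median_age = statistics.median(known_ages)

    features, labels = [], []
    for r in rows:
        features.append([
            float(r["Pclass"]),
            SEX_CODES[r["Sex"]],
            float(r["Age"]) if r["Age"] else median_age,
            float(r["SibSp"]) + float(r["Parch"]),  # relatives on board
            float(r["Fare"]),
            PORT_CODES.get(r["Embarked"], 0.0),  # missing port falls back to "S"
        ])
        labels.append(int(r["Survived"]))
    return np.asarray(features, dtype=np.float32), np.asarray(labels, dtype=np.int64)


# Three made-up rows in the same column layout, including missing values.
sample = io.StringIO(
    "Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked\n"
    "0,3,male,22,1,0,7.25,S\n"
    "1,1,female,38,1,0,71.2833,C\n"
    "1,3,female,,0,0,7.925,\n"
)
x_demo, y_demo = encode_rows(csv.DictReader(sample))
print(x_demo.shape, y_demo.tolist())
```

For the real file, the same function could be fed `csv.DictReader(open("train.csv", newline=""))`, assuming the file sits next to the notebook.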
After preprocessing, you can load the data as train and test data.

Finding the right model

Once you have x_train, y_train, x_test and y_test for your dataset, you can use the fit method to find the best model and train it. That’s it: two lines of code is all you need.
Everything else happens automatically.
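
To make the workflow concrete without tying it to a particular Auto-Keras release, here is a minimal sketch of the fit-then-predict pattern. `MajorityClassifier` and `fit_and_score` are hypothetical names invented for this example and are not part of Auto-Keras. An Auto-Keras classifier would take the stand-in's place, assuming it exposes `fit` and `predict` in this shape. The toy data is made up.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in that always predicts the most frequent training label."""

    def fit(self, x, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return [self.label_ for _ in x]


def fit_and_score(clf, x_train, y_train, x_test, y_test):
    """Train any object with fit/predict and return its accuracy on the test split."""
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)
    correct = sum(int(p == t) for p, t in zip(predictions, y_test))
    return correct / len(y_test)


# Toy data: two numeric features per passenger, label 1 = survived.
x_train = [[3, 0], [1, 1], [3, 0], [2, 0]]
y_train = [0, 1, 0, 0]
x_test = [[3, 1], [1, 0]]
y_test = [1, 0]

baseline_accuracy = fit_and_score(MajorityClassifier(), x_train, y_train, x_test, y_test)
print(baseline_accuracy)
```

A trivial baseline like this is also a useful sanity check: an automatically found model should at least beat it.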
Results

After training the model, I used it to generate predictions and uploaded the predictions to Kaggle.
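
The submission step can be sketched as follows. The Titanic competition expects a CSV file with a `PassengerId` and a `Survived` column. The helper name `write_submission` and the IDs and predictions below are hypothetical; in practice the IDs come from the test file and the predictions from the trained model.

```python
import csv
import io


def write_submission(passenger_ids, predictions, out_file):
    """Write predictions in the two-column layout of the Titanic competition."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for pid, pred in zip(passenger_ids, predictions):
        # Cast to int so NumPy integers or 0.0/1.0 floats are written as 0/1.
        writer.writerow([int(pid), int(pred)])


# Made-up IDs and predictions, written to an in-memory buffer for the demo.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To create a file for upload, pass an opened file instead of the buffer, for example `open("submission.csv", "w", newline="")`.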
I got a score of 0.
This is slightly worse than my handmade model, but we need to consider that Auto-Keras is still very new.
The TabularClassifier currently only supports LGBMClassifier — which is not ideal for this dataset.
If the project is continued in the future, we can probably expect even better results.
Also, finding the model was incredibly fast and easy, so I think this is a very good and promising result!

Pros and cons

While it's very easy to use the resulting model to generate predictions, you lose a lot of knowledge about your model.
If you don’t actively search for it, you don’t know the architecture and parameter values of the resulting model — after all, they are chosen automatically, so you never really need them.
While this is probably a good thing for beginners and inexperienced machine learning engineers, it can get dangerous if you’re later trying to adapt or expand your model.
“Black Box” models are already heavily discussed in machine learning projects, but when using AutoML the whole process becomes even more of a black box — you just throw some data in and hope for the best.
These are some risks you need to be aware of when implementing AutoML into your machine learning projects.
Significance for future work

So, does AutoML eliminate the need for custom models? No, probably not.
Can it help you to create better models? Probably.
For example, you can use it to quickly find and train good base models, which you can later improve on your own.
You can also try different feature engineering strategies quickly to get a feeling for which one works best.
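
One way to compare feature engineering strategies is to score each one on the same held-out rows. The sketch below is only an illustration of that loop: `NearestNeighbour` is a hypothetical stand-in for whatever model an AutoML search would return, the two strategies and the six passengers are made up, and a single small hold-out split is far too little for a real comparison.

```python
class NearestNeighbour:
    """Hypothetical stand-in model: copies the label of the closest training row."""

    def fit(self, x, y):
        self.x_, self.y_ = list(x), list(y)
        return self

    def predict(self, x):
        def closest_label(row):
            distances = [sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.x_]
            return self.y_[distances.index(min(distances))]
        return [closest_label(row) for row in x]


def holdout_accuracy(make_clf, features, labels, n_valid):
    """Train on all but the last n_valid rows and score on those held-out rows."""
    clf = make_clf()
    clf.fit(features[:-n_valid], labels[:-n_valid])
    predicted = clf.predict(features[-n_valid:])
    hits = sum(int(p == t) for p, t in zip(predicted, labels[-n_valid:]))
    return hits / n_valid


# Made-up passengers; sex is already encoded as 0 (male) or 1 (female).
rows = [
    {"pclass": 3, "sex": 0, "survived": 0},
    {"pclass": 1, "sex": 1, "survived": 1},
    {"pclass": 3, "sex": 1, "survived": 1},
    {"pclass": 1, "sex": 0, "survived": 0},
    {"pclass": 3, "sex": 1, "survived": 1},
    {"pclass": 1, "sex": 0, "survived": 0},
]
labels = [r["survived"] for r in rows]

# Each strategy maps one raw row to a list of numeric features.
strategies = {
    "class_only": lambda r: [r["pclass"]],
    "class_and_sex": lambda r: [r["pclass"], r["sex"]],
}

scores = {
    name: holdout_accuracy(NearestNeighbour, [build(r) for r in rows], labels, n_valid=2)
    for name, build in strategies.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```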
Just give it a try and see how you can implement it in your own projects.
I think AutoML can have a big impact on many machine learning projects in the future — I am looking forward to further developments and improvements in this area.