What we learned by serving machine learning models at scale using Google Cloud ML

Feb 19. By Bruno Schionato, Diego Domingos, Fernando Moraes, Gustavo Rozato, Isac Souza, Marciano Nardi, Thalles Silva (Daitan Group)

Following our series of articles about cloud infrastructures for solving the Machine Learning (ML) pipeline problem, this time we gave Google Cloud ML a try.

We’ll also provide a comparison between Amazon SageMaker and Google ML.

We emphasize their differences and similarities and provide load testing performance results.

Let’s dive into it.

Google Cloud Machine Learning

Google ML is a managed service that enables developers to build and deploy ML models to production.

It offers a pipeline that aims to solve the machine learning problem end-to-end.

That is, it provides services that cover everything from the most fundamental tasks, like data collection and cleaning, to more advanced ones, like training and deploying at scale.

Moreover, its services are flexible enough to be used together or individually.

In other words, you can take a pre-trained model and use the Cloud ML services to deploy it in the cloud.
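For illustration, here is a minimal sketch of what deploying a pre-trained model involves, assuming the Cloud ML Engine v1 REST API: a model resource (projects.models.create) plus a version that points at the exported model files (projects.models.versions.create). The bucket path, model name, region, and runtime version are hypothetical placeholders, and the code only builds the JSON request bodies; sending them with an authenticated HTTP client is left out.

```python
import json


def model_body(name: str, region: str) -> dict:
    """Body for projects.models.create: declares the model container."""
    return {"name": name, "regions": [region]}


def version_body(name: str, deployment_uri: str, framework: str,
                 runtime_version: str) -> dict:
    """Body for projects.models.versions.create: points at saved model files."""
    return {
        "name": name,
        "deploymentUri": deployment_uri,    # Cloud Storage folder with the exported model
        "framework": framework,             # e.g. "SCIKIT_LEARN", "XGBOOST", "TENSORFLOW"
        "runtimeVersion": runtime_version,  # hypothetical value below
    }


# Hypothetical names: replace with your own bucket, model, and runtime version.
model = model_body("intrusion_model", "us-central1")
version = version_body("v1", "gs://my-bucket/intrusion_model/", "SCIKIT_LEARN", "1.12")
print(json.dumps(version, indent=2))
```

No container image is involved: the version only references the model files in Cloud Storage, which is the convenience discussed in this section.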

Like Amazon SageMaker, Google ML aims to minimize, as much as possible, the need for a specialist data scientist.

Indeed, as we will further discuss, Google ML abstracts most of its configurations.

However, while this trade-off yields an easy-to-use platform, it may also impose limitations on certain types of users and applications.

Currently, a lot of companies are betting on this type of end-to-end cloud ML service.

As a consequence, Google competes with big companies like Amazon (SageMaker) and Microsoft (Azure ML).

To get an idea of the level of competition, take a look at the following chart.

Level of interest over time for search terms (in the US) such as “Google ML”, “Amazon SageMaker” and “IBM Watson”.

Although Google ML is not the leader, we have seen an increase in adoption of its services among our clients here at Daitan.

As a result, Google is positioning itself as an important player in this emerging market.

Training Capabilities

As we saw in a previous post, Amazon SageMaker offers many built-in, optimized versions of ML algorithms.

Also, training and deploying these models (at scale) is relatively easy and demands very few lines of code.

In addition, SageMaker allows a great deal of customization for both training and deployment.

These are some of the main features offered by the Amazon tool:

- Configurable number of machines (jobs) for training and deploying
- Automatic scaling of SageMaker models
- Creation and configuration of scaling plans
- Integrated Jupyter notebook for development

Google ML does not offer built-in implementations of classical ML algorithms.

Instead, it focuses on integrating its services with pre-existing ML libraries.

Put differently, it tries to ease the process of training and deploying ML models using your library of choice.

These libraries include scikit-learn, Keras, XGBoost, and of course, TensorFlow.

To be fair, SageMaker also has this capability.

Nevertheless, to load and deploy pre-trained models in the Amazon tool, one needs to manage containers.

This is not necessary on Google ML.

Google ML offers some advantages over its main competitor.

First, it provides a convenient way to manage different model versions.

Second, it supports both online and batch prediction, which can be a major benefit for highly demanding services.
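To make the online/batch distinction concrete, here is a sketch of the two request shapes, again assuming the v1 REST API: online prediction sends instances in the body of a synchronous projects.predict call, while batch prediction is submitted as a job (projects.jobs.create) that reads inputs from, and writes results to, Cloud Storage. The project, model, job ID, feature values, and gs:// paths are hypothetical.

```python
def online_request(instances: list) -> dict:
    """Body of a synchronous projects.predict call: results come back in the response."""
    return {"instances": instances}


def batch_prediction_job(job_id: str, project: str, model: str,
                         input_paths: list, output_path: str, region: str) -> dict:
    """Body of projects.jobs.create for an asynchronous batch prediction job."""
    return {
        "jobId": job_id,
        "predictionInput": {
            "dataFormat": "JSON",       # newline-delimited JSON instances
            "inputPaths": input_paths,  # files read from Cloud Storage
            "outputPath": output_path,  # predictions written back to Cloud Storage
            "region": region,
            "modelName": f"projects/{project}/models/{model}",  # default version is used
        },
    }


# Hypothetical values for illustration.
online = online_request([[0.1, 0.0, 3.0], [0.4, 1.0, 0.0]])
batch = batch_prediction_job("intrusion_batch_001", "my-project", "intrusion_model",
                             ["gs://my-bucket/batch/input-*"],
                             "gs://my-bucket/batch/output/", "us-central1")
```

The design trade-off is latency versus throughput: online requests return immediately but count against request quotas, while a batch job can process large files without a client holding connections open.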

It is important to note that both services are paid.

SageMaker, for instance, charges an extra fee (for using its libraries) in addition to the regular costs of other services.

Google ML, on the other hand, does not charge additional fees.

You only pay for the machines you request for training and deploying.

Plus, it provides a generous $300 credit for new users.

For more information on prices, take a look at the Google ML pricing page.

Google also offers a calculator for price estimations.

In summary, both platforms support training and deploying using their respective infrastructures.

Still, when it comes to serving, the differences between the two grow, as we will see next.

Training on Google ML

Here, we used the same version of our intrusion ML model from the SageMaker article.

For training, Google ML and SageMaker are very similar.

Both platforms offer a reasonable set of machines for the job.

To run a training job on Cloud ML, one needs to specify the number and types of machines.

To ease the process, Google provides a set of pre-defined cluster specifications called scale tiers.

In our case, we chose a single worker instance (the basic scale tier).

This tier uses a single Compute Engine machine of type n1-standard-4.

According to Google, this single worker is suitable for training simple models with small to moderate datasets.

This machine has 4 virtual CPUs and 15GB of memory.
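As a sketch, assuming the v1 REST API, a training job on the BASIC tier could be submitted with a body like the following. The trainer package URI, module name, job ID, region, and job directory are hypothetical placeholders.

```python
def basic_training_job(job_id: str, package_uri: str, python_module: str,
                       region: str, job_dir: str) -> dict:
    """Body of projects.jobs.create for a training job on the BASIC scale tier."""
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",           # single worker (n1-standard-4)
            "packageUris": [package_uri],   # trainer package staged in Cloud Storage
            "pythonModule": python_module,  # entry point inside the package
            "region": region,
            "jobDir": job_dir,              # output location available to the trainer
        },
    }


# Hypothetical values for illustration.
job = basic_training_job("intrusion_train_001",
                         "gs://my-bucket/packages/trainer-0.1.tar.gz",
                         "trainer.task", "us-central1",
                         "gs://my-bucket/jobs/intrusion_train_001")
```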

However, the range of possibilities goes far beyond this simple option.

To give an idea, these are some of the available compute scale tiers:

- STANDARD_1: One master instance, plus four workers and three parameter servers.
- BASIC_GPU: A single worker instance with a single NVIDIA Tesla K80 GPU.
- BASIC_TPU (beta): A master VM and a Cloud TPU.

And these are some of the machines available in the tiers:

- n1-highmem-8: A machine with a lot of memory (52GB of RAM) and 8 virtual CPUs. Especially suited for large models.
- n1-standard-8: 8 virtual CPUs, 30GB of RAM and an NVIDIA Tesla K80 GPU.
- n1-standard-4: 4 virtual CPUs with 15GB of memory and Cloud TPU support (still in beta as of this writing).

For the complete list, head over to Google Cloud Machine Types.
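The tiers above are predefined. Our assumption is that picking individual machine types, such as those listed, goes through the CUSTOM scale tier, where masterType, workerType, and workerCount are set explicitly. The helper below is hypothetical and only assembles that trainingInput fragment.

```python
def custom_training_input(master_type: str, worker_type: str = "",
                          worker_count: int = 0) -> dict:
    """trainingInput fragment for the CUSTOM tier, with explicit machine types."""
    spec = {"scaleTier": "CUSTOM", "masterType": master_type}
    if worker_count > 0:
        if not worker_type:
            raise ValueError("workerType is required when workerCount > 0")
        spec["workerType"] = worker_type
        # int64 fields are conventionally sent as strings in Google's JSON APIs.
        spec["workerCount"] = str(worker_count)
    return spec


# A single memory-heavy master, for example for a large model.
print(custom_training_input("n1-highmem-8"))
```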

Another important point when choosing one of these instances is the price.

Below, we’ve listed the prices, along with each machine’s computing power.

Note that prices vary by location.

The full price list is available on Google’s pricing page.

Deployment

When it comes to deploying and scaling ML models, SageMaker and Google ML take distinct approaches.

Just like for training, the Amazon service provides a richer set of machines for deployment.

Google ML, on the other hand, provides a very limited number of machines for deployment.

At the time of our experiments, only two options were available by default: a single-core or a quad-core machine.

According to Google’s documentation, the single-core machine has 2GB of RAM.
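Choosing between the two default deployment options comes down to one field on the model version. This sketch assumes the machineType names mls1-c1-m2 (single core) and mls1-c4-m2 (quad core), which, to our understanding, the v1 API used for online prediction at the time; the helper and the version dict are hypothetical.

```python
# Assumed v1 machine type names for online prediction at the time of writing.
PREDICTION_MACHINE_TYPES = {
    "single": "mls1-c1-m2",  # single core, 2GB of RAM (the default)
    "quad": "mls1-c4-m2",    # quad core
}


def with_machine_type(version: dict, option: str) -> dict:
    """Return a copy of a version body with the chosen deployment machine."""
    if option not in PREDICTION_MACHINE_TYPES:
        raise ValueError(f"unknown option {option!r}; "
                         f"expected one of {sorted(PREDICTION_MACHINE_TYPES)}")
    return {**version, "machineType": PREDICTION_MACHINE_TYPES[option]}


version = {"name": "v1", "deploymentUri": "gs://my-bucket/intrusion_model/"}  # hypothetical
print(with_machine_type(version, "quad")["machineType"])  # mls1-c4-m2
```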

Google suggests joining its alpha program for more hardware options.

Another big difference between the two services is the auto-scaling feature.

SageMaker allows many possibilities for customizing Auto Scaling based on one’s needs.

With Google, we were not able to adjust any scaling metric.

Basically, Google ML offers only two possibilities:

- Manual scaling: you set the number of nodes that will always be up.
- Auto scaling: Google does the job for you. It decides when and how to scale, if necessary, and you have no control over it.
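Assuming the v1 REST API, the two scaling modes map to two mutually exclusive fields on a model version: manualScaling.nodes fixes the node count, while autoScaling.minNodes only sets a floor and leaves every other decision to Google, which matches the lack of control described above. The helper is a hypothetical sketch.

```python
def scaling_fields(mode: str, nodes: int = 1) -> dict:
    """Scaling fragment of a model version body (hypothetical helper)."""
    if mode == "manual":
        # Exactly `nodes` nodes stay up at all times.
        return {"manualScaling": {"nodes": nodes}}
    if mode == "auto":
        # Only a floor can be set; Google decides when to add or remove nodes.
        return {"autoScaling": {"minNodes": nodes}}
    raise ValueError(f"unknown scaling mode {mode!r}")


version = {"name": "v1", "deploymentUri": "gs://my-bucket/intrusion_model/"}  # hypothetical
version.update(scaling_fields("manual", nodes=2))
print(version)
```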

Results and Discussion

We used Taurus and JMeter to write and run load tests against our ML model.

We considered 2 test scenarios.

Test Scenario #1 (baseline)

To begin, our basic load test aims to validate the model’s scalability on Google’s services.

It includes the following configuration.

The test lasts a total of 5 minutes.

In the first 3 minutes, we ramp up to 9 users using a step size of 3.

Then, we hold the current 9 users for 1 minute and jump to 18 users for the last minute.
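To make the load profile explicit, here is a small sketch of the number of concurrent users at each second of the test. It encodes our reading of "step size" as users added per step, with one step per minute during the 3-minute ramp; the function name and that interpretation are assumptions, not part of the original Taurus configuration. With a step of 30, the same function describes the heavier scenario that follows.

```python
def users_at(t: float, step: int) -> int:
    """Concurrent users at second t of a 5-minute test (assumed profile).

    Minutes 0 to 3: add `step` users each minute (3 steps).
    Minute 3 to 4: hold the ramp target. Minute 4 to 5: double it.
    """
    if not 0 <= t < 300:
        raise ValueError("t must fall within the 5-minute test")
    target = 3 * step  # 9 users for step 3, 90 for step 30
    if t < 180:
        return step * (int(t // 60) + 1)
    if t < 240:
        return target
    return 2 * target  # 18 or 180 users in the last minute


# Users at the start of each minute for both scenarios.
print([users_at(m * 60, 3) for m in range(5)])   # [3, 6, 9, 9, 18]
print([users_at(m * 60, 30) for m in range(5)])  # [30, 60, 90, 90, 180]
```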

Test Scenario #2

Here, we explore a heavier load testing configuration.

We wanted to see how Google’s infrastructure scales under many requests and whether any machine would throttle.

In this scenario, which also lasts 5 minutes, we ramp up to 90 users (instead of 9) using a step size of 30.

We hold the 90 users for 1 minute (like in the baseline) and jump to 180 parallel users for the last minute.

Test Results

How to read the charts:

- The blue line represents the number of users over time.
- The yellow line shows the average response time.
- The pink line shows the max response time.
- The red line shows the number of errors (if any).
- The green line shows the number of requests.

The first set of images shows the performance test results for the first test scenario.

The two charts correspond to the single-core and quad-core machines, respectively.

As expected, the number of hits increases as the number of parallel users grows.

The average number of hits per second throughout the test was 12.8, with a mean response time of 696 ms.
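As a sanity check, Little's law for a closed system (throughput X = N / (R + Z), with N concurrent users, response time R, and think time Z) relates these numbers. Assuming zero think time and the per-minute user counts of the baseline profile (3, 6, 9, 9, then 18), the estimate lands close to the measured 12.8 hits per second. This is a back-of-the-envelope sketch, not part of our test tooling.

```python
def closed_loop_throughput(users: float, response_time_s: float,
                           think_time_s: float = 0.0) -> float:
    """Little's law for a closed system: X = N / (R + Z)."""
    return users / (response_time_s + think_time_s)


# Assumed per-minute user counts for the baseline scenario.
minute_users = [3, 6, 9, 9, 18]
avg_users = sum(minute_users) / len(minute_users)  # 9.0

estimate = closed_loop_throughput(avg_users, 0.696)
print(f"estimated {estimate:.1f} req/s vs. 12.8 measured")  # estimated 12.9 req/s
```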

Curiously enough, the tests using the quad-core machine did not show any improvement.

For the second test scenario, however, we ran into some limitations of the current Google platform on both machines.

What caught our attention most was the high number of request errors.

It turns out that all of these requests returned HTTP status code 429 (Too Many Requests).
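A common client-side mitigation for 429 responses, which we did not use in these tests, is to retry with exponential backoff and jitter. Here is a minimal sketch; send is a hypothetical callable that performs one prediction request and returns its HTTP status code, and the delay parameters are arbitrary. Backoff only spreads retries out over time; it does not raise the quota.

```python
import random
import time
from typing import Callable


def call_with_backoff(send: Callable[[], int], max_retries: int = 5,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry `send` while it returns HTTP 429, using exponential backoff with full jitter."""
    attempt = 0
    while True:
        status = send()
        if status != 429 or attempt >= max_retries:
            return status
        # Cap grows 0.5s, 1s, 2s, ... up to max_delay; a random wait below it avoids bursts.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0.0, delay))
        attempt += 1
```

Injecting sleep keeps the function easy to test without real waiting.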

Digging a little deeper, we found that Google (by default) does not allow more than 100 requests per second on its APIs.
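A rough estimate shows why the second scenario hits this limit. Assuming requests are sent back to back (zero think time) and latency stays at the 696 ms mean observed in the baseline, the required request rate is simply users divided by response time. Latency would likely change under heavier load, so treat this as an order-of-magnitude check.

```python
QUOTA_RPS = 100         # default per-second limit reported above
MEAN_LATENCY_S = 0.696  # assumption: baseline mean response time also holds under load


def required_rps(users: int, latency_s: float = MEAN_LATENCY_S) -> float:
    """Requests per second generated by `users` clients sending back to back."""
    return users / latency_s


for users in (90, 180):  # hold level and final jump of scenario #2
    rps = required_rps(users)
    print(f"{users} users -> ~{rps:.0f} req/s, over quota: {rps > QUOTA_RPS}")

# Largest user count that would stay under the quota at this latency.
max_users = int(QUOTA_RPS * MEAN_LATENCY_S)
print(max_users)  # 69
```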

And again, to increase this limit, you need to contact Google by filling out and submitting a form.

This contrasts with our SageMaker results, in which we achieved ~1k requests per second, almost 10 times Google’s default maximum.

Of course, we expect Google to allow for more scalability, but filling out a form to request more capacity hardly fits the idea of cloud elasticity.

In short, the tests using the Google platform fell short of Amazon SageMaker’s results.

Although it is not a fair comparison (mainly because of the differences in the machines), the results highlight the flexibility of the Amazon service in a self-service model.

The ability to build larger clusters of machines without contacting Sales is an important feature that Google lacks at the moment.

Yet, Google makes a good case if you want to deploy a pre-existing model on its platform.

Also, the self-managed Auto Scale feature may be very welcome to inexperienced developers who only want to deploy their system.

