ML Models: Prototype to Production

Shreya Ghelani
May 8

So you have a model, now what?

Through the powers of machine learning and the promise of deep learning, today’s conferences, thought leaders and experts in ML and AI have been painting a vision of businesses powered by data.
However, despite the groundbreaking research and the constant flood of new papers in the fields of ML and deep learning, much of this research remains just that — research (outside of the few tech giants).
Deploying ML models remains a significant challenge.
As a data scientist and an ML practitioner, I have myself experienced that it is often more difficult to make the journey from a reliable and accurate prototype model to a well-performing and scalable production inference service than it is to actually build the model.
Models need to be retrained and deployed when code and/or data are updated.
Therefore, automating the build and deployment of machine learning models is a crucial part of creating production machine learning services.
The deployment and operational aspects of “productionizing” ML models lie at the intersection of several practices and disciplines, such as statistical modeling, data science, DevOps and ML engineering.
For this reason, they do not fall within the expertise of any single discipline.
Moreover, a model serving production traffic raises considerations beyond the choice of a deployment stack: continuous improvement and deployment, security, performance and latency, support for rapid experimentation and A/B testing, and auto-scaling.
In this post, I will describe an approach for automating the build and deployment of ML models using Amazon SageMaker and AWS Step Functions.
Amazon SageMaker is a complete machine learning (ML) workflow service for developing, training, and deploying models.
It comes integrated with Jupyter notebooks for data analysis, exploration and model experimentation.
It offers flexible distributed training options and model hosting services for deploying models in a secure and scalable environment through HTTPS endpoints.
SageMaker comes with many predefined algorithms.
You can also bring your own algorithms by supplying two Docker images: a training image to train your model and an inference image to deploy to a REST endpoint.
What is Docker, anyway?

Docker is an open source project based on Linux containers.
It is a tool designed to make it easier to create, deploy, and run applications using containers.
Containers allow you to package up an application with all of its libraries and other dependencies, and ship it all out as one package.
Docker containers are very lightweight and fast.
Fundamental Docker Concepts

[Figure: Docker Build Process]

A Dockerfile is where you write the instructions to build a Docker image.
These instructions can be installing software packages, setting environment variables, paths, exposing networking ports etc.
Once the Dockerfile is set up, the ‘docker build’ command is used to build an image from it.
Docker Images are read-only templates that you build from a set of instructions from your Dockerfile.
Images define what you want your packaged application and its dependencies to look like and what processes to run when it’s launched.
These read-only templates are the building blocks of a Docker container.
You can use the ‘docker run’ command to run the image and create a container.
A Docker container is a running instance of a Docker image.
Containers are the ready-to-run applications created from Docker images.
Docker images are stored in a Docker registry.
This can be either a user’s local repository or a public repository such as Docker Hub, which allows multiple users to collaborate in building an application.
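As a rough illustration of the build and run steps described above, the sketch below assembles the `docker build` and `docker run` command lines from Python. The image name `my-model:latest` and the helper names are assumptions made for this example, and commands are only printed unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def docker_build_cmd(tag, context_dir=".", dockerfile=None):
    """Return the argv for `docker build`, which turns a Dockerfile into an image."""
    cmd = ["docker", "build", "--tag", tag]
    if dockerfile is not None:
        cmd += ["--file", dockerfile]
    cmd.append(context_dir)
    return cmd


def docker_run_cmd(image, args=(), ports=None, remove=True):
    """Return the argv for `docker run`, which starts a container from an image."""
    cmd = ["docker", "run"]
    if remove:
        cmd.append("--rm")  # delete the container once it exits
    for host_port, container_port in (ports or {}).items():
        cmd += ["--publish", f"{host_port}:{container_port}"]
    cmd.append(image)
    cmd += list(args)
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image name; requires a Dockerfile in the current directory
    # and a local Docker installation if dry_run is switched off.
    run(docker_build_cmd("my-model:latest"))
    run(docker_run_cmd("my-model:latest", ports={8080: 8080}))
```

Building the argument list separately from running it keeps the sketch easy to inspect without Docker installed.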
SageMaker Model Deployment Architecture Overview

The following steps are involved in deploying a model using SageMaker:

1. Build the Docker image for training and upload it to ECR (Elastic Container Registry). This image holds your training code and dependencies.
2. Create and start a SageMaker training job (SageMaker CreateTrainingJob API).
3. Build the Docker image for serving (inferencing) and upload it to ECR (Elastic Container Registry). This image holds your inference code and dependencies.
4. Create a SageMaker model (SageMaker CreateModel API).
5. Create or update the SageMaker endpoint that hosts the model (SageMaker CreateEndpoint API).

[Figure: SageMaker Architecture]

BYOM (Bring Your Own Model) Approach

SageMaker algorithms are packaged as Docker images.
This gives you the flexibility to use almost any algorithm code with SageMaker, regardless of implementation language, dependent libraries, frameworks, and so on.
You can use your own custom training algorithm and your own inference code.
You package the algorithm and inference code in Docker images, and use the images to train a model and deploy it with Amazon SageMaker.
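To make the inference half of such an image more concrete, here is a minimal sketch of a serving process, assuming the SageMaker hosting convention that the container answers `GET /ping` (health check) and `POST /invocations` (prediction) on port 8080. The `predict` function is a hypothetical stand-in for real model code, and the JSON request shape is an assumption for this example only.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    """Hypothetical stand-in for real inference: sums the input features."""
    return {"prediction": sum(payload["features"])}


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Health check: the hosting service polls this path.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        # Prediction requests arrive on /invocations.
        if self.path != "/invocations":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            self._reply(200, predict(payload))
        except (ValueError, KeyError, TypeError) as exc:
            self._reply(400, {"error": str(exc)})

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host="0.0.0.0", port=8080):
    """Build the HTTP server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), InferenceHandler)
```

The image's entry point would call `make_server().serve_forever()`. A production container would more likely sit behind a proper WSGI server, but the two routes stay the same.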
Training: When SageMaker creates the training job, it launches the ML compute instances, runs the training Docker image to create a container on them, and injects the training data from an S3 location into the container. The training code then uses this dataset to train the model.
It saves the resulting model artifacts and other output in the S3 bucket you specified for that purpose.
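The training flow above can be sketched as a small entry-point script. It assumes the SageMaker container layout in which input data appears under `/opt/ml/input/data/<channel>`, anything written to `/opt/ml/model` is uploaded to S3 as the model artifact, and a failure description goes to `/opt/ml/output/failure`. The channel name `training`, the file `model.json` and the "mean of the inputs" model are placeholders invented for this example.

```python
import json
import os
import traceback


def train(prefix="/opt/ml", channel="training"):
    """Read one number per line from the input channel and save a toy model."""
    input_dir = os.path.join(prefix, "input", "data", channel)
    model_dir = os.path.join(prefix, "model")
    output_dir = os.path.join(prefix, "output")
    try:
        values = []
        for name in sorted(os.listdir(input_dir)):
            with open(os.path.join(input_dir, name)) as f:
                values += [float(line) for line in f if line.strip()]
        if not values:
            raise ValueError(f"no training data found in {input_dir}")

        # Hypothetical "model": just the mean of the training values.
        model = {"mean": sum(values) / len(values), "n": len(values)}

        # Files written here become the model artifact saved to S3.
        os.makedirs(model_dir, exist_ok=True)
        with open(os.path.join(model_dir, "model.json"), "w") as f:
            json.dump(model, f)
        return model
    except Exception:
        # Record why training failed, then re-raise so the job is marked failed.
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "failure"), "w") as f:
            f.write(traceback.format_exc())
        raise
```

Taking `prefix` as a parameter lets the same function run against a temporary directory on a laptop before the image is ever pushed to ECR.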
Deployment: For model deployment, SageMaker first creates the model resource using the S3 path where the model artifacts are stored and the Docker registry path for the image that contains the inference code.
It then creates an HTTPS endpoint using the endpoint configuration, which specifies the production model variant and the ML compute instances to deploy to.
The client application sends requests to the SageMaker HTTPS endpoint to obtain inferences from a deployed model.
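The deployment sequence just described might look like the following sketch against the boto3 SageMaker clients. The request fields follow the CreateModel, CreateEndpointConfig, CreateEndpoint and InvokeEndpoint APIs. The resource names, the `ml.m5.large` instance type, the variant name `AllTraffic` and the JSON payload format are assumptions made for illustration, and the clients are passed in so that nothing here touches AWS on import.

```python
import json


def model_request(name, image_uri, model_data_url, role_arn):
    """CreateModel: ties the inference image to the model artifacts in S3."""
    return {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }


def endpoint_config_request(config_name, model_name,
                            instance_type="ml.m5.large", instance_count=1):
    """CreateEndpointConfig: names the production variant and its instances."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }


def deploy(sm, name, image_uri, model_data_url, role_arn):
    """Create model, endpoint config and endpoint; `sm` is a SageMaker client."""
    config_name = name + "-config"
    sm.create_model(**model_request(name, image_uri, model_data_url, role_arn))
    sm.create_endpoint_config(**endpoint_config_request(config_name, name))
    sm.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    return name


def invoke(runtime, endpoint_name, payload):
    """Send one JSON request; `runtime` is a sagemaker-runtime client."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

With real credentials, `sm` would be `boto3.client("sagemaker")` and `runtime` would be `boto3.client("sagemaker-runtime")`. For an endpoint that already exists, `update_endpoint` takes the same endpoint and config names in place of `create_endpoint`.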
A Complete Model Deployment Framework

Let’s look at how we can design an end-to-end model deployment pipeline using SageMaker and several other AWS services, with AWS Step Functions as the main coordinator and workflow orchestrator.
AWS Step Functions are used for workflow orchestration.
Using Step Functions, you can design and run workflows that stitch together multiple AWS services such as AWS Lambda, Amazon ECS etc.
Step Functions also translate your workflow into a state machine diagram for an easy visual representation and monitoring.
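As an illustration of what such a workflow definition can look like, the sketch below builds a small state machine in the Amazon States Language as a Python dictionary: start training, wait, poll the status, then branch to deployment or failure. The state names, Lambda function names, region and account ID are all placeholders, and the polling loop is one plausible design rather than the exact workflow used in this post.

```python
import json


def lambda_arn(name, region="us-east-1", account="123456789012"):
    """Build a Lambda ARN; region and account ID here are placeholders."""
    return f"arn:aws:lambda:{region}:{account}:function:{name}"


def build_definition(poll_seconds=60):
    """Return a train -> wait -> check -> deploy workflow definition."""
    return {
        "Comment": "Sketch: train a model, poll until done, then deploy",
        "StartAt": "StartTraining",
        "States": {
            "StartTraining": {
                "Type": "Task",
                "Resource": lambda_arn("start-training"),
                "Next": "WaitForTraining",
            },
            "WaitForTraining": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "CheckTraining",
            },
            "CheckTraining": {
                "Type": "Task",
                "Resource": lambda_arn("check-training"),
                "Next": "IsTrainingDone",
            },
            "IsTrainingDone": {
                "Type": "Choice",
                # Branch on the "status" field returned by the check Lambda.
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "Completed",
                     "Next": "DeployModel"},
                    {"Variable": "$.status", "StringEquals": "Failed",
                     "Next": "TrainingFailed"},
                ],
                "Default": "WaitForTraining",  # still running: loop back
            },
            "DeployModel": {
                "Type": "Task",
                "Resource": lambda_arn("deploy-model"),
                "End": True,
            },
            "TrainingFailed": {
                "Type": "Fail",
                "Error": "TrainingFailed",
                "Cause": "The SageMaker training job did not complete",
            },
        },
    }


def transition_targets(definition):
    """Collect every state name that some transition points at."""
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        targets.update(t for t in (state.get("Next"), state.get("Default")) if t)
        targets.update(c["Next"] for c in state.get("Choices", []))
    return targets


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The printed JSON is the form a state machine definition takes when it is created in Step Functions. `transition_targets` is a small sanity check that no transition points at a state that does not exist.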
Below is a high-level overview of how the services play together:

- Step Functions acts as a state machine, beginning with an initial state and transforming it using AWS Lambda functions, changing, branching or looping through states as needed.
- AWS Lambda functions are used for starting model training, image building, checking on train and build status and so on.
- AWS CodeBuild is used to build Docker images and push them to an Elastic Container Registry (ECR) repository.
- AWS Systems Manager Parameter Store provides a shared, centralized parameter store for our training and deployment jobs. AWS Lambda functions query the parameters from this store.
- AWS Simple Notification Service (SNS) is used for starting builds and for notifications.
- AWS Simple Storage Service (S3) buckets are used to hold model training data and trained model artifacts.
- A GitHub/CodeCommit repository publishes to an SNS topic when a code change is made. Notifications are also published to an SNS topic when a build starts, finishes, or fails.
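To show how two of these pieces could fit together, here is a hedged sketch of Lambda-side helpers that read settings from Parameter Store, start a SageMaker training job, and report its status back to the workflow. The parameter names under `/ml/`, the event fields and the instance settings are invented for this example. The boto3 calls used are `get_parameter`, `create_training_job` and `describe_training_job`.

```python
def get_param(ssm, name):
    """Read one value from Systems Manager Parameter Store."""
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def start_training(event, sm, ssm):
    """Start a training job using settings stored in Parameter Store."""
    job_name = event["job_name"]
    sm.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            "TrainingImage": get_param(ssm, "/ml/train_image"),  # hypothetical name
            "TrainingInputMode": "File",
        },
        RoleArn=get_param(ssm, "/ml/role_arn"),  # hypothetical name
        InputDataConfig=[{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": get_param(ssm, "/ml/train_data_s3"),  # hypothetical name
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": get_param(ssm, "/ml/output_s3")},
        ResourceConfig={
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )
    return {"job_name": job_name, "status": "InProgress"}


def check_training(event, sm):
    """Report the job status so the workflow can wait, deploy, or fail."""
    job_name = event["job_name"]
    desc = sm.describe_training_job(TrainingJobName=job_name)
    result = {"job_name": job_name, "status": desc["TrainingJobStatus"]}
    if result["status"] == "Completed":
        result["model_data_url"] = desc["ModelArtifacts"]["S3ModelArtifacts"]
    return result


def lambda_handler(event, context):
    """Lambda entry point for the status check (assumed wiring)."""
    import boto3  # imported lazily so the helpers above can be tested without AWS
    return check_training(event, boto3.client("sagemaker"))
```

Passing the clients in as arguments keeps the helpers testable with simple stand-in objects. Each function returns a plain dictionary because a Lambda's return value becomes the input of the next state in the workflow.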
The following diagram shows how the services work together.
[Figure: Automatic Build and Deployment with AWS SageMaker and Step Functions Architecture]

In conclusion, SageMaker and Step Functions, in conjunction with other AWS services, can provide a robust, feature-rich, end-to-end deployment framework that continuously trains, builds, and deploys ML models.