Machine Learning Model Deployment Strategies

Machine learning models need to be redeployed frequently to keep up with changes such as data drift, data decay and new target audiences. A deployment strategy is a way to update an ML model that is already deployed in production while supporting the continuous testing and management of multiple models simultaneously.

Pavel Klushin

Head of Solution Architecture at Qwak

September 11, 2022

Qwak’s Support

Qwak now supports a variety of deployment strategies to enhance model performance, accuracy and flexibility.

Canary deployment - rolls out a new version of the model incrementally, running it alongside the old version; this allows time to test the new version before scaling it out completely.
Shadow deployment - keeps the current model version live to serve requests, while a second deployment runs the new version as a test model so that its performance and functionality can be verified.
A/B testing deployment - deploys different versions of the model to production, and Qwak lets you assign a specific model to segments such as users, verticals or others. You can then compare the performance results and select the best-performing model.
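
To make the difference between the first two strategies concrete, here is a minimal, framework-independent sketch. It is not Qwak's implementation: the function names (canary_route, shadow_predict), the "current"/"new" labels and the in-memory log are all hypothetical and chosen only for illustration.

```python
import random


def canary_route(canary_fraction, rng=random.random):
    """Pick the version that serves one request during a canary rollout.

    canary_fraction is the share of traffic (0.0 to 1.0) sent to the new version.
    """
    return "new" if rng() < canary_fraction else "current"


def shadow_predict(features, live_model, shadow_model, shadow_log):
    """Return the live model's prediction and record the shadow model's prediction.

    The shadow result is only logged for later comparison; callers never see it.
    """
    live_result = live_model(features)
    try:
        shadow_log.append((features, live_result, shadow_model(features)))
    except Exception as exc:
        # A failing shadow model must not affect the live response.
        shadow_log.append((features, live_result, exc))
    return live_result


if __name__ == "__main__":
    # Send roughly 10% of simulated requests to the new version.
    rng = random.Random(0)
    counts = {"current": 0, "new": 0}
    for _ in range(10_000):
        counts[canary_route(0.1, rng.random)] += 1
    print(counts)
```

In a canary rollout the fraction is raised step by step as confidence grows, whereas in a shadow deployment the new version never answers a real request.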

Using ML model deployment strategies with Qwak

Let’s see an example of how to create an A/B deployment in which users from New York and California receive different versions of the model.

Step 1: Configure two separate audiences using a YAML file

In our case, we will start with the “California” audience:

api_version: v1
spec:
  audiences:
    - name: California
      description: users from california
      conditions:
        unary:
          - key: location
            operator: UNARY_OPERATOR_TYPE_EXACT_MATCH
            operand: california
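
As a rough mental model of what this audience definition expresses, the sketch below evaluates the same condition against request metadata in plain Python. It is an illustration only, not Qwak's matching logic: the matches_audience helper is hypothetical, and it assumes that all unary conditions must hold and that the comparison is case-sensitive.

```python
def matches_audience(audience, metadata):
    """Return True if every exact-match condition of the audience holds for the metadata."""
    for condition in audience["conditions"]["unary"]:
        if condition["operator"] != "UNARY_OPERATOR_TYPE_EXACT_MATCH":
            # Only the operator used in the example above is handled in this sketch.
            raise ValueError(f"unsupported operator: {condition['operator']}")
        if metadata.get(condition["key"]) != condition["operand"]:
            return False
    return True


# Dictionary mirroring the California audience from the YAML file.
california = {
    "name": "California",
    "description": "users from california",
    "conditions": {
        "unary": [
            {
                "key": "location",
                "operator": "UNARY_OPERATOR_TYPE_EXACT_MATCH",
                "operand": "california",
            }
        ]
    },
}

if __name__ == "__main__":
    print(matches_audience(california, {"location": "california"}))
    print(matches_audience(california, {"location": "new york"}))
```

Under these assumptions, a request whose metadata lacks the location key does not belong to the audience.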

Step 2: Apply the configuration using the Qwak CLI:

qwak audience create -f "california_audience.yaml"

Step 3: Deploy a build (specific model version) to the California audience

Model Deployment screen: choose the build you want to deploy for the California audience

Step 4: Create a variation

Step 5: Connect the audience to the variation and deploy

Once both models are deployed, you can see them in the overview dashboard.

Prediction example using Python Client

The metadata sent with the inference request determines which audience, and therefore which model version, serves the prediction:

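from qwak.inference.clients import RealTimeClient

feature_vector = [{'User_Id': 166056434}]
client = RealTimeClient(model_id="real_time_churn_model")

# Pass the metadata to predict() (not to print()) so that the request
# is matched against the "california" audience condition.
print(client.predict(feature_vector, metadata={"location": "california"}))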
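
To check both audiences side by side, the same feature vector can be sent once per location. The sketch below assumes only that the client exposes predict(feature_vector, metadata=...). StubClient is a hypothetical offline stand-in, not part of the Qwak SDK, and the variation names, the "new york" value and the default fallback are assumptions made so the example runs without a deployed model.

```python
class StubClient:
    """Hypothetical offline stand-in that mimics predict(feature_vector, metadata=...)."""

    # Assumed mapping from the "location" metadata value to a deployed variation.
    ROUTES = {
        "california": "california-variation",
        "new york": "new-york-variation",
    }

    def predict(self, feature_vector, metadata=None):
        location = (metadata or {}).get("location")
        served_by = self.ROUTES.get(location, "default-variation")
        return [{"served_by": served_by, "rows": len(feature_vector)}]


def predict_per_audience(client, feature_vector, locations):
    """Send the same feature vector once per location and collect the responses."""
    return {
        location: client.predict(feature_vector, metadata={"location": location})
        for location in locations
    }


if __name__ == "__main__":
    results = predict_per_audience(
        StubClient(), [{"User_Id": 166056434}], ["california", "new york"]
    )
    for location, response in results.items():
        print(location, response)
```

With a real deployment, the stub would be replaced by the client created above, and the per-location responses could be logged to compare the two model versions.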

As you can see, we now have two model versions serving predictions for two separate audiences, one in New York and the other in California. By combining these deployment strategies, a user can build a robust, production-ready deployment flow that supports both model testing and production requirements.