Supporting streaming deployments with Qwak

By Pavel Klushin
November 22, 2021

Qwak now supports deploying machine learning (ML) models with an event-driven streaming architecture built on Apache Kafka, to support high-throughput predictions.


This new capability lets data scientists deploy an ML model with a single click as an endpoint that receives data as a stream and outputs predictions as a stream.


Real-time ML inference at scale has become an essential part of modern applications. Our deployment service initially supported real-time predictions served from a web server, but we see high demand among our customers for streaming-based ML predictions.



Streaming inference is useful in the following cases:


  • When inference requests should be triggered by an existing stream of messages
  • When you would like to decouple the caller from the model
  • When you need to handle prediction service failures caused by high prediction traffic



How it works

Once a model is deployed to Qwak using the Streaming option, it is triggered whenever features or inference requests arrive on the input topic it consumes from (set with --consumer-topic). It then publishes each prediction to an output topic (set with --producer-topic).
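
The flow can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Qwak's implementation: in-memory queues stand in for the Kafka topics, and the predict function, the JSON message layout, and the "id" field are all hypothetical.

```python
import json
import queue

# In-memory queues stand in for the Kafka input and output topics (assumption).
in_topic = queue.Queue()
out_topic = queue.Queue()


def predict(features):
    """Hypothetical model: scores a request by summing its feature values."""
    return {"score": sum(features.values())}


def serve(in_topic, out_topic):
    """Consume every pending request, run the model, and publish the result."""
    processed = 0
    while True:
        try:
            raw = in_topic.get_nowait()
        except queue.Empty:
            return processed
        request = json.loads(raw)
        prediction = predict(request["features"])
        # Echo the request id so the caller can match predictions to requests.
        out_topic.put(json.dumps({"id": request["id"], **prediction}))
        processed += 1


# The caller only writes to the input topic; it never calls the model directly.
in_topic.put(json.dumps({"id": "r1", "features": {"a": 1.5, "b": 2.0}}))
in_topic.put(json.dumps({"id": "r2", "features": {"a": 0.5, "b": 0.25}}))

processed = serve(in_topic, out_topic)
results = [json.loads(out_topic.get_nowait()) for _ in range(processed)]
print(results)
```

Because the caller and the model share only the two topics, either side can be scaled, restarted, or replaced without the other knowing about it.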



Once the model is deployed, you can track service health metrics such as error percentage, average throughput, consumed messages, consumer lag, processing lag, and errors over time, and be alerted when a metric rises above or falls below a threshold you set.
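
To make two of these metrics concrete, here is a minimal Python sketch of threshold-based alerting. It assumes the usual Kafka meaning of consumer lag (newest offset in the topic minus the offset the consumer group has committed) and a simple errors-over-consumed ratio. The StreamStats fields, the function names, and the default thresholds are hypothetical, and Qwak may compute and expose these metrics differently.

```python
from dataclasses import dataclass


@dataclass
class StreamStats:
    latest_offset: int     # newest offset written to the input topic
    committed_offset: int  # last offset committed by the model's consumer group
    consumed: int          # messages consumed in the monitoring window
    errors: int            # messages that failed in the same window


def consumer_lag(stats):
    """Messages written to the topic but not yet processed."""
    return max(stats.latest_offset - stats.committed_offset, 0)


def error_percentage(stats):
    """Share of consumed messages that failed, as a percentage."""
    if stats.consumed == 0:
        return 0.0
    return 100.0 * stats.errors / stats.consumed


def check_alerts(stats, max_lag=1000, max_error_pct=1.0):
    """Return one message per metric that exceeds its (hypothetical) threshold."""
    alerts = []
    lag = consumer_lag(stats)
    if lag > max_lag:
        alerts.append(f"consumer lag {lag} exceeds {max_lag}")
    pct = error_percentage(stats)
    if pct > max_error_pct:
        alerts.append(f"error percentage {pct:.1f}% exceeds {max_error_pct:.1f}%")
    return alerts


healthy = StreamStats(latest_offset=10_500, committed_offset=10_450, consumed=2_000, errors=4)
lagging = StreamStats(latest_offset=10_500, committed_offset=8_000, consumed=2_000, errors=50)

print(check_alerts(healthy))
print(check_alerts(lagging))
```

A growing consumer lag usually means the model cannot keep up with incoming traffic, which is a signal to add pods or workers.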


Getting started with Qwak Streaming deployment:


Using Qwak Management console:


Choose the number of pods and the CPU and memory size of each pod, then add the address of the bootstrap server and the consumer and producer topic names.



Qwak CLI command: 


qwak models deploy stream \
 --model-id "demo_stream" \
 --build-id "my_build_id" \
 --consumer-bootstrap-server "kafka-bootstrap-server.svc.cluster.local" \
 --consumer-topic "in-topic" \
 --consumer-group "consumer-group-example" \
 --consumer-auto-offset-reset latest \
 --consumer-timeout 60000 \
 --producer-bootstrap-server "kafka-bootstrap-server.svc.cluster.local" \
 --producer-topic "out-topic" \
 --producer-compression-type gzip \
 --workers 2
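
If you deploy from a script or a CI job, one option is to assemble the same command from a dictionary of settings. The Python sketch below is just one way to do that: the flags and values are copied from the command above, while the options dictionary and the build_deploy_command helper are hypothetical names introduced for illustration.

```python
import shlex

# Flag values copied from the example command; adjust them for your own cluster.
options = {
    "model-id": "demo_stream",
    "build-id": "my_build_id",
    "consumer-bootstrap-server": "kafka-bootstrap-server.svc.cluster.local",
    "consumer-topic": "in-topic",
    "consumer-group": "consumer-group-example",
    "consumer-auto-offset-reset": "latest",
    "consumer-timeout": 60000,
    "producer-bootstrap-server": "kafka-bootstrap-server.svc.cluster.local",
    "producer-topic": "out-topic",
    "producer-compression-type": "gzip",
    "workers": 2,
}


def build_deploy_command(options):
    """Turn a settings dictionary into an argument list for the qwak CLI."""
    args = ["qwak", "models", "deploy", "stream"]
    for flag, value in options.items():
        args += [f"--{flag}", str(value)]
    return args


command = build_deploy_command(options)

# Print a shell-safe, single-line version of the command for review or logging.
print(shlex.join(command))
```

The sketch only builds and prints the command. To run it, the argument list can be passed as-is to subprocess.run, which avoids shell quoting problems.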


Qwak streaming deployment is well suited to event-based predictions that require high throughput, low latency, and fault tolerance.




