Supporting streaming deployments with Qwak
Qwak now supports deploying machine learning (ML) models with an event-driven streaming architecture built on Apache Kafka, enabling high-throughput predictions.
With this capability, data scientists can deploy an ML model in a single click as an endpoint that receives data as a stream and publishes its predictions as a stream.
Real-time ML inference at scale has become an essential part of modern applications. Although our deployment service started out supporting real-time predictions served from a web server, we see high demand among our customers for streaming-based ML predictions.
Streaming inference is useful in the following cases:
- When inference requests should be triggered by an existing stream of messages
- When you want to decouple the caller from the model
- When you need to handle prediction service failures caused by high prediction traffic
How it works
Once a model is deployed to Qwak with the Streaming option, it is triggered whenever the producer topic receives a message containing features (an inference request). The model then publishes its prediction to the consumer topic.
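The flow above is the classic consume, predict, publish loop. The sketch below imitates it in plain Python, with `queue.Queue` objects standing in for the two Kafka topics (named as in this article: the producer topic carries requests, the consumer topic carries predictions) and a hypothetical `predict` function standing in for the model. It illustrates the pattern only and is not how Qwak implements the service.

```python
import json
import queue
from typing import Callable


def predict(features: dict) -> float:
    # Hypothetical stand-in for a trained model: a fixed linear score.
    return 0.5 * features["x"] + 2.0 * features["y"]


def serve_stream(
    requests: "queue.Queue[str]",
    predictions: "queue.Queue[str]",
    model: Callable[[dict], float],
) -> int:
    """Drain the request topic, publish one prediction per message, return the count."""
    handled = 0
    while True:
        try:
            raw = requests.get_nowait()  # a real consumer would keep polling instead
        except queue.Empty:
            return handled
        request = json.loads(raw)
        result = {"id": request["id"], "prediction": model(request["features"])}
        predictions.put(json.dumps(result))
        handled += 1


# In-memory stand-ins for the producer topic (requests) and consumer topic (predictions).
producer_topic: "queue.Queue[str]" = queue.Queue()
consumer_topic: "queue.Queue[str]" = queue.Queue()

producer_topic.put(json.dumps({"id": 1, "features": {"x": 2.0, "y": 1.0}}))
producer_topic.put(json.dumps({"id": 2, "features": {"x": 4.0, "y": 2.0}}))

count = serve_stream(producer_topic, consumer_topic, predict)
outputs = []
while not consumer_topic.empty():
    outputs.append(json.loads(consumer_topic.get()))
print(outputs)
```

Because the caller only writes to a topic and never waits on the model, requests simply accumulate if the model is slow or briefly unavailable, which is the decoupling and failure-handling benefit listed earlier.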
After deployment, you can track service health metrics such as error percentage, average throughput, consumed messages, consumer lag, processing lag, and errors over time, and be alerted when a metric rises above or falls below a threshold you set.
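To make some of these metrics concrete, the sketch below derives consumer lag, error percentage, and average throughput from a snapshot of counters, then flags any metric that crosses a threshold. Consumer lag follows the usual Kafka definition: for each partition, the latest (log-end) offset minus the consumer group's committed offset, summed over partitions. The `Snapshot` class, the monitoring window, and the threshold values are hypothetical choices for the example, not Qwak's metric definitions or alerting API.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical counters collected over one monitoring window.
    end_offsets: dict[int, int]        # partition -> latest (log-end) offset
    committed_offsets: dict[int, int]  # partition -> consumer group's committed offset
    consumed: int                      # messages consumed in the window
    errors: int                        # messages that failed during inference
    window_seconds: float


def consumer_lag(s: Snapshot) -> int:
    # Messages written to the topic but not yet consumed, summed over partitions.
    return sum(end - s.committed_offsets[p] for p, end in s.end_offsets.items())


def error_percentage(s: Snapshot) -> float:
    return 100.0 * s.errors / s.consumed if s.consumed else 0.0


def throughput(s: Snapshot) -> float:
    # Average messages consumed per second over the window.
    return s.consumed / s.window_seconds


def alerts(s: Snapshot, max_lag: int, max_error_pct: float, min_throughput: float) -> list[str]:
    # Hypothetical threshold rules: one message per metric outside its bound.
    fired = []
    if consumer_lag(s) > max_lag:
        fired.append("consumer lag above threshold")
    if error_percentage(s) > max_error_pct:
        fired.append("error percentage above threshold")
    if throughput(s) < min_throughput:
        fired.append("throughput below threshold")
    return fired


snap = Snapshot(
    end_offsets={0: 1_200, 1: 950},
    committed_offsets={0: 1_000, 1: 940},
    consumed=600,
    errors=18,
    window_seconds=60.0,
)
print(consumer_lag(snap), error_percentage(snap), throughput(snap))
print(alerts(snap, max_lag=100, max_error_pct=5.0, min_throughput=5.0))
```

Here the lag is 210 messages (200 on partition 0, 10 on partition 1), so only the lag rule fires; error percentage (3%) and throughput (10 messages per second) stay within bounds.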
Getting started with Qwak Streaming deployment
Using the Qwak Management Console
Choose the number of pods and the CPU and memory size of each pod, then enter the Kafka bootstrap server address and the consumer and producer topic names.
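The settings the console asks for can be pictured as a small configuration object. In the sketch below, the `StreamingDeploymentConfig` class and its field names are hypothetical and do not mirror Qwak's API. Its checks follow Kafka's own rules: a bootstrap server is a `host:port` pair (several may be comma-separated), and a topic name may contain only ASCII letters, digits, `.`, `_`, and `-`, is at most 249 characters long, and cannot be `.` or `..`.

```python
import re
from dataclasses import dataclass

# Kafka's legal topic-name characters, limited to 249 characters.
_TOPIC_RE = re.compile(r"[a-zA-Z0-9._-]{1,249}")


@dataclass(frozen=True)
class StreamingDeploymentConfig:
    # Hypothetical field names, for illustration only.
    pods: int
    cpu_cores: float
    memory_gib: float
    bootstrap_servers: str  # comma-separated "host:port" pairs
    producer_topic: str     # receives inference requests (this article's naming)
    consumer_topic: str     # receives predictions (this article's naming)

    def __post_init__(self) -> None:
        if self.pods < 1:
            raise ValueError("pods must be at least 1")
        if self.cpu_cores <= 0 or self.memory_gib <= 0:
            raise ValueError("CPU and memory must be positive")
        for server in self.bootstrap_servers.split(","):
            host, sep, port = server.strip().rpartition(":")
            if not host or not sep or not port.isdigit():
                raise ValueError(f"bootstrap server {server!r} is not host:port")
        for topic in (self.producer_topic, self.consumer_topic):
            if topic in (".", "..") or not _TOPIC_RE.fullmatch(topic):
                raise ValueError(f"invalid Kafka topic name: {topic!r}")


config = StreamingDeploymentConfig(
    pods=2,
    cpu_cores=1.0,
    memory_gib=2.0,
    bootstrap_servers="broker-1:9092,broker-2:9092",
    producer_topic="inference-requests",
    consumer_topic="predictions",
)

# A bootstrap address without a port is rejected before anything is deployed.
try:
    StreamingDeploymentConfig(2, 1.0, 2.0, "broker-1", "inference-requests", "predictions")
except ValueError as err:
    message = str(err)
print(message)
```

Validating up front like this surfaces typos in broker addresses or topic names immediately, rather than as a consumer that silently never receives messages.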
Qwak CLI command
Qwak streaming deployment is perfect for event-based predictions that require high throughput, low latency, and fault-tolerant environments.
Get started for free today!