Qwak now supports deploying machine learning (ML) models in an event-driven streaming architecture built on Apache Kafka, enabling high-throughput predictions.
This new capability lets data scientists deploy an ML model as a streaming endpoint with a single click: the model receives data as a stream and outputs predictions as a stream.
Real-time ML inference at scale has become an essential part of modern applications. Although our deployment service launched with support for real-time predictions served from a web server, we see strong demand among our customers for streaming-based ML predictions.
Streaming inference is useful when predictions must keep up with a high-throughput event stream, when producers don't need a synchronous response, and when the pipeline must tolerate failures without dropping requests.
Once a model is deployed to Qwak using the streaming option, it is triggered whenever features (inference requests) arrive on its consumer topic, and it pushes the resulting predictions to its producer topic.
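To make the flow concrete, here is a minimal sketch of a client interacting with such a deployment, using the open-source kafka-python library (our choice here; any Kafka client works). It publishes a JSON feature payload to the model's consumer topic and reads the prediction back from its producer topic, reusing the topic names and bootstrap address from the CLI example below; the payload schema and the "prediction-readers" group name are purely illustrative.

import json
from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = "kafka-bootstrap-server.svc.cluster.local"

# Publish a feature payload (illustrative schema) to the model's input topic.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("in-topic", {"user_id": 42, "features": [0.1, 3.7, 1.2]})
producer.flush()

# Read the prediction from the model's output topic.
consumer = KafkaConsumer(
    "out-topic",
    bootstrap_servers=BOOTSTRAP,
    group_id="prediction-readers",  # hypothetical downstream consumer group
    auto_offset_reset="latest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print("prediction:", message.value)
    break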
Once the model is deployed, you can track service health metrics and receive alerts when they cross configured thresholds: error percentage, average throughput, consumed messages, consumer lag, processing lag, and errors over time.
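To make one of these metrics concrete: consumer lag is the gap between the newest offset on the input topic and the offset the model's consumer group has committed. As a rough sketch (again assuming kafka-python, with the topic and group names from the CLI example below), you could compute it yourself like this:

from kafka import KafkaConsumer, TopicPartition

BOOTSTRAP = "kafka-bootstrap-server.svc.cluster.local"

# Attach with the deployment's group id; committed() queries the broker
# directly, so no subscribe/poll is needed.
consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP, group_id="consumer-group-example")

partitions = [
    TopicPartition("in-topic", p)
    for p in consumer.partitions_for_topic("in-topic")
]
end_offsets = consumer.end_offsets(partitions)  # newest offset per partition

total_lag = 0
for tp in partitions:
    committed = consumer.committed(tp) or 0  # last offset committed by the group
    total_lag += end_offsets[tp] - committed

print("total consumer lag:", total_lag)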
Choose the number of pods and the CPU/memory allocation per pod, then set the Kafka bootstrap server address and the consumer/producer topic names.
qwak models deploy stream \
--model-id "demo_stream" \
--build-id "my_build_id" \
--consumer-bootstrap-server "kafka-bootstrap-server.svc.cluster.local" \
--consumer-topic "in-topic" \
--consumer-group "consumer-group-example" \
--consumer-auto-offset-reset latest \
--consumer-timeout 60000 \
--producer-bootstrap-server "kafka-bootstrap-server.svc.cluster.local" \
--producer-topic "out-topic" \
--producer-compression-type gzip \
--workers 2
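A quick note on the flags: the --consumer-* options govern how the model reads inference requests (--consumer-auto-offset-reset latest makes a fresh consumer group start from the newest messages rather than replaying the topic, and --consumer-timeout is given in milliseconds), while the --producer-* options govern how predictions are written back (--producer-compression-type gzip compresses outgoing batches, trading a little CPU for network bandwidth). These follow standard Kafka client semantics.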
Qwak streaming deployments are a perfect fit for event-based predictions that require high throughput, low latency, and fault tolerance.
Get started for free today!