In today's data-driven environment, the need for streamlined ML engineering solutions is paramount. By integrating Qwak, a platform designed specifically to reduce the complexity of ML model lifecycle management, with the Snowflake Data Cloud, a global network where organizations mobilize their data and apps, put AI to work, and collaborate across teams, businesses can effectively leverage their data assets for machine learning.
This blog dives deep into the nuances of building an ML model managed by Qwak, utilizing data housed in a customer's Snowflake environment. Readers will gain insights into best practices, seamless integration techniques, and the benefits of combining these two powerhouse platforms.
In the first example, we will detail a full-fledged batch execution pipeline which utilizes a few powerful features Qwak offers, such as model versioning and tagging, conditional execution, and more.
Qwak is an ML engineering platform that simplifies the process of building, deploying, and monitoring machine learning models, bridging the gap between data scientists and engineers.
For the sake of this example, we will use the churn model detailed in this Qwak example. If you have a Qwak account (you can access a free-trial version here), run the following command in the CLI to create a trained model instance (a.k.a. a Qwak build) of the churn model:
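(The exact command depends on how you set up the churn example locally; the model ID and directory below are illustrative placeholders, so adjust them to your own project.)

```bash
# Illustrative sketch -- model ID and path are placeholders based on the churn example layout
qwak models build --model-id churn_model ./churn_model
```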
The example script consists of three steps:
First, let's look at the full example of a working pipeline, and then we'll break it down step by step with a detailed explanation of each section:
The first phase is not Qwak-specific at all. Here we connect to Snowflake and query it for the inference data relevant to the current batch execution. Note that in most cases at least the WHERE clause will be parameterized, especially if the script is scheduled and managed by an orchestration tool.
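As a minimal sketch of this step, using the snowflake-connector-python package (the connection details, table name, and `run_date` parameter are placeholders):

```python
# Minimal sketch: fetch the inference data for this batch run from Snowflake.
# Connection details, the table name, and the run_date value are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="<SNOWFLAKE_USER>",
    password="<SNOWFLAKE_PASSWORD>",
    account="<SNOWFLAKE_ACCOUNT>",
    warehouse="<WAREHOUSE>",
    database="<DATABASE>",
    schema="<SCHEMA>",
)

query = """
    SELECT *
    FROM churn_features                 -- placeholder table name
    WHERE snapshot_date = %(run_date)s  -- typically injected by the orchestrator
"""

with conn.cursor() as cur:
    cur.execute(query, {"run_date": "2024-01-01"})
    inference_df = cur.fetch_pandas_all()  # returns the result set as a pandas DataFrame

conn.close()
```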
Every Qwak build can log its own list of parameters and metrics. The interface is as simple as calling `qwak.log_metric("f1", 0.9)`.
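As a hedged sketch, the logging calls typically sit inside the model's `build()` method. The stripped-down class below stands in for the churn model; treat the `QwakModel` import path and the `log_param` call as assumptions to verify against the Qwak SDK docs:

```python
# A stripped-down stand-in for the churn model, showing where logging calls live.
# The QwakModel import path and the log_param call are assumptions -- verify them
# against the Qwak SDK docs; the metric value is a placeholder.
import qwak
from qwak.model.base import QwakModel


class ChurnModel(QwakModel):
    def build(self):
        # ...train the model and evaluate it on a validation set...
        f1 = 0.9  # placeholder score from evaluation

        qwak.log_param("model_type", "catboost")  # parameters describing this build
        qwak.log_metric("f1", f1)                 # metrics attached to this build

    def predict(self, df):
        # ...inference logic...
        return df
```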
The logged values can then be viewed side by side for comparison across builds, or inspected for each build individually:
In our case, we programmatically fetch the build ID we wish to run a batch execution against, using the QwakClient utility.
This approach enables many useful patterns. For example, some of our customers use this mechanism for a “Model Per Dimension” type of execution: they train a model per customer (same model code, different datasets), tag each build with the customer name, and then, during batch inference, programmatically fetch the build relevant to each customer and run inference against it:
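(The sketch below is illustrative only: the customer tags are hypothetical, and the client method for looking up builds by tag is an assumed name, so check the QwakClient reference for the exact API.)

```python
# Illustrative sketch of the "model per dimension" pattern -- one tagged build per customer.
# NOTE: get_builds_by_tags is an assumed method name; consult the Qwak SDK reference
# for the exact QwakClient API.
from qwak import QwakClient

client = QwakClient()

for customer in ["acme", "globex", "initech"]:  # hypothetical customer tags
    # Look up the build that was trained and tagged for this specific customer
    builds = client.get_builds_by_tags(model_id="churn_model", tags=[customer])
    build_id = builds[0].build_id
    # ...trigger a batch execution against build_id with this customer's data,
    #    as shown in the next section...
```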
The parallelism, that is, how many tasks run at the same time, is controlled by the `executors` parameter. For example, if an execution is split into 10 tasks and `executors` is set to 5, then at most 5 of those tasks run in parallel at any given time.
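To make that concrete, here is a hedged sketch of triggering a batch execution from Python with the qwak-inference package; `inference_df` and `build_id` come from the earlier snippets, and the exact parameter names (`batch_size`, `executors`, `build_id`) should be treated as assumptions to verify against the batch inference docs:

```python
# Sketch of triggering a batch execution with bounded parallelism.
# BatchInferenceClient comes from the qwak-inference package; treat the exact
# signature (batch_size / executors / build_id) as an assumption to verify
# against the Qwak batch inference documentation.
from qwak_inference import BatchInferenceClient

batch_client = BatchInferenceClient(model_id="churn_model")

predictions_df = batch_client.run(
    inference_df,       # the DataFrame fetched from Snowflake earlier
    batch_size=1000,    # rows per task: a 10,000-row DataFrame splits into 10 tasks
    executors=5,        # at most 5 of those tasks run in parallel
    build_id=build_id,  # the specific build selected with the QwakClient
)
```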
The batch mechanism is a powerful one, with many options and toggles for running clean and efficient inference pipelines. You can learn more about it here.
In this post, we showed how to create an ML application that runs a batch execution on top of your Snowflake data. Note that the example above can be orchestrated by any orchestration tool, such as Airflow, Prefect, and many others. Check out the second blog post in this series to see a more advanced pattern that uses Qwak’s feature store (connected to the same Snowflake tables) to fetch data for batch executions.