This article serves as a comprehensive guide for integrating Qwak's Feature Store with the Snowflake Data Cloud. The guide covers essential prerequisites, setting up a connection to Snowflake, defining data sources and entities, and consuming features in both batch and real-time machine learning models.
Machine learning models need well-organized, high-quality features for training and making predictions. Qwak's Feature Store is a central place that makes it easier to go from raw data to usable machine learning features. This guide shows you how to pair Qwak's Feature Store with the Snowflake Data Cloud, the platform organizations use to mobilize their data, apps, and AI, so you can manage and serve your machine learning features effectively.
A Feature Store is a centralized system designed to store, manage, and serve machine learning features. It addresses common challenges in machine learning, such as feature consistency, reusability, and real-time serving.
Qwak's Feature Store stands out by offering seamless integration with the Snowflake Data Cloud. It allows you to transform raw data from Snowflake, manipulate it, and store it for use as offline or online features in machine learning models.
Overall, Qwak's Feature Store provides a powerful solution for managing machine learning features, enabling organizations to build more accurate and effective models.
Before diving into the integration process, make sure you have the following set up:

- A Qwak account, with the Qwak SDK installed and configured on your machine
- A Snowflake account with credentials that can read the table you want to ingest
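If you haven't set up the SDK yet, a minimal setup looks like the sketch below; the exact flags may vary by SDK version, and the `configure` step prompts for your Qwak API key:

```bash
# Install the Qwak SDK
pip install qwak-sdk

# Authenticate the CLI/SDK with your Qwak API key
qwak configure
```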
For this tutorial, we'll be fetching data from a Snowflake table with the following schema:
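As a concrete example, assume a user-churn table along the following lines; the table and column names here are illustrative stand-ins, not an exact schema, and they're reused throughout this guide:

```sql
-- Hypothetical USERS table used throughout this guide
CREATE TABLE USERS (
    USER_ID            VARCHAR,       -- entity key
    AGE                INTEGER,
    SUBSCRIPTION_TYPE  VARCHAR,
    CHURN              BOOLEAN,       -- label: did the user churn?
    UPDATE_TIMESTAMP   TIMESTAMP_NTZ  -- when the record was created/updated
);
```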
To connect to Snowflake, you'll need to define a SnowflakeSource object. This object specifies the connection details and the table to be queried.
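A minimal definition looks like the sketch below, assuming the illustrative USERS table above and a recent `qwak-sdk` (parameter names may differ slightly across versions); the host, database, warehouse, and secret names are placeholders to replace with your own:

```python
from qwak.feature_store.data_sources import SnowflakeSource

snowflake_source = SnowflakeSource(
    name="snowflake_users_table",
    description="Snowflake table holding user churn data",
    date_created_column="UPDATE_TIMESTAMP",  # marks when each record was created
    host="<your-account>.snowflakecomputing.com",
    username_secret_name="snowflake-username",  # Qwak secret holding the Snowflake user
    password_secret_name="snowflake-password",  # Qwak secret holding the Snowflake password
    database="ANALYTICS",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
    table="USERS",
)
```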
Secure Storage of Credentials
Qwak offers a secret service that allows you to securely store sensitive information like usernames and passwords. This ensures that your credentials are encrypted and managed securely.
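For example, the Snowflake credentials referenced by the `SnowflakeSource` above can be stored once via the Qwak CLI; the secret names must match the ones the data source refers to, and the exact flags may differ by CLI version:

```bash
# Store Snowflake credentials as Qwak secrets
qwak secrets set --name snowflake-username --value <your-snowflake-user>
qwak secrets set --name snowflake-password --value <your-snowflake-password>
```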
Defining Entities
Entities are business objects that you want to make predictions about. In this example, we define a user entity.
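A sketch of the entity definition, assuming a recent `qwak-sdk` (import paths and accepted arguments have moved between SDK versions). The entity name matches the key column in our illustrative table:

```python
from qwak.feature_store.entities.entity import Entity

# The "user" entity, keyed by the user_id column of our Snowflake table
user = Entity(
    name="user_id",
    description="A registered user of the platform",
)
```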
To test the data source, you can use the `get_sample()` method to retrieve sample data. It automatically tests the connection to the Snowflake table and validates the data by retrieving the first 10 rows.
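A minimal check, using the `snowflake_source` object defined above:

```python
# Fetch the first rows to verify connectivity and data shape
sample_df = snowflake_source.get_sample()
print(sample_df)
```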
The result should be a pandas DataFrame containing the first 10 rows of the Snowflake table.
Once you've verified the data sample, the next step is to register the Data Source and Entity in Qwak. This can be done effortlessly using Qwak's CLI as shown below:
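The registration command scans the given path for definitions and registers anything new; here `.` assumes your definitions live in the current directory:

```bash
qwak features register -p .
```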
The `-p` flag tells Qwak which file or directory to scan for Feature Store definitions.
The command output should confirm that both the Entity and the Data Source were registered successfully.
Now you can not only test the Data Source locally, but also see it in your Qwak dashboard and reference it in the FeatureSets you define next.
In Qwak, feature sets are either SQL queries or Python functions designed to transform raw data into usable features. These feature sets can be scheduled for regular updates and can also be backfilled to generate historical features.
When defining a FeatureSet, consider the following components, all of which appear in the definition below:

- The FeatureSet name and the Entity it is keyed on
- The Data Sources it reads from
- A scheduling policy that controls how often new data is ingested
- An optional backfill window for generating historical features
- A transformation that turns raw rows into features
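Here's a sketch of the `user-features` FeatureSet described in this section, assuming the `user_id` entity and `snowflake_users_table` source defined earlier; the selected columns come from the illustrative schema above, and decorator names follow recent SDK versions:

```python
from datetime import datetime

from qwak.feature_store.feature_sets import batch
from qwak.feature_store.feature_sets.transformations import SparkSqlTransformation


@batch.feature_set(
    name="user-features",
    entity="user_id",
    data_sources=["snowflake_users_table"],
)
@batch.scheduling(cron_expression="0 8 * * *")   # every day at 8:00 AM
@batch.backfill(start_date=datetime(2023, 1, 1)) # how far back to build history
def user_churn_features():
    # SQL runs against the registered data source by name
    return SparkSqlTransformation(
        """
        SELECT user_id,
               age,
               subscription_type,
               churn,
               update_timestamp
        FROM snowflake_users_table
        """
    )
```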
Scheduling and Backfilling
You can schedule a FeatureSet to run at regular intervals using cron syntax. In this example, the `user-features` FeatureSet is configured to fetch new data every day at 8:00 AM.
The backfill option is used only when registering the feature set. It tells Qwak how far back in time to fetch historical data for the FeatureSet.
Finally, `user_churn_features` is a function that returns a SQL-based transformation, letting you filter, transform, and shape the FeatureSet's schema and data.
Because we already registered the Entity and DataSource, we can now query a sample for this FeatureSet to validate it works as expected.
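Assuming the decorated function exposes a `get_sample()` helper, as in recent SDK versions, the check mirrors the one we ran for the Data Source:

```python
# Run the transformation on a small slice of data and inspect the result
sample_df = user_churn_features.get_sample()
print(sample_df)
```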
The sample should contain the columns selected in the SQL transformation above, keyed by `user_id`.
As with the DataSource, registering a FeatureSet is straightforward:
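The same registration command picks up the new FeatureSet definition:

```bash
qwak features register -p .
```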
Once you've set up the FeatureSet, you should see it reflected in the Qwak Dashboard. At this point, the data ingestion and processing pipeline should have already kicked off.
By registering the FeatureSet, Qwak stores the resulting data in two types of stores: an Offline Store and an Online Store.
Most modern ML models are trained in batches, often referred to as offline training. In this section, we'll demonstrate how to consume features from Qwak's Offline Feature Store for model training.
To retrieve features from the Offline Store, you'll use Qwak's OfflineClient. This requires a key-to-features mapping dictionary, along with start and end datetime values to specify the data fetching range.
The `key_to_features` mapping dictionary should follow this format, where the listed features are the ones used for model training or prediction:
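A sketch of the retrieval, assuming the `user-features` FeatureSet registered above; the `get_feature_range_values` method and its parameters follow recent SDK versions and may differ in yours:

```python
from datetime import datetime

from qwak.feature_store.offline import OfflineClient

# Map each entity to the fully qualified features to fetch
key_to_features = {
    "user_id": [
        "user-features.age",
        "user-features.subscription_type",
        "user-features.churn",
    ]
}

offline_client = OfflineClient()

# Fetch feature values for the given time range
train_df = offline_client.get_feature_range_values(
    entity_key_to_features=key_to_features,
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 9, 1),
)
```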
Running the code snippet above returns a pandas DataFrame with one row per entity key and timestamp, and one column per requested feature, ready for model training.
For real-time predictions, latency is a critical factor. In such cases, you should use Qwak's Online Store for feature retrieval.
The OnlineClient serves as the query interface for Qwak's Online Store, offering fast feature retrieval.
To use the get_feature_values method, you'll need to specify two things:

- A model schema that lists the features to retrieve
- A pandas DataFrame containing the entity key values to look up

Both are shown in the sketch below.
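The following sketch assumes the `user-features` FeatureSet from earlier; the import paths for `ModelSchema` and `FeatureStoreInput` follow recent SDK versions:

```python
import pandas as pd

from qwak.feature_store.online.client import OnlineClient
from qwak.model.schema import ModelSchema
from qwak.model.schema_entities import FeatureStoreInput

# 1) Declare which features to retrieve
model_schema = ModelSchema(
    inputs=[
        FeatureStoreInput(name="user-features.age"),
        FeatureStoreInput(name="user-features.subscription_type"),
    ]
)

# 2) Provide the entity key values to look up
keys_df = pd.DataFrame(columns=["user_id"], data=[["some-user-id"]])

online_client = OnlineClient()
features_df = online_client.get_feature_values(model_schema, keys_df)
```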
The output is a DataFrame with one row per entity key and one column per requested feature.
This section addresses common issues you might encounter and how to resolve them.
If your data ingestion pipeline fails, the first step is to consult the logs for clues about the failure. Navigate to the 'Feature Set Jobs' section in the Qwak Dashboard, as shown below.
If you find that the Offline or Online client isn't retrieving any rows for a given key, you can verify the data in the Qwak UI under the 'FeatureSet Samples' section using an SQL query. For more detailed troubleshooting steps, refer to our documentation.
Note: When constructing your query, make sure to enclose feature column names in double quotes and prefix them with the FeatureSet name, i.e. `"<feature-set-name>.<feature-name>"`, as shown in the example below.
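A hypothetical query for the `user-features` FeatureSet; the exact table reference depends on what the Qwak UI exposes in your account:

```sql
-- Illustrative only: adjust the FROM clause to the table
-- reference shown in your Qwak UI.
SELECT "user-features.age",
       "user-features.churn"
FROM "user-features"
WHERE "user_id" = 'some-user-id'
```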
In this comprehensive guide, we've walked you through the process of integrating Qwak's Feature Store with Snowflake to manage and serve machine learning features effectively. From setting up prerequisites to defining entities and feature sets, we've covered all the essential steps. We also delved into the specifics of consuming features for both batch and real-time machine learning models.
By now, you should have a solid understanding of how to leverage Qwak's Feature Store in conjunction with Snowflake's data warehousing capabilities. Whether you're looking to fetch features for offline batch training or need low-latency feature retrieval for real-time predictions, Qwak's dual storage system has you covered.
Thank you for reading, and we hope this guide empowers you to build more accurate and efficient machine learning models.
Learn more about how to build a full-blown ML application with your Snowflake data.