Leveraging Snowflake Data for Machine Learning with Qwak's Feature Store

Grig Duta
September 5, 2023


This article serves as a comprehensive guide for integrating Qwak's Feature Store with the Snowflake Data Cloud. The guide covers essential prerequisites, setting up a connection to Snowflake, defining data sources and entities, and consuming features in both batch and real-time machine learning models.


Machine learning models need well-organized, high-quality features for training and making predictions. Qwak's Feature Store is a central place that simplifies the path from raw data to usable machine learning features. This guide shows you how to use Qwak's Feature Store with the Snowflake Data Cloud to manage and serve your machine learning features effectively.

What is Qwak's Feature Store?

A Feature Store is a centralized system designed to store, manage, and serve machine learning features. It addresses common challenges in machine learning, such as feature consistency, reusability, and real-time serving.

Qwak's Feature Store stands out by offering seamless integration with the Snowflake Data Cloud. It allows you to transform raw data from Snowflake, manipulate it, and store it for use as offline or online features in machine learning models.

Overall, Qwak's feature store provides a powerful solution for managing machine learning features, enabling organizations to build more accurate and effective machine learning models.


Prerequisites

Before diving into the integration process, make sure you have the following set up:

  1. A Qwak account: You'll need this to access Qwak's Feature Store.
  2. Qwak SDK installed locally: The SDK is essential for interacting with the feature store.
  3. Additional Python Libraries: Install `pyarrow` and `pyathena` as they may be required for data manipulation and querying with the Qwak Client.
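A typical local setup might look like the following. The package names and the `qwak configure` authentication step are assumptions based on the SDK's standard distribution; check the Qwak documentation for the exact commands for your version:

```shell
# Install the Qwak SDK along with the helper libraries used in this guide
pip install qwak-sdk pyarrow pyathena

# Authenticate the SDK with your Qwak account (requires a personal API key)
qwak configure
```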

1. Connecting to Snowflake

For this tutorial, we'll be fetching data from a Snowflake table with the following schema:

Defining the Data Source

To connect to Snowflake, you'll need to define a SnowflakeSource object. This object specifies the connection details and the table to be queried.

# Import the SnowflakeSource class from the Qwak feature store's batch_sources module
from qwak.feature_store.data_sources.batch_sources.snowflake import SnowflakeSource
# Import the Entity class used to register business entities
from qwak.feature_store.entities.entity import Entity

# SnowflakeSource defines a Snowflake connection and resource to be queried
snowflake_source = SnowflakeSource(
  name='users_table',                  # Usually corresponds to the table name in Snowflake
  date_created_column='DATE_UPDATED',  # The column in the Snowflake table that indicates the record timestamp
  host='https://vab31212.snowflakecomputing.com',  # The URL of the Snowflake instance
  username_secret_name='my_username_secret',  # The name of the Qwak secret that stores the Snowflake username
  password_secret_name='risk_sf_password',    # The name of the Qwak secret that stores the Snowflake password
  database='QWAK_DB',                  # The name of the database in Snowflake
  schema='CHURN_MODELS',               # The schema within the database where the table resides
  warehouse='COMPUTE_WH',              # The Snowflake warehouse to use for compute resources
  table='USERS_TABLE_PROD_V2'          # The specific table in Snowflake to use as the data source
)

# An Entity uniquely identifies a business object in the feature set
entity = Entity(
  name='user',               # The key referenced later by the FeatureSet
  description='A User ID'
)

  • date_created_column: This column serves as the timestamp that Qwak uses to filter data. It's assumed that the data source employs SCD Type 2 dimensions for historical data storage. The column should be of type datetime.
  • username and password: These credentials are stored as Secrets within the Qwak platform for enhanced security (refer to the screenshot below for more details).
  • host and the rest of the connection details: These parameters inform Qwak where to connect and which resource (table) to access. You can find all these details in your Snowflake account.

Secure Storage of Credentials

Qwak offers a secret service that allows you to securely store sensitive information like usernames and passwords. This ensures that your credentials are encrypted and managed securely.
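As an illustration, storing the two secrets referenced by the `SnowflakeSource` above might look like the following. The `qwak secrets set` command and its flags are an assumption; check your CLI's help output for the exact syntax:

```shell
# Store the Snowflake credentials in Qwak's secret service
# (the secret names match the ones referenced by the SnowflakeSource)
qwak secrets set --name my_username_secret --value <snowflake-username>
qwak secrets set --name risk_sf_password --value <snowflake-password>
```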

Defining Entities

Entities are business objects that you want to make predictions about. In this example, we define a user entity.

To test the data source, you can use the `get_sample()` method to retrieve sample data. This method tests the connection to the Snowflake table and validates the data by retrieving a sample with the first 10 rows.
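Assuming the `snowflake_source` object defined above and a configured Qwak environment, the validation call is a one-liner:

```python
# Retrieve the first 10 rows from the Snowflake table to verify
# both the connection and the data itself
sample_df = snowflake_source.get_sample()
print(sample_df)
```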


The result should be: 

Registering the Data Source

Once you've verified the data sample, the next step is to register the Data Source and Entity in Qwak. This can be done effortlessly using Qwak's CLI as shown below:

qwak features register -p data_source.py

The `-p` flag tells Qwak which file to scan for Feature Store definitions.

The command output should be something like this.

Now you can not only test the Data Source locally, but also see it in your Qwak dashboard and reference it in your FeatureSets.

2. Transforming Snowflake Data into Reusable Feature Vectors

In Qwak, feature sets are either SQL queries or Python functions designed to transform raw data into usable features. These feature sets can be scheduled for regular updates and can also be backfilled to generate historical features.

Defining the Feature Set

When defining a FeatureSet, consider the following components:

  • `name`: This identifies the FeatureSet when you're consuming features.
  • `entity`: This sets the unique key for each feature vector, which in this example is a registered user.
  • `data_source`: Specifies where to pull the raw data from. This should have been defined in the previous step.
  • `timestamp_column_name`: This is the column that Qwak uses to sift through historical data.

Scheduling and Backfilling

You can schedule a FeatureSet to run at regular intervals using cron scheduler syntax. For instance, in this example the `user-features` FeatureSet is set up to fetch new data every day at 8:00 AM.

The backfill option is used only when registering the feature set. It tells Qwak how far back in time to fetch historical data for the FeatureSet.

Finally, `user_churn_features` is a function that returns a SQL-based transformation, which lets you filter, transform, and customize the FeatureSet's schema and data.

from datetime import datetime
# The @batch decorator defines the FeatureSet characteristics
from qwak.feature_store.feature_sets import batch
# Qwak uses SQL-based syntax for transformations
from qwak.feature_store.feature_sets.transformations import SqlTransformation

# Define the FeatureSet
@batch.feature_set(
  name="user-features",           # The name used when consuming features
  entity="user",                  # The Entity defined earlier
  data_sources=["users_table"],   # The SnowflakeSource defined earlier
  timestamp_column_name="DATE_UPDATED"  # The column Qwak uses to sift through historical data
)
@batch.scheduling(cron_expression="0 8 * * *")  # Run daily at 8:00
@batch.backfill(start_date=datetime(year=2023, month=1, day=1))  # Backfill all data back to the beginning of 2023
def user_churn_features():
  return SqlTransformation(sql="""
      SELECT *              -- select and transform the raw columns you need
      FROM users_table
  """)

Because we already registered the Entity and DataSource, we can now query a sample for this FeatureSet to validate it works as expected.
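A minimal sketch of that validation, assuming the decorated `user_churn_features` function above and a configured Qwak environment (recent SDK versions expose a `get_sample()` helper on the decorated FeatureSet; treat the exact method as an assumption for your version):

```python
# Fetch a sample of the transformed features to validate
# the FeatureSet definition before registering it
sample_df = user_churn_features.get_sample()
print(sample_df)
```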


And the sample should look something like the following:

Registering the Feature Set

As with the DataSource, registering a FeatureSet is straightforward:

qwak features register -p feature_set.py

The `-p` flag tells Qwak which file to scan for Feature Store definitions.

Once you've set up the FeatureSet, you should see it reflected in the Qwak Dashboard. At this point, the data ingestion and processing pipeline should have already kicked off.

By registering the FeatureSet, Qwak stores the resulting data in two types of stores: an Offline Store and an Online Store.

  • Qwak Offline Store: This store uses Apache Iceberg, an open, high-performance table format, stored on top of object storage and optimized for batch consumption.
  • Qwak Online Store: This store is built on an in-memory cache database, enabling low-latency feature retrieval. It's particularly useful for real-time predictions.

3. Consuming Features for ML Model Training

Most modern ML models are trained in batches, often referred to as offline training. In this section, we'll demonstrate how to consume features from Qwak's Offline Feature Store for model training.

from datetime import date

# Lightweight client to retrieve feature vectors from the Offline Store
from qwak.feature_store.offline.client import OfflineClient

# Define the features to be used for the model and fetched from the Offline Feature Store
# These are the specific features that the model will be trained on
key_to_features = {'user_id': [
    'user-features.feature_1',   # placeholder names; replace with your FeatureSet's features
    'user-features.feature_2',
]}

# Define the date range for data retrieval
feature_range_start = date(2020, 1, 1)
feature_range_end = date.today()

offline_client = OfflineClient()

# Fetch data from the offline client
data = offline_client.get_feature_range_values(
    entity_key_to_features=key_to_features,
    start_date=feature_range_start,
    end_date=feature_range_end
)

# Validate with a feature vector sample
print(data.head())

To retrieve features from the Offline Store, you'll use Qwak's OfflineClient. This requires a key-to-features mapping dictionary, along with start and end datetime values to specify the data fetching range.

The key_to_features mapping dictionary should follow this format, where the listed features are the ones used for model training or prediction:

{entity_key: [
    'feature_set_a.feature_1',
    'feature_set_a.feature_2',
    ...
]}

Running the code snippet above will return the following features sample:

4. Consuming Features for Real-Time Predictions

For real-time predictions, latency is a critical factor. In such cases, you should use Qwak's Online Store for feature retrieval.

import pandas as pd

# Qwak imports for the Online Store
from qwak.feature_store.online.client import OnlineClient
from qwak.model.schema import ModelSchema
from qwak.model.schema_entities import FeatureStoreInput, InferenceOutput

# Lightweight client to fetch data from the online store
online_client = OnlineClient()

# Expected feature vector schema for inference
model_schema = ModelSchema(
    inputs=[
        # Placeholder names; list the features your model expects
        FeatureStoreInput(name='user-features.feature_1'),
        FeatureStoreInput(name='user-features.feature_2'),
    ],
    outputs=[
        InferenceOutput(name="churn_probability", type=float)
    ]
)

# Key to query
query_df = pd.DataFrame(columns=['user_id'],
                        data=['4583906c-fb69-4d03-9a97-626b78fe578c'])

# Fetch feature vectors for key
features = online_client.get_feature_values(model_schema, query_df)

# Print the feature vector DataFrame
print(features)

The OnlineClient serves as the query interface for Qwak's Online Store, offering fast feature retrieval.

To use the get_feature_values method, you'll need to specify two things:

  • ModelSchema: This is generally used to define what the ML model's inference endpoint should expect during a prediction request. Here, it's also used to inform the OnlineClient which features are needed for your model.
  • Query DataFrame: This is straightforward; in our example, it contains the user_id entity key to filter results.

The output should look something like this:


Troubleshooting

This section covers common issues you might encounter and how to resolve them. For example:

FeatureSet Data Pipeline Fails

If your data ingestion pipeline fails, the first step is to consult the logs for clues about the failure. Navigate to the 'Feature Set Jobs' section in the Qwak Dashboard, as shown below.

Feature Retrieval fails for the Online or Offline Store

If you find that the Offline or Online client isn't retrieving any rows for a given key, you can verify the data in the Qwak UI under the 'FeatureSet Samples' section using an SQL query. For more detailed troubleshooting steps, refer to our documentation.

Note: When constructing your query, make sure to enclose column names in double quotes and prefix them with the feature set name, i.e. `"<feature-set>.<feature>"`, as shown in the example below.


In this comprehensive guide, we've walked you through the process of integrating Qwak's Feature Store with Snowflake to manage and serve machine learning features effectively. From setting up prerequisites to defining entities and feature sets, we've covered all the essential steps. We also delved into the specifics of consuming features for both batch and real-time machine learning models.

By now, you should have a solid understanding of how to leverage Qwak's Feature Store in conjunction with Snowflake's data warehousing capabilities. Whether you're looking to fetch features for offline batch training or need low-latency feature retrieval for real-time predictions, Qwak's dual storage system has you covered.

Thank you for reading, and we hope this guide empowers you to build more accurate and efficient machine learning models.

Learn more about how to build a full-blown ML application with your Snowflake data.
