Announcing Qwak Feature Store support for MongoDB data sources
Announcements


By Pavel Klushin
August 3, 2021

Qwak Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent. 


Qwak Feature Store supports several ways to ingest features, including batch, streaming, and non-materialized features. We recently added support for MongoDB as a batch data source.


The Data Source connectors provide you with a consistent data source interface for any database, and they create a standard way to combine stream and batch data sources for Feature Transformations.


How it's done


Defining a Feature Set enables you to create features from your analytical data: when calculating feature values, Qwak will simply read from the underlying data source.


For example, in a fraud detection use case, we might pull two values from the MongoDB data source:

  • Average transaction amount per customer - avg_amount
  • Standard deviation of the transaction amount per customer - stddev_amount


And two from streaming events in Kafka:

  • Last transaction amount
  • Last transaction time
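
To make the four features concrete, here is a minimal sketch in plain Python of what each one computes. It does not use the Qwak SDK; the sample records and the function names `batch_features` and `streaming_features` are hypothetical and only illustrate the aggregation logic.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Hypothetical transaction records: (user_id, amount, timestamp).
transactions = [
    ("u1", 10.0, datetime(2021, 8, 1, 9, 0)),
    ("u1", 30.0, datetime(2021, 8, 2, 9, 0)),
    ("u2", 5.0, datetime(2021, 8, 1, 12, 0)),
    ("u1", 20.0, datetime(2021, 8, 3, 9, 0)),
]


def batch_features(rows):
    """Per-customer average and sample standard deviation of the amount."""
    amounts = defaultdict(list)
    for user_id, amount, _ in rows:
        amounts[user_id].append(amount)
    return {
        user_id: {
            "avg_amount": mean(values),
            # stdev needs at least two values; use None otherwise.
            "stddev_amount": stdev(values) if len(values) > 1 else None,
        }
        for user_id, values in amounts.items()
    }


def streaming_features(rows):
    """Latest transaction amount and time per customer."""
    latest = {}
    for user_id, amount, ts in rows:
        if user_id not in latest or ts > latest[user_id]["last_transaction_time"]:
            latest[user_id] = {
                "last_transaction_amount": amount,
                "last_transaction_time": ts,
            }
    return latest


batch = batch_features(transactions)
stream = streaming_features(transactions)
```

In a feature store, the first pair would be recomputed on a schedule from historical data, while the second pair would be updated as each event arrives.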


Architecture



How to configure MongoDB as a data source


MongoDB data source connector definition:


from qwak.feature_store.sources.data_sources import MongoSource

users_table = MongoSource(
    name='mongo_source',
    description='a mongo source description',
    date_created_column='insert_date_column',  # the field holding each record's insertion time
    hosts='',
    username_secret_name='qwak-mongodb-user',  # a key to obtain the actual username from Qwak secrets
    password_secret_name='qwak-mongodb-pass',  # a key to obtain the actual password from Qwak secrets
    database='db_name',
    collection='collection_name',
    connection_params='authSource=admin',
)
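
For orientation, the connector's parameters resemble the parts of a standard MongoDB connection string. The sketch below shows how such parts are commonly assembled into a URI. It is an assumption for illustration, not a description of how Qwak builds its connection internally; `build_mongo_uri`, the host, and the credentials are all hypothetical.

```python
from urllib.parse import quote_plus


def build_mongo_uri(hosts, username, password, database, connection_params=""):
    """Assemble a standard MongoDB connection string from its parts."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    uri = f"mongodb://{credentials}@{hosts}/{database}"
    if connection_params:
        uri += f"?{connection_params}"
    return uri


uri = build_mongo_uri(
    hosts="mongo.example.com:27017",  # hypothetical host
    username="app_user",              # hypothetical; Qwak resolves this from a secret
    password="p@ss:word",             # hypothetical; Qwak resolves this from a secret
    database="db_name",
    connection_params="authSource=admin",
)
```

Storing only secret names in the connector definition, as the snippet above does, keeps the real credentials out of source control.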


Register a batch feature set using the MongoDB connector:


BatchFeatureSet(
  name="batch_transaction_features",
  data_sources=["mongodb_transactions_history"],
  scheduling_policy="daily",
  validations=[expect_column_values_to_be_between(
    column="amount", min_value=0, max_value=None)],
  function=SqlFunction(
    """
    SELECT User_ID,
      avg(Amount) AS avg_amount,
      stddev(Amount) AS stddev_amount
    FROM mongodb_transactions_history
    GROUP BY User_ID
    """)
)
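
The validation in the feature set bounds the amount column from below and leaves it open above. As a rough sketch of that kind of check, assuming that `None` means "no bound on this side", the helper below tests a list of values in plain Python. `values_between` is a hypothetical name and is not the Qwak validation itself.

```python
def values_between(values, min_value=None, max_value=None):
    """Return True when every value lies within the optional bounds."""
    for value in values:
        # A bound of None is treated as "unbounded" on that side.
        if min_value is not None and value < min_value:
            return False
        if max_value is not None and value > max_value:
            return False
    return True


valid = values_between([0, 5.5, 100], min_value=0)
invalid = values_between([-1, 5], min_value=0)
```

Under this reading, a negative transaction amount would fail the check, while arbitrarily large amounts would pass.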

Qwak Feature Store helps ensure that models make accurate predictions by making the same features available for both training and inference.

For more information, or if you are missing a data source connector, feel free to reach out to us at support@qwak.ai


