Real-time features are features that are calculated at request time, rather than computed in advance from a defined data source such as BigQuery or Kafka.
As a result, the raw data for a requested feature must either arrive in the request itself or be fetched directly from the feature set definition (code), for example by calling an external API or querying a database ad hoc.
In addition, real-time features accept an optional, user-defined staleness parameter, which ensures that the values served are never older than the threshold you set, even if they were not calculated in advance.
The computed real-time feature values are sent to the model and are also automatically written to the feature store, so that future requests can reuse them.
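The request-time flow described above (check staleness, compute if needed, return the value to the model, write it back) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Qwak's API: `StoredValue`, `InMemoryFeatureStore`, and `resolve_feature` are hypothetical names, a dictionary stands in for the online store, and the sketch assumes that omitting `staleness` means the value is always recomputed. When the computation yields no data, it returns `None` and writes nothing to the store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Optional


@dataclass
class StoredValue:
    value: Any
    computed_at: datetime


class InMemoryFeatureStore:
    """Hypothetical stand-in for an online feature store (not a Qwak API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, StoredValue] = {}

    def get(self, key: str) -> Optional[StoredValue]:
        return self._rows.get(key)

    def put(self, key: str, value: Any, computed_at: datetime) -> None:
        self._rows[key] = StoredValue(value, computed_at)


def resolve_feature(
    key: str,
    request_payload: Dict[str, Any],
    compute: Callable[[Dict[str, Any]], Optional[Any]],
    store: InMemoryFeatureStore,
    staleness: Optional[timedelta] = None,
    now: Optional[datetime] = None,
) -> Optional[Any]:
    """Return a feature value, recomputing it at request time when needed."""
    now = now or datetime.now(timezone.utc)
    cached = store.get(key)
    # Reuse the stored value only if it exists and is inside the staleness window.
    # Assumption: without a staleness window the value is always recomputed.
    if cached is not None and staleness is not None and now - cached.computed_at <= staleness:
        return cached.value
    # Otherwise compute from the request payload (or an external source).
    value = compute(request_payload)
    if value is None:
        # No data to compute from: the model receives a null value.
        return None
    # Write the fresh value back so later requests can reuse it.
    store.put(key, value, now)
    return value
```

Passing an explicit `now` makes the staleness check deterministic, which is convenient in tests; in a real deployment the store and clock would be provided by the platform.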
The main use case for real-time feature sets is when data manipulation must be part of model inference: the data returned from the Feature Store is missing, stale, or needs to be enriched, and you do not want to (or cannot) pre-calculate it, but you still want the transformation to be reusable and managed in a single location.
Real-time calculation - you work at a fintech company and need to convert every transaction amount to its USD equivalent before sending it to the model. The amount comes from the transaction itself, and you want every model in your organization to perform exactly the same currency conversion.
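The point of this use case is that the conversion logic lives in exactly one function that every model calls. A minimal sketch, assuming hypothetical names (`to_usd`, `EXAMPLE_USD_RATES`) and fixed illustrative rates that are not real market data; a production feature would fetch current rates from a trusted source. `Decimal` is used to avoid binary floating-point rounding on monetary amounts.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Mapping

# Hypothetical, fixed example rates (USD per one unit of the currency).
# These are illustrative only; a real feature would load current rates.
EXAMPLE_USD_RATES: Mapping[str, Decimal] = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "GBP": Decimal("1.25"),
}


def to_usd(amount: Decimal, currency: str,
           rates: Mapping[str, Decimal] = EXAMPLE_USD_RATES) -> Decimal:
    """Convert a transaction amount to USD, rounded half-up to cents."""
    code = currency.upper()
    if code not in rates:
        # Fail loudly instead of silently sending an unconverted amount.
        raise ValueError(f"No USD rate for currency {currency!r}")
    return (amount * rates[code]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Because every model imports the same `to_usd`, a change to the rounding rule or the rate source happens in one place.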
External API - you work at an insurance company, and every time a new customer onboards to your platform you call an external API to run a background check. As in the previous use case, your goal is to standardize how the external API is accessed while minimizing the number of calls: if you already received a request for a specific customer in the last 24 hours, a Qwak real-time feature lets you reuse that result instead of calling the API again.
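Limiting external API calls to one per customer per 24 hours amounts to caching results with a time-to-live (TTL). The sketch below shows that technique in isolation; it is not how Qwak implements it, and `TTLCachedLookup` and the `fetch` callable (standing in for a background-check client) are hypothetical. The clock is injectable so expiry can be tested without waiting a day.

```python
import time
from typing import Any, Callable, Dict, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60


class TTLCachedLookup:
    """Hypothetical wrapper: fetch each key at most once per TTL window."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = ONE_DAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # e.g. a call to an external background-check API
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, customer_id: str) -> Any:
        now = self._clock()
        hit = self._cache.get(customer_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh result: no external call is made
        result = self._fetch(customer_id)
        self._cache[customer_id] = (now, result)
        return result
```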
[Code example: Plaid API (real-time) with a Snowflake batch connection]
To obtain training data for real-time features, the current approach is to implement the real-time functionality as part of a batch feature set. Training therefore uses the offline store rather than the real-time data directly.
If a real-time calculation does not receive the data it needs, null values are returned. When the requested key is stale or does not exist, the real-time function is called; if it does not produce relevant data, null values are returned instead.
Because a real-time feature is implemented as a batch feature set with an added real-time function, its feature lineage remains visible and can be traced back to its source.