What is the MLOps stack?
In 2015, machine learning researchers at Google released their influential paper, Hidden Technical Debt in Machine Learning Systems. The paper describes many of the problems associated with developing, deploying, and maintaining machine learning (ML) models and ML-driven systems in production.
The paper also made the case that machine learning is no longer a discipline just for data scientists; it is equally relevant for any software engineering practitioner who faces the challenges of deploying models in production, scaling them, and automating the processes around them. Moreover, the complexity of these machine learning systems gives rise to problems similar to those that software teams previously solved by adopting a DevOps mindset.
This is where the term MLOps came from.
MLOps, which stands for machine learning operations and describes the discipline of machine learning plus IT operations, is a set of best practices focused on making it easier to deploy machine learning models in production. MLOps defines language, framework, platform, and infrastructure practices for designing, developing, and maintaining these models.
Getting machine learning into production, however, involves many interconnected and interacting components, from feature stores to model repositories, all of which must remain manageable. In addition, with the constantly growing number of MLOps platforms and frameworks, it is challenging to keep up with the pace of development.
According to recent research, as many as half of all organizations experience significant challenges integrating their ML tooling, frameworks, and language stacks. This, again, is because the field of machine learning and its related technologies is in a state of near-constant change. MLOps itself is evolving at an equally rapid pace, which creates the additional challenge of adopting infrastructure that is itself a moving target.
The goal of MLOps is therefore to shorten the development lifecycle while improving model reliability and stability by automating repeatable steps in ML workflows.
The elements of an ML system
According to Google’s paper (and as seen in the diagram below, which has been taken from the paper), only a small fraction of a real-world machine learning system consists of the ML code itself. In practice, most of a machine learning system is made up of supporting processes that include but are not limited to:
- Data ingestion and data versioning
- Data validation
- Data pre-processing and feature generation
- Model training and tuning
- Model performance and fairness analysis
- Model validation, versioning, and release management
- Model deployment
- Feedback and response
In addition to these processes, ML teams must also consider the ML lifecycle, which is often represented as a pipeline.
In an ideal world, this entire pipeline would be automated aside from model analysis and feedback, which require human intervention for review and confirmation. This is where MLOps and the MLOps stack come in.
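To make the idea of an automated pipeline concrete, the stages listed above can be sketched as a simple chain of functions. This is a minimal illustration only; the stage names, data shapes, and the toy "model" are assumptions for the sake of the example, not any particular framework's API.

```python
# A minimal sketch of an automated ML pipeline. Stage names mirror the
# workflow list above; the data format and toy model are illustrative.

def ingest(raw_rows):
    """Data ingestion: tag each batch of records with a dataset version."""
    return {"version": "v1", "rows": raw_rows}

def validate(dataset):
    """Data validation: drop records that are missing the label."""
    dataset["rows"] = [r for r in dataset["rows"] if "label" in r]
    return dataset

def preprocess(dataset):
    """Feature generation: derive a simple ratio feature from raw fields."""
    for r in dataset["rows"]:
        r["feature"] = r["clicks"] / max(r["views"], 1)
    return dataset

def train(dataset):
    """Model 'training': a mean-threshold rule as a stand-in for a real model."""
    threshold = sum(r["feature"] for r in dataset["rows"]) / len(dataset["rows"])
    return {"threshold": threshold, "data_version": dataset["version"]}

def run_pipeline(raw_rows):
    """Chain the automatable stages; analysis and feedback stay human-driven."""
    return train(preprocess(validate(ingest(raw_rows))))

model = run_pipeline([
    {"clicks": 2, "views": 8, "label": 1},
    {"clicks": 6, "views": 8, "label": 0},
    {"clicks": 5, "views": 8},  # missing label -> removed by validation
])
print(model)  # {'threshold': 0.5, 'data_version': 'v1'}
```

Each stage here is a plain function so the chain is easy to follow; in a real stack, each would be backed by dedicated tooling and the trained model would be versioned alongside the data that produced it.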
The MLOps stack
The MLOps stack makes the ML lifecycle easier to manage by introducing tooling and other solutions for tasks such as data pre-processing, model development, model serving, model monitoring, and more. Individual tools and platforms are used together to make up this “stack”.
To make it easier to decide which tools your machine learning team should use to adopt MLOps best practices, we have put together a template that breaks down the typical ML workflow into its individual components.
This MLOps stack template is based on Google’s paper and best practices for MLOps and continuous delivery, and it has been somewhat simplified to make it more manageable. As you can see, there are nine components within the template, and ML tooling can be applied to each of them according to your specific requirements. Using this template, you can begin assigning specific tools and/or platforms to the nine components to build out your own MLOps stack.
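In practice, filling out such a template amounts to mapping each workflow component to a chosen tool. The sketch below shows one way to represent that mapping; the component names loosely follow the workflow list above, and the tool entries are placeholders to be replaced with your own choices, not recommendations.

```python
# A sketch of an MLOps stack template as a component -> tool mapping.
# Component names loosely follow the workflow list above; the tool
# entries are placeholders, not product recommendations.

mlops_stack = {
    "data_ingestion_and_versioning": "<your data versioning tool>",
    "data_validation": "<your data validation tool>",
    "preprocessing_and_features": "<your feature pipeline / store>",
    "training_and_tuning": "<your training framework>",
    "performance_and_fairness": "<your evaluation tooling>",
    "model_versioning_and_release": "<your model registry>",
    "deployment": "<your serving platform>",
    "feedback_and_monitoring": "<your monitoring tool>",
}

# The stack is complete once every component has an assigned tool.
unassigned = [c for c, tool in mlops_stack.items() if tool.startswith("<")]
print(f"{len(unassigned)} components still need a tool")
```

A single end-to-end platform would simply fill several of these entries with the same tool, which is one quick way to see how much of the workflow a given platform actually covers.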
MLOps teams have plenty of options when it comes to choosing machine learning tools, platforms, and other technologies, which adds an element of complexity to what should, in theory, be a simple process. After all, there is no shortage of available solutions, and new ones come to market each week.
When choosing which tools to use, however, it’s important to remember that the scope and focus of individual tools and platforms will vary. While some tools might focus on singular tasks such as data analysis and processing, other tools, especially larger platforms, will often cover several components, and in some cases even the entire end-to-end ML workflow—as is the case with Qwak.
No single MLOps stack works for every organization, though, and it’s important that machine learning teams consider their needs carefully. For instance, organizations that are operating inside of heavily regulated industries such as finance or healthcare might have more complex monitoring requirements than an organization operating in the eCommerce space.
Assessing MLOps tools
Due to the huge (and growing) number of existing ML tools and platforms, comparing them is not a trivial matter; you cannot simply look at them side by side. This is because, as we have already mentioned, these tools address different areas in machine learning engineering and deployment. Furthermore, they are developed to support solutions working on different scales, and they also vary in terms of underlying technology and potential for scalability. This makes it even more difficult to compare different tools in terms of their performance due to the unknowns in the research and development processes of individual tools, real-world application development, business requirements, and future demands.
That said, it is still possible to evaluate machine learning tools based on your own requirements. These include:
- Flexibility — Can the tool be easily adopted in multiple situations, meeting the needs for different modeling techniques?
- Framework Support — Are the most popular ML technologies and libraries integrated and supported by the tool?
- Language Support — Does the tool support code written in multiple languages? Does it have packages for the most popular languages like R and Python?
- Multiuser Support — Can the tool be used in a multi-user environment? Does this multiuser functionality raise potential security concerns?
- Maturity — Is the tool mature enough for use in production? Is it still maintained and supported by the developer?
- Community Support — Is the tool supported by any developer communities or backed by large organizations?
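One lightweight way to apply the criteria above is a weighted scorecard: rate each candidate tool per criterion, weight the criteria by how much they matter to your team, and compare totals. The tool names, weights, and ratings below are made-up placeholders for illustration, not real evaluations of any product.

```python
# A minimal weighted-scorecard sketch for comparing candidate MLOps tools.
# Criteria mirror the list above; weights and ratings are illustrative only.

CRITERIA_WEIGHTS = {
    "flexibility": 0.25,
    "framework_support": 0.20,
    "language_support": 0.15,
    "multiuser_support": 0.10,
    "maturity": 0.20,
    "community_support": 0.10,
}

def score_tool(ratings):
    """Combine per-criterion ratings (0-5) into a single weighted score."""
    return sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS)

# Hypothetical ratings for two made-up candidate tools.
candidates = {
    "tool_a": {"flexibility": 4, "framework_support": 5, "language_support": 3,
               "multiuser_support": 4, "maturity": 5, "community_support": 4},
    "tool_b": {"flexibility": 5, "framework_support": 3, "language_support": 4,
               "multiuser_support": 3, "maturity": 3, "community_support": 5},
}

ranked = sorted(candidates, key=lambda t: score_tool(candidates[t]), reverse=True)
for name in ranked:
    print(f"{name}: {score_tool(candidates[name]):.2f}")
```

The weights are where your organization's context comes in: a heavily regulated team might weight maturity and multi-user security far above flexibility, which can reorder the ranking entirely.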
A successful ML lifecycle requires a strong MLOps stack
A strong MLOps stack must comprehensively cover the management and versioning of models, experiments, and features, as well as training, scaling, and automated deployments, to properly support its users.
Although ML teams may have once gotten by with manual processes, automation is now the name of the game, and successful ML lifecycle management requires the adoption of machine learning tools to achieve success.
These are no longer optional extras and organizations will find that the pressure for adopting a sophisticated MLOps stack will only continue to increase alongside the proliferation of machine learning and the rapid growth of increasingly powerful tools.
To choose the right MLOps tools, it is important that ML teams understand their organization’s mission, long-term goals, its current data science environment, and the value that MLOps could deliver to it.
Build your stack with Qwak
Qwak is the full-service machine learning platform that enables teams to build, serve, and deploy their models, and transform them into well-engineered products. Our cloud-based platform removes the friction from ML development and deployment while enabling fast iterations, limitless scaling, and customizable infrastructure.
Want to find out more about how Qwak could help you deploy your ML models effectively and efficiently? Get in touch for your free demo!