A Brief Comparison of Kubeflow vs. Metaflow

Kubeflow, created by Google in 2018, and Metaflow, created by Netflix in 2019, are powerful machine learning operations (MLOps) platforms that can be used for experimentation, development, and production.

Ran Romano

Co-founder & CPO at Qwak

November 7, 2022

There’s no shortage of MLOps tools on the market right now, and they all promise to do one thing: make the lives of ML teams easier. With so many tools now available, it can be challenging to decide which ones to use and to understand how they interact with one another.

Some tools, like Google’s Kubeflow, have been built specifically for MLOps, while others, such as Argo, are general-purpose and were not designed specifically for ML workflows.

In a series of new guides, we’re comparing the Kubeflow toolkit with a range of others, looking at their similarities and differences. This time, we’re looking at Kubeflow vs Metaflow.

Kubeflow vs Metaflow

Kubeflow is a Kubernetes-based end-to-end machine learning (ML) stack orchestration toolkit for deploying, scaling, and managing large-scale systems. The Kubeflow project is dedicated to making ML on Kubernetes easy, portable and scalable by providing a straightforward way for spinning up the best possible OSS solutions.

Metaflow is a ‘human-friendly’ Python library that helps scientists and ML engineers build and manage data science projects. It was originally developed by Netflix to boost the productivity of data science teams who work on a variety of different projects.

In this comparison, we’re going to look at the main differentiators that will help you decide between Kubeflow and Metaflow. We’re also going to cover some of the similarities that exist between the two.

What is Kubeflow?

Kubeflow is a free and open-source ML platform that allows you to use ML pipelines to orchestrate complicated workflows running on Kubernetes.

It’s an open-source ML toolkit for Kubernetes that works by converting stages in your data science process into Kubernetes ‘jobs’, providing your ML libraries, frameworks, pipelines, and notebooks with a cloud-native interface.
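To picture what turning a stage into a Kubernetes job might look like, here is a minimal sketch that builds a standard `batch/v1` Job manifest as plain Python data. It is illustrative only: the helper `stage_to_job`, the stage names, and the `example.com` image names are hypothetical, and Kubeflow generates its own resources rather than this exact manifest.

```python
import json


def stage_to_job(name: str, image: str, command: list[str]) -> dict:
    """Build a Kubernetes batch/v1 Job manifest for one pipeline stage.

    Hypothetical helper for illustration; not part of any Kubeflow SDK.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # retry a failing stage at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Assumed stages and container images, made up for this example.
stages = [
    ("preprocess", "example.com/ml/preprocess:latest", ["python", "preprocess.py"]),
    ("train", "example.com/ml/train:latest", ["python", "train.py"]),
]

manifests = [stage_to_job(*stage) for stage in stages]
print(json.dumps(manifests[0], indent=2))
```

Each stage gets its own container image and command, which is what lets stages carry their own dependencies.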

Kubeflow works on Kubernetes clusters, either locally or in the cloud, which enables ML models to be trained on several computers at once. This reduces the time it takes to train a model.

Some of the features and components of Kubeflow include:

Kubeflow pipelines — Kubeflow empowers teams to build and deploy portable, scalable ML workflows based on Docker containers. It includes a UI to manage jobs, an engine for scheduling multi-step ML workflows, an SDK to define and manipulate pipelines, and notebooks to interact with the system.

KFServing — This enables serverless inferencing on Kubernetes and provides performant, high-abstraction interfaces for ML frameworks such as PyTorch, TensorFlow, and XGBoost. The project has since been renamed KServe.

Notebooks — A Kubeflow deployment provides services for managing and spawning Jupyter notebooks. Each Kubeflow deployment can include several notebook servers, and each notebook server can include multiple notebooks.

Training operators — These enable teams to train ML models through Kubernetes operators. For example, the TensorFlow training operator runs TensorFlow model training jobs on Kubernetes.

Multi-model serving — KFServing is designed to serve several models at once. As the number of queries grows, however, this can quickly use up available cluster resources.

What is Metaflow?

Metaflow, as we’ve already discussed, is a Python library that enables teams to build production ML. Netflix initially developed it to improve the productivity of data scientists who build and maintain different types of machine learning models. Compared with Kubeflow, however, it is a much more focused tool, and its major concepts revolve around pipelines and orchestration.

Metaflow is made up of five major components. These are:

Flow — The smallest unit of computation that can be scheduled for execution. A flow defines a workflow that pulls data from an external source as input, processes it, and outputs data.

Graph — Metaflow uses the transitions between step functions to deduce a directed acyclic graph (DAG). The transitions must be defined so that the graph can be parsed statically from the source code of the flow.

Step — A step can be thought of as a checkpoint that provides fault tolerance for the system. Metaflow typically takes a snapshot of the data produced by a step and uses it as input to the subsequent steps. If a step fails, it can be resumed without rerunning the preceding steps.

Runtime — The runtime (or scheduler) executes and orchestrates tasks defined by steps in topological order. The metaflow.client, a Python API, can be used to access the results of runs.

Datastore — This is an object store where data artifacts and code snapshots can be persisted. It’s accessible by all environments where the Metaflow code is executed.
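The way these components fit together can be sketched with a toy model in plain Python. This is not Metaflow’s implementation or API: the names `TRANSITIONS`, `STEPS`, `build_order`, and `run` are hypothetical, the datastore is just a dictionary, and the flow is a simple linear chain. It only illustrates the ideas above: deducing a graph from step transitions, running steps in topological order, snapshotting each step’s output, and resuming after a failure without rerunning earlier steps.

```python
from graphlib import TopologicalSorter

# Hypothetical toy flow: each step lists the steps it transitions to.
TRANSITIONS = {
    "start": ["load"],
    "load": ["train"],
    "train": ["end"],
    "end": [],
}

# Each step receives the previous step's output and returns its own.
STEPS = {
    "start": lambda _: [1, 2, 3, 4],
    "load": lambda xs: [x * 2 for x in xs],
    "train": lambda xs: sum(xs) / len(xs),
    "end": lambda mean: {"mean": mean},
}

executed = []  # records which steps actually ran, across all attempts


def build_order(transitions):
    """Deduce the DAG from step transitions and return a topological order."""
    predecessors = {step: set() for step in transitions}
    for step, targets in transitions.items():
        for target in targets:
            predecessors[target].add(step)
    return list(TopologicalSorter(predecessors).static_order())


def run(steps, order, datastore, fail_at=None):
    """Run steps in order, snapshotting each output in the datastore.

    Steps that already have a snapshot are skipped, so a failed run can
    resume from the failing step. `fail_at` simulates a failure.
    """
    data = None
    for name in order:
        if name in datastore:  # reuse the stored snapshot
            data = datastore[name]
            continue
        if name == fail_at:
            raise RuntimeError(f"step {name!r} failed")
        data = steps[name](data)
        datastore[name] = data
        executed.append(name)
    return data


order = build_order(TRANSITIONS)
datastore = {}

try:
    run(STEPS, order, datastore, fail_at="train")  # first attempt fails
except RuntimeError as err:
    print(err)

result = run(STEPS, order, datastore)  # resume: only train and end run
print(order, result)
```

In this sketch, the second call picks up the stored outputs of `start` and `load`, so each step runs exactly once across both attempts.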

Kubeflow vs Metaflow similarities

Kubeflow and Metaflow are both tools that operate in the MLOps space. Some of the similarities between Kubeflow and Metaflow include:

Both platforms are open source and can be used by anyone, anywhere.

Both platforms use Python; tasks can be defined using Python in Kubeflow whereas Metaflow is built as a Python library.

Both platforms can be used for orchestration, and both offer support for pipelines running in parallel.

Both platforms have a UI. In Kubeflow, this is the central dashboard whereas in Metaflow it’s a separate add-on service.

Kubeflow vs Metaflow differences

While Kubeflow attempts to capture the entire ML development process with hosted notebooks, serving, and other functionality on top of pipeline automation, Metaflow is more focused on orchestrated pipelines. Here are some ways that the two tools differ:

Their scopes

There’s a huge difference in scope between the two tools. Which one is right for your team will therefore depend on any tools you’ve already adopted.

If you haven’t adopted any tooling yet, Kubeflow is likely to be a useful solution whereas if you’re looking for a tool to handle production pipelines only, Metaflow would be the better option.

Their approaches to pipelines

The two tools take different approaches to pipelines. Metaflow pipelines are Python methods that pass data to one another, which makes them relatively easy to build. The drawback is that Metaflow can be tricky to work with when you have unconventional datasets.

In Kubeflow, however, steps run in separate containers and communicate via files. This is a much more versatile approach because steps can have their own dependencies, and any kind of data is easy to transfer in files. Kubeflow doesn’t support passing Python objects directly between components, which may be a limiting factor.

The approach that works best for your ML teams will depend entirely on your use case and preferences.
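The difference between the two styles can be illustrated with plain Python, without either framework. This is a sketch under assumptions: the function names and JSON file names are made up for the example, and a temporary directory stands in for the shared storage that separate containers would use.

```python
import json
import tempfile
from pathlib import Path


# Style 1: steps are Python functions that hand objects to each other in memory.
def clean(rows):
    """Drop missing values."""
    return [r for r in rows if r is not None]


def mean(rows):
    """Average the remaining values."""
    return sum(rows) / len(rows)


in_memory_result = mean(clean([3, None, 5, 7]))


# Style 2: steps share nothing but files, as isolated containers would.
def clean_step(in_path: Path, out_path: Path) -> None:
    rows = json.loads(in_path.read_text())
    out_path.write_text(json.dumps(clean(rows)))


def mean_step(in_path: Path, out_path: Path) -> None:
    rows = json.loads(in_path.read_text())
    out_path.write_text(json.dumps({"mean": mean(rows)}))


with tempfile.TemporaryDirectory() as workdir:
    raw = Path(workdir) / "raw.json"
    cleaned = Path(workdir) / "cleaned.json"
    metrics = Path(workdir) / "metrics.json"

    raw.write_text(json.dumps([3, None, 5, 7]))
    clean_step(raw, cleaned)     # reads raw.json, writes cleaned.json
    mean_step(cleaned, metrics)  # reads cleaned.json, writes metrics.json
    file_based_result = json.loads(metrics.read_text())["mean"]

print(in_memory_result, file_based_result)
```

Both styles compute the same value. The file-based version needs explicit serialization at every boundary, but each step could run in a different environment as long as it can read and write the agreed files.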

Distributed computation

Kubeflow’s architecture runs on Kubernetes ‘under the hood’. This helps to solve problems such as cloud deployment and migration because Kubernetes is open source and can be installed on any cloud.

In addition, DevOps and MLOps teams are often more familiar with Kubernetes, and there’s a wider range of third-party tools available for Kubernetes cluster monitoring.

Cloud

Metaflow has historically been closely tied to Amazon Web Services, whereas Kubeflow can be deployed on any cloud that runs Kubernetes, such as GCP or Azure.

Kubeflow vs Metaflow summary

By deploying and utilizing machine learning with Kubeflow or Metaflow, teams can unlock support for the most common ML scenarios, such as managing code, data, and dependencies for experiments.

In this article, we’ve highlighted and compared some of the critical similarities and differences between Kubeflow and Metaflow to help you decide between the two very different but very powerful platforms.

If you’ve got a larger team that would benefit from a unified workspace where the entire team can experiment with and deploy machine learning models into production, Kubeflow is likely to be your best choice.

That said, if you’re more interested in building production pipelines and you’ve already got tooling in place for most other things, Metaflow is going to be your best choice. The power of Metaflow lies in its opinionated approach, but this also means that it might not fit every use case. When it does, however, it’s a powerful tool that’s very easy to work with. And remember, it doesn’t require Kubernetes.

Qwak as an alternative to Kubeflow and Metaflow

Instead of using either of these, though, why not use a tool like Qwak?

Qwak is a robust MLOps platform that provides a similar feature set to Kubeflow in a managed service environment that enables you to skip the maintenance and setup requirements.

Our full-service ML platform enables teams to take their models and transform them into well-engineered products. Our cloud-based platform removes the friction from ML development and deployment while enabling fast iterations, limitless scaling, and customizable infrastructure.