Machine Learning Build system: why do you need it?

Yuval Fernbach
December 15, 2021

Creating production-grade ML solutions isn’t an easy task. Given relevant training data for their use case, data scientists can implement and train ML models that perform well on an offline holdout dataset. However, the real challenge isn't building an ML model; it's building an integrated ML system and continuously operating it in production.

Applying DevOps principles to ML solutions isn’t trivial. This blog post covers the main differences between DevOps and MLOps, as well as the main challenges that MLOps build systems should solve. Later in this series, we’ll discuss how to solve those challenges in various ways.

How is DevOps different from MLOps?

DevOps is a popular practice that has become standard in software engineering. It’s a proven approach, and nowadays companies apply DevOps methods from the very first software they develop. DevOps allows companies to shorten their development cycle and make releases faster, more robust, and more controlled.

However, ML is different by nature. Unlike “traditional software,” ML models change over time because their data changes over time, so DevOps methods alone aren’t enough. Machine learning is also highly experimental: we try different algorithms, datasets, and parameters to get the most out of our models. ML engineers have to experiment a lot while keeping their code reproducible and reusable.

We all know that ML is about data. A data scientist might spend a lot of time working out an approach to feature engineering or model development, tasks that traditional software engineers never perform. Because the focus in ML is different, DevOps methods aren’t sufficient on their own. Although current DevOps solutions fall short for ML, we should still learn from DevOps principles and apply them to ML. The central element of continuous integration in DevOps is the build system.

What’s a Build System?

According to Wikipedia’s Software build article, “a build is the process of converting source code files into standalone software artifact(s) that can be run on a computer, or the result of doing so.”

A software build’s main functions are:

  • Version control
  • Code quality
  • Code reproducibility
  • Compilation

A build system allows engineers to test their code and eventually create a compiled version of their program: an artifact. Once they’ve tested and built the artifact, engineers can start the deployment phase and release the new version to production. The build system helps engineers trust that their code is production-worthy. Whenever an issue comes up or development continues, they can go back to the build system, update the source code, and issue a new version of the artifact.

A Machine Learning Build System

Machine learning processes are different from traditional software development, and machine learning still lacks the standards that software development has. To create those processes, one needs to focus on a few main functions:

Model Training

A new build is usually issued for one of two reasons:

  1. Data change: data drifts over time. Models are trained on data, but the environment keeps changing, the data keeps changing, and eventually that drift degrades model performance.
  2. Model/code change: experiments never end; new models are created and new methods become available.
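The two triggers can be made concrete with a minimal sketch of a rebuild check. It assumes the build system stores a hash of the model source code and a reference sample of one numeric feature from the last build. The function names (`code_fingerprint`, `mean_shift`, `should_rebuild`) and the 0.5 threshold are hypothetical. The drift statistic measures how far the current mean has moved, in reference standard deviations, and is a deliberately simple stand-in for real drift tests.

```python
import hashlib
import statistics


def code_fingerprint(source: str) -> str:
    """SHA-256 of the model source code; any edit produces a new fingerprint."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def mean_shift(reference: list[float], current: list[float]) -> float:
    """Distance between current and reference means, in reference std devs."""
    ref_mean = statistics.fmean(reference)
    cur_mean = statistics.fmean(current)
    ref_std = statistics.pstdev(reference)
    if ref_std == 0:
        # Constant reference feature: any change in the mean counts as drift.
        return 0.0 if cur_mean == ref_mean else float("inf")
    return abs(cur_mean - ref_mean) / ref_std


def should_rebuild(
    last_code_hash: str,
    source: str,
    reference: list[float],
    current: list[float],
    drift_threshold: float = 0.5,  # hypothetical threshold
) -> list[str]:
    """Return the reasons for issuing a new build (empty list: no rebuild)."""
    reasons = []
    if code_fingerprint(source) != last_code_hash:
        reasons.append("code_change")
    if mean_shift(reference, current) > drift_threshold:
        reasons.append("data_drift")
    return reasons


if __name__ == "__main__":
    source_v1 = "def predict(x): return 2 * x"
    last_hash = code_fingerprint(source_v1)
    reference = [1.0, 2.0, 3.0, 4.0]
    print(should_rebuild(last_hash, source_v1, reference, [1.1, 2.0, 2.9, 4.0]))  # []
    print(should_rebuild(last_hash, source_v1, reference, [5.0, 6.0, 7.0, 8.0]))  # ['data_drift']
```

Returning a list of reasons, rather than a boolean, lets the build record why it was triggered.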

Issuing a new build usually means training a new model version. Tying model training to the build process lets us link the model code, the model artifact (the trained model), and the model data. This leads us to the next function.

Model Reproducibility

Unlike in software development, where it’s enough to recompile the source code to reproduce the same software artifact, a model is built from multiple configurations:

  1. Training data
  2. Model parameters
  3. Model code

A build system has to track those configurations and allow the data scientist to reproduce the same model when needed. Each build version must record the training dataset, the model’s hyperparameters, and the model source code. Ideally, it should also track the differences between model versions.
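One way to picture this is a build manifest. It records a fingerprint of each of the three inputs and derives the build ID from them, so identical inputs always map to the same ID and two versions can be diffed field by field. This is a minimal sketch: `BuildManifest`, `make_manifest`, and `diff_manifests` are hypothetical names. It assumes the training data fits in memory as bytes and that hyperparameters are JSON-serializable; a real system would hash dataset files or snapshots instead.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class BuildManifest:
    data_hash: str  # fingerprint of the training dataset
    params: dict    # hyperparameters used for training
    code_hash: str  # fingerprint of the model source code

    def build_id(self) -> str:
        # sort_keys makes the ID independent of dict insertion order.
        payload = json.dumps(asdict(self), sort_keys=True)
        return sha256_hex(payload.encode("utf-8"))[:12]


def make_manifest(training_data: bytes, params: dict, source: str) -> BuildManifest:
    return BuildManifest(
        sha256_hex(training_data), dict(params), sha256_hex(source.encode("utf-8"))
    )


def diff_manifests(old: BuildManifest, new: BuildManifest) -> dict:
    """Describe what changed between two model versions."""
    changes = {}
    if old.data_hash != new.data_hash:
        changes["training_data"] = "changed"
    if old.code_hash != new.code_hash:
        changes["code"] = "changed"
    for key in sorted(set(old.params) | set(new.params)):
        if old.params.get(key) != new.params.get(key):
            changes[f"param:{key}"] = (old.params.get(key), new.params.get(key))
    return changes


if __name__ == "__main__":
    data = b"x,y\n1,2\n2,4\n"
    source = "def train(data, lr, epochs): ..."
    v1 = make_manifest(data, {"lr": 0.1, "epochs": 10}, source)
    v1_again = make_manifest(data, {"epochs": 10, "lr": 0.1}, source)
    v2 = make_manifest(data, {"lr": 0.05, "epochs": 10}, source)
    print(v1.build_id() == v1_again.build_id())  # True
    print(diff_manifests(v1, v2))  # {'param:lr': (0.1, 0.05)}
```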

Version Control

Reproducibility isn’t enough; in many cases, models are trained more than once. Understanding the differences between model versions, rolling back to a specific model version, and promoting versions with different deployment strategies are key requirements of deployment solutions. It’s the build system’s responsibility to support those needs.
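These version-control needs can be sketched as a small in-memory registry, assuming each registered version points at an immutable build ID. `ModelRegistry` and its methods are hypothetical. The sketch covers only registering, promoting, and rolling back. It does not cover richer deployment strategies such as canary or shadow releases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical registry: version number -> build ID, plus production history."""
    versions: Dict[int, str] = field(default_factory=dict)
    production_history: List[int] = field(default_factory=list)  # oldest first

    def register(self, build_id: str) -> int:
        version = len(self.versions) + 1
        self.versions[version] = build_id
        return version

    def promote(self, version: int) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown model version {version}")
        self.production_history.append(version)

    def production(self) -> Optional[int]:
        return self.production_history[-1] if self.production_history else None

    def rollback(self) -> int:
        """Return to the previously promoted version."""
        if len(self.production_history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self.production_history.pop()
        return self.production_history[-1]


if __name__ == "__main__":
    registry = ModelRegistry()
    v1 = registry.register("3f2a9c1b7d4e")
    v2 = registry.register("8b6d0e5a2c9f")
    registry.promote(v1)
    registry.promote(v2)
    print(registry.production())  # 2
    print(registry.rollback())    # 1
```

Because versions only ever point at build IDs, a rollback never rebuilds anything; it just re-points production at an artifact that already passed the build.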


Model Testing

Data scientists are familiar with the concept of testing, but usually with a single focus: model quality/performance. However, when one builds a production system, model performance isn’t enough. Testing is one of the key principles of software build systems, and many companies take it to the next level with practices like test-driven development (TDD), but in ML those concepts are usually not applied.

Running unit tests and integration tests on ML models allows data scientists to trust their model artifacts. It also allows ML engineers and DevOps teams to trust that an artifact is safe, robust, and ultimately fit for production.
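The checks below illustrate testing beyond a single accuracy number. Before an artifact is accepted, they verify the output contract, determinism, input validation, and a minimum holdout accuracy. `ThresholdModel` is a hypothetical toy stand-in for a trained model, and the 0.9 accuracy floor is an arbitrary example value. In practice, checks like these would usually live in a test runner such as pytest.

```python
import math


class ThresholdModel:
    """Hypothetical toy classifier standing in for a trained model artifact."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, xs: list[float]) -> list[int]:
        if any(math.isnan(x) for x in xs):
            raise ValueError("input contains NaN")
        return [1 if x > self.threshold else 0 for x in xs]


def check_output_contract(model) -> None:
    preds = model.predict([0.1, 0.5, 0.9])
    assert len(preds) == 3, "one prediction per input row"
    assert set(preds) <= {0, 1}, "only known class labels"


def check_deterministic(model) -> None:
    xs = [0.2, 0.7]
    assert model.predict(xs) == model.predict(xs), "same input, same output"


def check_rejects_nan(model) -> None:
    try:
        model.predict([float("nan")])
    except ValueError:
        return
    raise AssertionError("NaN input was silently accepted")


def check_min_accuracy(model, xs, ys, min_accuracy: float = 0.9) -> None:
    preds = model.predict(xs)
    accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)
    assert accuracy >= min_accuracy, f"accuracy {accuracy:.2f} < {min_accuracy}"


def run_build_checks(model, holdout_xs, holdout_ys) -> None:
    """Raise AssertionError if the artifact is not fit for production."""
    check_output_contract(model)
    check_deterministic(model)
    check_rejects_nan(model)
    check_min_accuracy(model, holdout_xs, holdout_ys)


if __name__ == "__main__":
    run_build_checks(ThresholdModel(0.5), [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])
    print("all checks passed")
```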

Model Serialization

After a model is trained, we want a way to persist the model artifact for future use without having to retrain the model. One of the key concepts of build systems is immutability: an object’s state cannot be modified after it is created. For ML models, this means that once a model is trained, it should not be modified.

There are a few ways to put trained ML models into production. The most common method is to serialize the model in a particular format after training and then deserialize it in the production environment. The build system should support the serialization methods of different ML libraries, because companies usually use a variety of algorithms and ML libraries across their ML solutions.
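Here is a minimal sketch of write-once serialization, assuming artifacts are stored as local files. The artifact is written with Python’s exclusive-create mode (`"xb"`), so an existing artifact is never overwritten. Its SHA-256 digest is checked on load to detect modification. The `SERIALIZERS` table maps a format name to a pair of dump/load functions. It is a hypothetical way to plug in different libraries’ serialization methods, and only the standard-library `pickle` and `json` formats are shown.

```python
import hashlib
import json
import pickle
from pathlib import Path

# Hypothetical pluggable table: format name -> (dump to bytes, load from bytes).
SERIALIZERS = {
    "pickle": (pickle.dumps, pickle.loads),
    "json": (
        lambda obj: json.dumps(obj, sort_keys=True).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    ),
}


def save_artifact(model, path: Path, fmt: str = "pickle") -> str:
    """Serialize the model once and return its SHA-256 digest."""
    dump, _ = SERIALIZERS[fmt]
    data = dump(model)
    # "xb" raises FileExistsError instead of overwriting an existing artifact.
    with open(path, "xb") as f:
        f.write(data)
    return hashlib.sha256(data).hexdigest()


def load_artifact(path: Path, expected_digest: str, fmt: str = "pickle"):
    """Deserialize the model, refusing artifacts that changed after the build."""
    data = Path(path).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError(f"artifact {path} does not match its build digest")
    _, load = SERIALIZERS[fmt]
    return load(data)  # only unpickle artifacts from trusted sources


if __name__ == "__main__":
    import tempfile

    model = {"weights": [0.5, -1.2], "bias": 0.1}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model-v1.pkl"
        digest = save_artifact(model, path)
        print(load_artifact(path, digest) == model)  # True
```

Storing the digest alongside the build record ties the deployed artifact back to the exact build that produced it.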


Orchestration

As mentioned above, models often evolve; both data changes and model changes trigger continuous model builds. The build system therefore needs to integrate with orchestrators and expose the status of each build.
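The sketch below is a rough illustration of exposing build status. It runs build stages in order and emits one JSON status event per stage transition through a caller-supplied `emit` function, which prints to stdout by default. It stops at the first failure. The stage names and event fields are hypothetical; real orchestrators have their own integration mechanisms.

```python
import json
from typing import Callable, Dict


def run_build(
    stages: Dict[str, Callable[[], None]],
    emit: Callable[[str], None] = print,
) -> bool:
    """Run stages in insertion order, emitting JSON status events; return success."""
    for name, stage in stages.items():
        emit(json.dumps({"stage": name, "status": "started"}))
        try:
            stage()
        except Exception as exc:  # report any stage failure to the orchestrator
            emit(json.dumps({"stage": name, "status": "failed", "error": str(exc)}))
            return False
        emit(json.dumps({"stage": name, "status": "succeeded"}))
    emit(json.dumps({"stage": "build", "status": "succeeded"}))
    return True


def failing_tests() -> None:
    # Hypothetical stage that fails, e.g. a model test below its threshold.
    raise AssertionError("holdout accuracy below threshold")


if __name__ == "__main__":
    ok = run_build({
        "train": lambda: None,    # placeholder stage
        "test": failing_tests,
        "package": lambda: None,  # never runs because "test" fails
    })
    print("build ok:", ok)
```

A common convention for orchestrators is to treat a non-zero process exit code as failure. A command-line entry point could therefore finish with `sys.exit(0 if ok else 1)` on top of the structured events.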


Summary

Implementing ML in a production environment doesn't mean only training a model and running inference. Rather, it means deploying a system that automates model creation. The system should allow the company to automatically train, test, and create new production-grade model artifacts, so that it can cope with rapid changes in its data and business environment. Companies already adopt DevOps principles as a first step in software development; ML development should be no different.
