Overcoming the Challenges of Deploying Machine Learning

Pavel Klushin
Head of Solution Architecture at Qwak
September 30, 2022

Are you getting different results from your machine learning (ML) algorithms? Perhaps your results aren’t what you expected, or perhaps the model is making different predictions each time that it’s trained, even when it’s being trained on the same data set each time. This is something that’s to be expected to an extent, and in some cases, it might even be a feature of the algorithm rather than a bug. 

The truth is that while algorithms have become a lot faster and more sophisticated in recent years, they’ve also become a lot more complex, and it’s not always easy to understand why an algorithm is behaving in a particular way. This becomes even more difficult over time because system complexity tends to compound as time passes, especially when ML is deployed at scale. 

Indeed, it’s ML solutions deployed at scale that represent one of the most pressing challenges organizations face today. The ML workflow, which includes training, building, and deployment, is invariably a long process with many challenges along the way. As such, there’s no shortage of data science projects that never make it to production because of challenges that stifle progress. 

The main challenges of productionizing ML models

To overcome the challenges of deploying ML models, we first need to identify what causes them. Some of the top challenges that organizations face when trying to take a machine learning model into production are:

The need for heterogeneity

End-to-end ML solutions are typically made up of components written in different programming languages. Depending on the model’s use case, a data scientist or ML engineer might choose Python, Scala, R, or another language to build one model, then use a different language for a second or third. And within a given language there are numerous toolkits and frameworks—Python alone has TensorFlow, PyTorch, and scikit-learn, to name a few—each tuned for different operations and each producing a different type of model artifact. 

Although this variety of frameworks and toolkits lets developers choose the language and tool best suited to the problem being solved, such heterogeneity makes codebases hard to keep consistent. Containerization technologies such as Docker can solve the incompatibility and portability challenges that heterogeneity introduces, but tools for automatic dependency checking, error checking, testing, and builds cannot tackle problems across the language barrier.

ML model deployments are not self-contained

Machine learning model deployments aren’t self-contained. They are typically either embedded or integrated into business-critical applications, which can make ongoing maintenance and development very challenging. 

By far the simplest way to integrate a model with existing business applications is to deploy it wrapped as a REST API. This approach aligns well with microservice architectures and lets teams update or scale individual model components independently. Creating a REST API is straightforward because most web frameworks provide the bulk of the required capability out of the box. However, some models need to be deployed as gRPC APIs instead for efficient network usage and better performance, especially when inputs are large. 
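As an illustration, a model can be wrapped behind a REST endpoint with nothing more than Python’s standard library. The `predict` function here is a hypothetical stand-in for a trained model; a real deployment would typically use a framework such as Flask or FastAPI, but the integration shape is the same:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: score = 2*x0 + 3*x1."""
    return 2 * features[0] + 3 * features[1]


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read the JSON request body and run the model on its features.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging


def make_server(port=0):
    """Bind the prediction service; port 0 picks a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
```

Because the model sits behind a plain HTTP interface, any business application that can issue a POST request can consume predictions without knowing anything about the model’s language or framework.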

Aside from REST APIs for integrating disparate components, another widely used approach is the use of messaging systems. An ability to package and deploy a model that can integrate with messaging systems can make deployments much more seamless.
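The same hypothetical model can just as easily be hooked up to a messaging system. In this sketch an in-process `queue.Queue` stands in for a real broker such as Kafka or RabbitMQ, but the consume–predict–publish pattern carries over directly:

```python
import json
import queue

# In-process queues standing in for broker topics/queues
# (e.g. Kafka topics or RabbitMQ queues).
requests_q = queue.Queue()
predictions_q = queue.Queue()


def predict(features):
    """Hypothetical stand-in for a trained model."""
    return sum(features)


def handle_next_message():
    """Consume one request message, run the model, publish the result."""
    msg = json.loads(requests_q.get())
    result = {"id": msg["id"], "prediction": predict(msg["features"])}
    predictions_q.put(json.dumps(result))
```

Packaging the model this way decouples producers and consumers: upstream systems publish feature payloads at their own pace, and the model service drains the queue independently.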

Defining what an ‘ML model’ is

How an organization defines what a machine learning model is can have a huge impact on the ease of deployment. So, what is it? Is it only the model parameters obtained after training, or does it also include the feature transformations that the model needs to work correctly?

There are many libraries that combine feature transformations and the actual ML model in a single abstraction often referred to as ML pipelines. From the perspective of a system, the model can be considered to be a so-called ‘black box’ with defined inputs and outputs, or it could be considered as a combination of operations where specifics and semantics are known. A model can also be a combination of models (for example, ensembles where models from different languages or libraries are combined, or where one model’s output is another model’s input).
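A minimal sketch of such a pipeline abstraction: from the outside, the whole thing is a black box with a single `predict` method, while internally it chains a feature transformation into a model (both hypothetical here, standing in for what libraries like scikit-learn provide):

```python
class Pipeline:
    """Chains feature transformations and a model behind one interface."""

    def __init__(self, steps):
        self.steps = steps  # ordered list of callables

    def predict(self, x):
        # Each step's output feeds the next; the final step is the model.
        for step in self.steps:
            x = step(x)
        return x


# Hypothetical feature transformation and model.
def scale(features):
    return [f / 10.0 for f in features]


def linear_model(features):
    weights = [2.0, 3.0]
    return sum(w * f for w, f in zip(weights, features))


pipeline = Pipeline([scale, linear_model])
```

Because ensembles fit the same shape (one step’s output is the next step’s input), the deployment unit stays the same whether the “model” is a single estimator or a chain of them.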

The proliferation of service-oriented architectures and microservices has moved applications from huge stacks of complex code to much more composable, manageable, and digestible components. As time goes by, ML is becoming ever more composable as its building blocks grow more granular and disparate. Nowadays, most models are deployed and managed either as a single unit or as multiple components that are managed and updated in isolation. 

Testing and validation

Models continuously evolve as data changes. Each time a change happens, model performance must be re-validated, which introduces several challenges:

  • Models must be evaluated using the same test and validation datasets to be able to compare the performance of different models.
  • The same code for evaluating metrics must be used across different models to guarantee comparability.
  • Updates to test/validation datasets or code require the different ML models (including old and new) to be re-evaluated in order to be comparable.
  • Model improvements may come at significant costs, such as longer prediction times. Benchmark tests must be part of the validation process to identify these impacts. 

In addition to the validation of models in offline tests, assessing the performance of models in production is critical. 
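One way to keep such comparisons honest is to fix both the validation dataset and the metric code in a single shared harness, so that every candidate model, old or new, is scored identically. The models and data below are hypothetical:

```python
# Fixed validation set shared by every model under comparison.
VALIDATION_SET = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]


def mean_absolute_error(model):
    """Shared metric code: identical for every model being evaluated."""
    errors = [abs(model(x) - y) for x, y in VALIDATION_SET]
    return sum(errors) / len(errors)


def compare(models):
    """Score each named model on the same data with the same metric."""
    return {name: mean_absolute_error(m) for name, m in models.items()}


# Two hypothetical candidate models.
old_model = lambda x: 2.0 * x[0]         # fits this data exactly
new_model = lambda x: 2.0 * x[0] + 0.5   # carries a +0.5 bias
```

If the validation set or the metric code ever changes, every model in the comparison must be re-scored through the updated harness, otherwise the numbers stop being comparable.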


It has become very common for ML teams to use CI/CD (Continuous Integration and Continuous Deployment) tools. CI/CD tools help ML teams push accurate updates to production quickly and unlock a wide range of other benefits, including version control, better security, and improved reliability and reproducibility. 

Most CI/CD tools support the well-known software development workflows which include build, test, and deploy steps. However, ML workflows have unique characteristics that are not observed in traditional development workflows.

The biggest difference between traditional applications and ML is the fact that ML’s primary input isn’t just code. There’s another crucial input: data. Versioning must be applied to both of these inputs to achieve reproducibility and auditability. It’s important to also monitor both the data and the code for any changes and then automatically trigger workflows. 
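A sketch of the idea: fingerprint both the code and the data, and trigger the workflow only when either fingerprint changes. The hashing scheme here is illustrative, not any particular tool’s:

```python
import hashlib


def fingerprint(items):
    """Stable hash over a collection of byte strings (code or data files)."""
    h = hashlib.sha256()
    for item in sorted(items):
        h.update(item)
    return h.hexdigest()


def should_retrain(code_files, data_files, last_run):
    """Trigger the workflow when either the code or the data has changed.

    Returns (trigger, current_versions); current_versions should be
    persisted and passed back in as last_run on the next check.
    """
    current = {
        "code": fingerprint(code_files),
        "data": fingerprint(data_files),
    }
    return current != last_run, current
```

Recording both fingerprints per run is what buys reproducibility and auditability: any past prediction can be traced back to the exact code and data versions that produced the model.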

In addition, while hardware and software dependencies for traditional app development are usually homogenous, each stage of the workflow may demand specific hardware and software components when it comes to ML model development. This is particularly true during model training which is typically long and intensive and requires hardware accelerators such as GPUs. CI/CD tooling used for ML model development should therefore be capable of provisioning such dependencies as and when they’re required. 

Streamlining ML deployment through robust tooling

Implementing machine learning in a production environment is about far more than deploying a model for prediction and praying that everything works. It’s a big job, and although deploying ML into production can throw up plenty of roadblocks, the right tooling can help ML teams get past these and many other deployment challenges. 

One of the most important steps is to set up a CI/CD ML pipeline that enables teams to automatically build, test, and deploy new ML pipeline implementations and iterate quickly based on changes in data and business environments. You can begin by gradually implementing CI/CD best practices in your ML model training and pipelines as a part of your MLOps processes to reap the rewards of automating your ML system development and operationalization. And you can do this with our own platform, Qwak.

Qwak is the full-service machine learning platform that enables teams to take their models and transform them into well-engineered products. Our cloud-based platform removes the friction from ML development and deployment while enabling fast iterations, limitless scaling, and customizable infrastructure.

Want to find out more about how Qwak could help you deploy your ML models effectively? Get in touch for your free demo!

Chat with us to see the platform live and discover how we can help simplify your journey deploying AI in production.
