What is MLOps and who is it for?

Chris Lee
Nov 11, 2021

Key takeaways:

MLOps is an approach derived from software development practices to make sure ML models are built, deployed and maintained in a controllable, scalable way.
Most plant operations teams aren’t staffed with software development-minded engineers. This makes traditional MLOps a barrier rather than an enabler.
A packaged application-based approach might be a better way to achieve the benefits of MLOps in many cases.

MLOps (Machine Learning Operations) is a set of tools and practices that help ensure the development and release of ML models are orderly, repeatable and scalable. This is a gross oversimplification but, roughly speaking, MLOps ensures that:

Developers have the right ML tools and environments to build with
The right models are put into the right places at the right times
Performance of the models (outcomes, regulatory and policy) is understood over time
Updates to models can be created and deployed effectively
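
To make these points a little more concrete, here is a minimal sketch of the bookkeeping involved, written in Python. It is an illustration only and does not describe any particular MLOps product: the `ModelRegistry` class, its method names, and the model and site names are all hypothetical. It tracks which version of a model is deployed where, records a performance score over time, and lets a new version replace an old one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ModelVersion:
    """One registered version of a model (hypothetical structure)."""
    name: str
    version: int
    scores: List[float] = field(default_factory=list)


class ModelRegistry:
    """Toy in-memory registry; a real system would persist this state."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}
        # Maps (site, model name) -> deployed version number.
        self._deployed: Dict[Tuple[str, str], int] = {}

    def register(self, name: str) -> ModelVersion:
        """Create the next version of a model; versions start at 1."""
        versions = self._versions.setdefault(name, [])
        model_version = ModelVersion(name=name, version=len(versions) + 1)
        versions.append(model_version)
        return model_version

    def _get(self, name: str, version: int) -> ModelVersion:
        versions = self._versions.get(name, [])
        if not 1 <= version <= len(versions):
            raise KeyError(f"unknown model version: {name} v{version}")
        return versions[version - 1]

    def deploy(self, name: str, version: int, site: str) -> None:
        """Put a known version at a site, replacing whatever was there."""
        self._get(name, version)  # raises KeyError if not registered
        self._deployed[(site, name)] = version

    def deployed_version(self, site: str, name: str) -> Optional[int]:
        return self._deployed.get((site, name))

    def record_score(self, name: str, version: int, score: float) -> None:
        """Append one performance observation for a version."""
        self._get(name, version).scores.append(score)

    def mean_score(self, name: str, version: int) -> Optional[float]:
        scores = self._get(name, version).scores
        return sum(scores) / len(scores) if scores else None
```

With a handful of models this is easy to do by hand or in a spreadsheet. The point of MLOps tooling is to keep this kind of record accurate and automatic as the number of models, sites and contributors grows.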

Figure 1 – A high-level picture of the components of MLOps

All of these are simple, intuitive ideas – there is nothing magical about any of the components. The challenge comes in making these things happen at scale. If you have a few models, manually tracking and deploying them isn’t so hard. However, when you have dozens or hundreds of models, all being created and maintained by different people, manual tracking becomes difficult.

Fortunately, there is precedent for solving this problem: DevOps (a portmanteau of software Development and IT Operations). DevOps is a formalization of the practices that software engineers use to write and deploy software. In order to maximize learning cycles, development teams often strive for continuous integration and continuous deployment (CI/CD) – that is, pushing code changes into the field as soon as they are made and tested. This is done by programmatically linking commits from the code repository with testing, building, and deployment tools so that the software gets to all the locations that need it in the correct form for each location. The exact tools and processes used vary from company to company but, suffice it to say, DevOps relies on a pipeline of many software tools configured to work together – each one doing a specific task within the larger flow. If that sounds natural to you, then you’re probably someone who would appreciate MLOps. If it sounds like a mysterious pile of jargon, then hang in there – there is hope.
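
For readers who like to see an idea in code, the commit-to-deployment chain described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real CI/CD product works: the `Commit` type, the `run_pipeline` function and the three stage functions are hypothetical names, and each stage simply returns `True` or `False` where a real pipeline would call out to testing, build and deployment tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Commit:
    """A code change arriving from the repository (hypothetical)."""
    sha: str
    message: str


@dataclass
class PipelineResult:
    sha: str
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None


# A stage takes a commit and reports success (True) or failure (False).
Stage = Callable[[Commit], bool]


def run_pipeline(commit: Commit, stages: List[Tuple[str, Stage]]) -> PipelineResult:
    """Run stages in order and stop at the first one that fails."""
    result = PipelineResult(sha=commit.sha)
    for name, stage in stages:
        if not stage(commit):
            result.failed_stage = name
            break
        result.completed.append(name)
    return result


# Placeholder stages; real ones would invoke external tools.
def run_tests(commit: Commit) -> bool:
    # Assumption for the sketch: commits flagged "WIP" fail testing.
    return "WIP" not in commit.message


def build_artifact(commit: Commit) -> bool:
    return True


def deploy(commit: Commit) -> bool:
    return True


STAGES: List[Tuple[str, Stage]] = [
    ("test", run_tests),
    ("build", build_artifact),
    ("deploy", deploy),
]
```

Calling `run_pipeline(Commit("abc123", "Fix threshold"), STAGES)` runs all three stages in order, while a commit whose message contains "WIP" stops at the test stage and is never built or deployed. The design choice to stop at the first failure is what keeps untested changes out of the field.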

There are, broadly speaking, two ways to think about ML models: 1) as a type of software development and 2) as part of a packaged application.

In the first, developer-centric case, you get a team of developers (data scientists) to write the code (models) and release it using a standard DevOps (MLOps) flow. This works great when you have people who are familiar with setting up, running and working within a highly code-driven environment. Many large organizations can afford to set up and maintain these groups. Despite the Microsoft CEO’s assertion that “Every business will become a software business…”, I think it is safe to say that most organizations have neither the resources nor the expertise required to set up and run a system like this. It would be a significant stretch to ask your senior metallurgists, reliability engineers or even your typical plant IT engineer to spec out, provision, configure, operate and maintain an MLOps pipeline – even if it resides in the cloud.

In the more common case of limited software development expertise and resources, ML deployment cannot follow a full DevOps-inspired model – that would be ideal but it’s not practical. Instead, a more black-box, packaged approach is required. That is, the goals that MLOps sets out to facilitate are still important, but they have to be met in a form that lets people who are not software engineers do the work.

Figure 2 – Moving MLOps inside the black box of an application

In the second, application-centric case, details of the MLOps work are hidden from the end user. This comes with tradeoffs in terms of flexibility. For example, the application may not support a particular, cutting-edge ML algorithm or the versioning system may not have the structure recommended by a consultant. However, in the spirit of not letting the perfect be the enemy of the good, I think it is reasonable to say that 1000 configuration options are too many when even 10 options cannot be effectively managed.

What does putting MLOps into practice mean, then? In the first case, it means a DevOps-inspired suite that is operated by a dedicated team of specialists. This gives the most flexibility and capability, at the cost of the expertise required to run the MLOps system. That cost is not small, but this is the most powerful option for those who can manage it. In the second case, it means running an application that contains the most important elements of MLOps (development tools, deployment tools, performance monitoring, version management and updating tools). This ensures that the operations team you already have can create, deploy and manage models at scale even if they don’t have software development experience or a similarly skilled team backing them up. Such an application will have less functionality than a full MLOps stack can provide but, for most organizations, this is the level that their business can support, so it’s the right level to aim for.

Falkonry Clue is such a tool. It was developed for users who are neither data scientists nor software engineers. The tasks required to build, deploy and maintain models are hidden from the end users. Instead, users only need to respond to periodic queries for episode resolution and the rest of the MLOps tasks are handled in the background. Contact us for a demo.