Target audience: Beginner
Estimated reading time: 4 min
A few of my colleagues in data science are hesitant about embracing MLOps. Why should it matter to them? For many reasons, as it turns out.
This article presents a comprehensive overview of MLOps, especially from a data scientist's perspective. Essentially, MLOps aims to address common issues of reliability and clarity that frequently arise during the development and deployment of machine learning models.
AI productization
MLOps encompasses a suite of tools that facilitate the lifecycle of data-centric AI. This includes training models, performing error analysis to pinpoint data types where the algorithm underperforms, expanding the dataset through data augmentation, resolving discrepancies in data label definitions, and leveraging production data for ongoing model enhancement.
MLOps aims to streamline and automate the training and validation of machine learning models, enhancing their quality and ensuring they meet business and regulatory standards. It merges the roles of data engineering, data science, and dev-ops into a cohesive and predictable process across the following domains:
- Deployment and automation
- Reproducibility of models and predictions
- Diagnostics
- Governance and regulatory compliance (SOC 2, HIPAA)
- Scalability and latency
- Collaboration
- Business use cases & metrics
- Monitoring and management
- Technical support
Predictable ML lifecycle
MLOps outlines the management of the entire machine learning lifecycle. This includes integrating model generation with software development processes (like Jira, Github), ensuring continuous testing and delivery, orchestrating and deploying models, as well as monitoring their health, diagnostics, performance governance, and aligning with business metrics. From a data science standpoint, MLOps involves a consistent and cyclical process of gathering and preprocessing data, training and assessing models, and deploying them in a production environment.
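This cyclical process can be sketched in plain Python. The `collect_data`, `train`, and `evaluate` functions below are illustrative stand-ins for real pipeline stages, and the quality gate on deployment is an assumption for the sketch, not a prescribed MLOps rule:

```python
import random

def collect_data(n=200):
    """Simulate gathering labeled (feature, label) pairs: y ~ 2x + noise."""
    return [(x, 2 * x + random.gauss(0, 0.1))
            for x in (random.random() for _ in range(n))]

def train(data):
    """Fit a one-dimensional least-squares slope: w = sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / sxx

def evaluate(w, data):
    """Mean squared error of the fitted slope on held-out data."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def lifecycle(rounds=3, threshold=0.05):
    """One pass of the cyclical loop: collect, train, evaluate, deploy."""
    deployed = None
    for _ in range(rounds):
        data = collect_data()
        split = len(data) // 2
        w = train(data[:split])
        mse = evaluate(w, data[split:])
        if mse < threshold:      # gate deployment on a quality metric
            deployed = w
    return deployed
```

In a real pipeline, each function would be an orchestrated stage (ingestion job, training job, evaluation report), but the control flow is the same: the loop never ends, and deployment is just one gated step inside it.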
Data-centric AI
Andrew Ng pioneered the idea of data-centric AI, advocating for AI professionals to prioritize the quality of their training data rather than concentrating mainly on model or algorithm development. Unlike the conventional model-centric AI approach, where data is gathered with minimal focus on its quality to train and validate a model, data-centric AI emphasizes improving data quality. This approach enhances the likelihood of success for AI projects and machine learning models in practical applications.
MLOps, on the other hand, involves a continuous and iterative process encompassing data collection and pre-processing, model training and evaluation, and deployment in a production environment.
There are several differences between the traditional model-centric and the data-centric AI approaches.
| Model-centric | Data-centric |
| --- | --- |
| Collect all the data you can and develop a model robust enough to deal with noise while avoiding overfitting. | Select a subset of the training data with the highest consistency and reliability, so that multiple models perform well. |
| Hold the data fixed and iteratively improve the model and code. | Hold the model and code fixed and iteratively improve the data. |
Repeatable processes
The objective is to implement established and reliable software development management techniques (such as Scrum, Kanban, etc.) and DevOps best practices in the training and validation of machine learning models. By operationalizing the training, tuning, and validation processes, the automation of data pipelines becomes more manageable and predictable.
The diagram below showcases how data acquisition, analysis, training, and validation tasks transition into operational data pipelines:
Fig 2. Productization in Model-centric AI
As shown in Figure 2, the deployment procedure in a model-centric AI framework offers limited scope for integrating model training and validation with fresh data.
Conversely, in a data-centric AI approach (Figure 3), the model is put into action early in the development cycle. This early deployment enables continuous integration and ongoing model updates based on feedback and newly acquired data.
AI lifecycle management tools
While the development tools traditionally used by software engineers are largely applicable to MLOps, there has been an introduction of specialized tools for the ML lifecycle in recent years. Several open-source tools have emerged in the past three years to facilitate the adoption and implementation of MLOps across engineering teams.
- DVC (Data Version Control) is tailored for version control in ML projects.
- Polyaxon offers lifecycle automation for data scientists within a collaborative workspace.
- MLFlow oversees the complete ML lifecycle, from experimentation to deployment, and features a model registry for managing different model versions.
- Kubeflow streamlines workflow automation and deployment in Kubernetes containers.
- Metaflow focuses on automating the pipeline and deployment processes.
Additionally, AutoML frameworks are increasingly popular for swift ML development, offering a user experience akin to GUI development.
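None of these tools is required to grasp the underlying idea. The toy tracker below is a hypothetical, MLflow-style stand-in (it does not use MLflow's actual API) showing the two ingredients such tools manage: per-run logging of parameters and metrics, and a versioned model registry:

```python
import time

class ExperimentTracker:
    """Toy stand-in for an ML lifecycle tracker: records parameters and
    metrics per run, and registers model versions against those runs."""

    def __init__(self):
        self.runs = []
        self.registry = {}          # model name -> list of versions

    def log_run(self, params, metrics):
        """Record one training run with its parameters and metrics."""
        run = {"params": params, "metrics": metrics, "time": time.time()}
        self.runs.append(run)
        return run

    def register_model(self, name, run):
        """Attach a new, auto-incremented version of a named model to a run."""
        versions = self.registry.setdefault(name, [])
        versions.append({"version": len(versions) + 1, "run": run})
        return versions[-1]["version"]

tracker = ExperimentTracker()
run = tracker.log_run({"lr": 0.01, "epochs": 10}, {"accuracy": 0.92})
v = tracker.register_model("churn-classifier", run)  # first version -> 1
```

Real tools add storage, UIs, and deployment hooks on top, but reproducibility ultimately comes down to this linkage: every registered model version points back at the exact run, parameters, and metrics that produced it.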
Canary, frictionless release
A strong testing and deployment strategy is essential for the success of any AI initiative. Implementing a canary release smoothens the transition of a model from a development or staging environment to production. This method involves directing a percentage of user requests to a new version or a sandbox environment based on criteria set by the product manager (such as modality, customer type, metrics, etc.).
This strategy minimizes the risk of deployment failures and avoids full rollbacks: if issues arise, it is simply a matter of ceasing traffic to the new version.
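A canary router can be as small as a hash bucket. The sketch below is an illustrative implementation, assuming requests carry a string user or request identifier; hashing makes the split deterministic, so a given user consistently sees the same version:

```python
import hashlib

def route(request_id, canary_percent=10):
    """Deterministically send a fixed percentage of requests to the new
    model version; hashing pins a given identifier to one version."""
    digest = hashlib.sha256(request_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"

# Rolling back is just setting canary_percent to 0: all traffic
# returns to the stable version without redeploying anything.
```

The product manager's criteria (modality, customer type, metrics) would be extra conditions ANDed with the bucket check; the percentage itself is typically ramped up gradually as the new model's metrics hold.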
Thank you for reading this article.
References
- Overview of MLOps - KDnuggets
- Wikipedia MLOps
- Open source MLflow
- Venture Beat: AI productization challenges
- Forbes: Andrew Ng Launches A Campaign For Data-Centric AI
- 3 ML challenges to overcome
- Machine Learning Engineering with MLflow: Manage the end-to-end machine learning life cycle with MLflow - Natu Lauchande - ISBN-13: 978-1800560796
- Introducing MLOps: How to Scale Machine Learning in the Enterprise - Mark Treveil, Nicolas Omont, Clément Stenac, Kenji Lefevre, Du Phan, Joachim Zentici, Adrien Lavoillotte, Makoto Miyazaki, Lynn Heidmann - ISBN-13: 978-1492083290
- Beginning MLOps with MLflow - Sridhar Alla, Suman Kalyan Adari - ISBN-13: 978-1484265482
- Kubeflow for Machine Learning: From Lab to Production - Trevor Grant, Holden Karau, Boris Lublinsky, Richard Liu, Ilan Filonenko - ISBN-13: 978-1492050124
- Video: From Model-centric to Data-centric by Andrew Ng
- Video: What is MLOps - MLOps tutorial
- Video: What is MLOps - Getting started with ML Engineering
- Video: Machine Learning Engineering for Production (MLOps)
- MIT - AI Prediction Problem
---------------------------
Patrick Nicolas has over 25 years of experience in software and data engineering, architecture design and end-to-end deployment and support with extensive knowledge in machine learning.
He has been director of data engineering at Aideo Technologies since 2017 and he is the author of "Scala for Machine Learning" Packt Publishing ISBN 978-1-78712-238-3