Cognizant 20-20 Insights
Digital Business
October 2019

Accelerating Machine Learning as a Service with Automated Feature Engineering

Building scalable machine learning as a service, or MLaaS, is critical to enterprise success. The key to translating machine learning project success into program success is solving the evolving, convoluted data engineering challenge, using both local and global data. Enabling data features to be shared across a multitude of models, within and across lines of business, is pivotal to program success.

Executive Summary

The success of machine-learning (ML) algorithms[1] in a broad range of areas has led to ever-increasing demand for their wider and more complex application, a proliferation of new automated ML platforms and solutions, and increasingly flexible use of these techniques by nonexperts. Most enterprises began their ML journey with projects of simpler analytical complexity because they were primarily focused on the maturity of their data infrastructure, ML model development process and deployment ecosystem.

According to a recent study published by O'Reilly,[2,3,4] roughly 50% of enterprise respondents said they were in the early stages of exploring ML, whereas the rest had moderate or extensive experience deploying ML models into production.

Enterprises, irrespective of their maturity, are currently focused on managing data pipelines and evaluating or developing ML platforms. But as they ascend the maturity curve, they need to solve the problem of the labyrinth of data pipelines feeding their ML models, because creating and managing these pipelines is labor-intensive and, over time, introduces data complexities and related operational risks.

ML is core to the success of digitally native businesses such as Uber and LinkedIn, which use it to create new products and redefine customer experience standards at a global scale. Certain aspects of their ML architecture can be deftly adopted by digital immigrant enterprises as they seek to mature their use of artificial intelligence (AI).

Creating a feature store, a central repository of features (essentially any input into an ML model) organized as a marketplace, enables producers such as ML engineers (who create and populate new features) to share them with consumers such as data scientists (who build ML models). This will substantially reduce go-to-market (GTM) time, while also enabling data lineage and bringing governance to the data pipeline labyrinth.

For enterprises to mature in ML, setting up a feature store will be as essential as adopting auto ML frameworks, model monitoring and model visualization, which was also the outcome noted by the recent O'Reilly survey.

This white paper offers insights into why enterprises need a fully functional feature store in their ML maturity journey and how this can be achieved using an operating model that accelerates ML scale goals through automation, making machine learning algorithm features reusable, cost-effective and tangible. This is critical because our approach automates one of the most laborious activities in the model lifecycle: feature engineering.

The need for a centralized feature engineering ecosystem

ML is a powerful toolkit that enables businesses to strive for excellence, whether in new product development or in achieving operational efficiencies.[5] However, ML initiatives entail the development of complex systems that behave differently from traditional IT systems.

In fact, ML systems contain inherent risks (e.g., complex data pipelines, unexplainable code) which, unless addressed properly, lead to high maintenance costs over the long run. The development of ML code is generally seen as labor-intensive and complex, whereas the other essential activities surrounding it are seen as less critical, which is incorrect. Rather, data functions (such as verification, quality and features) and resource management are equally important for building a successful ML infrastructure (see Figure 1).

Figure 1: ML heat map depicting processes and related efforts[6] (data collection, data verification, configuration, feature extraction, ML code, machine resource management, analysis tools, process management tools, serving infrastructure and monitoring)

The process of building and deploying an ML model goes beyond setting up the requisite infrastructure. ML projects typically take two to four months for idea validation and prototype development, a timeline that is often extended by several more months when prototypes are pushed into production. The cycle is repeated for each model rebuild iteration or new model development.

Figure 2 illustrates an ML project, depicting its various stages and the related efforts. The less effort-intensive processes have been addressed by deploying ML platforms such as Amazon SageMaker, but the key labor-intensive processes around data acquisition and processing are still repeated in each iteration of the model development exercise.

Figure 2: Illustrative model lifecycle. Development stages: environment setup, data acquisition, feature engineering, model development and deploy-ready model. Production stages: environment setup, data acquisition, feature engineering, model serving, model monitoring and model rebuild. Stage durations range from about one week to three months.

A day in the life of a data scientist (DS) consists of deriving insights and knowledge from data and developing models from it. (For more on this, read "Learning from the Day in the Life of a Data Scientist" in our Digitally Cognizant blog.) This requires data cleansing, transformation and feature extraction before a stitch of ML code is written. The process starts with data extraction in a modeling sandbox and moves on to hypothesis validation, followed by deployment of code, which requires designing a fully fledged data pipeline. These activities happen primarily in isolation, which is typical of an experimentation phase. Upon successful exploration, other key players, such as ML engineers and an ML architect, must come up to speed and plan the necessary support activities, which results in a longer development lifecycle (see Figure 3).

Figure 3: Working solo. Data scientist: focused on code generation, without much collaboration with architects and engineers. ML architect: wondering what data/IT architecture changes are needed to support the code. ML engineer: wondering what data pipeline reengineering is needed to support the code.

During model development, the data scientist builds both common features and features that are specific to the model. Industry-standard practice is to create extract, transform, load (ETL) pipelines for common features while generally bundling model-specific features within the model itself, which leads to the following situations:
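The split described above, with common features built once in shared ETL pipelines and model-specific features bundled inside each model, is what the feature store outlined in the executive summary is meant to consolidate. The following Python sketch is a hypothetical illustration, not Cognizant's implementation or any product's API: the `FeatureStore` and `FeatureDefinition` names, the in-memory storage and the sample features are all assumptions. It shows producers registering features together with lineage metadata, consumers discovering and retrieving them by name, and the store recording which models use each feature.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one entity's raw input values (e.g., one customer)


@dataclass
class FeatureDefinition:
    """A feature as published by a producer (hypothetical schema)."""
    name: str
    producer: str                    # who registered it, e.g., an ML engineering team
    sources: List[str]               # upstream columns, kept for lineage
    transform: Callable[[Row], Any]  # how the feature is computed from raw inputs
    description: str = ""


class FeatureStore:
    """Minimal in-memory feature registry; illustrative only."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}
        self._consumers: Dict[str, List[str]] = {}  # feature name -> consuming models

    def register(self, feature: FeatureDefinition) -> None:
        # Producers publish each feature once; duplicate names are rejected.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} is already registered")
        self._features[feature.name] = feature
        self._consumers[feature.name] = []

    def search(self, keyword: str) -> List[str]:
        # Marketplace-style discovery by feature name or description.
        kw = keyword.lower()
        return sorted(
            name for name, f in self._features.items()
            if kw in name.lower() or kw in f.description.lower()
        )

    def get_features(self, model: str, names: List[str], rows: List[Row]) -> List[Row]:
        # Consumers request features by name; each consuming model is recorded for lineage.
        missing = [n for n in names if n not in self._features]
        if missing:
            raise KeyError(f"unknown features: {missing}")
        for n in names:
            if model not in self._consumers[n]:
                self._consumers[n].append(model)
        return [{n: self._features[n].transform(row) for n in names} for row in rows]

    def lineage(self, name: str) -> Dict[str, Any]:
        # Who produced the feature, what it is derived from, and which models use it.
        f = self._features[name]
        return {
            "feature": name,
            "producer": f.producer,
            "sources": list(f.sources),
            "consumers": list(self._consumers[name]),
        }


if __name__ == "__main__":
    store = FeatureStore()
    store.register(FeatureDefinition(
        name="days_since_last_order",
        producer="ml_engineering",
        sources=["orders.order_day"],
        transform=lambda r: r["as_of_day"] - r["last_order_day"],
        description="Recency of the customer's last order",
    ))
    store.register(FeatureDefinition(
        name="avg_order_value",
        producer="ml_engineering",
        sources=["orders.amount"],
        transform=lambda r: r["total_amount"] / r["order_count"] if r["order_count"] else 0.0,
        description="Mean order value per customer",
    ))

    rows = [{"as_of_day": 100, "last_order_day": 90, "total_amount": 300.0, "order_count": 3}]
    print(store.search("order"))  # ['avg_order_value', 'days_since_last_order']
    print(store.get_features("churn_model", ["days_since_last_order", "avg_order_value"], rows))
    # [{'days_since_last_order': 10, 'avg_order_value': 100.0}]
    store.get_features("upsell_model", ["avg_order_value"], rows)  # a second model reuses the feature
    print(store.lineage("avg_order_value")["consumers"])  # ['churn_model', 'upsell_model']
```

Keeping the transform and its source columns in one registered definition is what lets a second model reuse a feature without re-implementing it, and lets a lineage query answer who produced a feature and which models depend on it. Production feature stores typically add persistent offline and online storage, versioning and point-in-time-correct retrieval, none of which this sketch attempts.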