Copyright (c) Microsoft Corporation
Licensed under the MIT License

Introduction

Traditional machine learning model development is resource-intensive: it requires significant domain and statistical knowledge, as well as the time to produce and compare dozens of models. Automated machine learning greatly shortens the time needed to get production-ready ML models, with ease and efficiency. However, Automated Machine Learning does not yet offer much in terms of data preparation and feature engineering. The Auto Brew ML framework aims to solve this problem at scale while simplifying the overall process for the user. It combines Azure Automated ML with components such as a Data Profiler, Data Sampler, Data Cleanser, and Anomaly Detector, which ensure data quality as a critical step before the ML model is built. The framework is integrated with Telemetry, DevOps, and Power BI, giving users a one-stop-shop solution to productionize any ML model. It aims at ‘Democratizing’ AI while maintaining the vision of ‘Responsible’ AI.

Wiki

Getting Started

Prerequisites

Azure Databricks
Auto Brew ML Notebooks (Master, Trigger notebooks)
Azure ML Services workspace
Python cluster in Databricks with the configurations described in the Installations link above (PyPI libraries azureml-sdk[automl], azureml-opendatasets, and azureml-widgets installed on the cluster)
Sample data used in the notebook: Real Estate Data

Using the Notebooks

AMLMasterNotebook: Contains all the base functions used for Data Acquisition, EDA, Sampling, Cleansing, Anomaly Detection, the Azure AutoML Trigger, and an AutoML Trigger that bypasses authentication to Azure ML (used for pipelining the notebook).
AMLMasterNotebook-Trigger: Calls these functions in order to run a pipeline of tasks.
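The trigger notebook's pattern of calling functions in order can be sketched as a simple step runner. This is a minimal illustration, not the notebook's actual code: it assumes each stage is a function that takes and returns a pandas DataFrame, and `run_pipeline`, `Step`, and the step names are hypothetical.

```python
from typing import Callable, List, Tuple

import pandas as pd

# Hypothetical step type: a (name, function) pair where the function maps a DataFrame to a DataFrame.
Step = Tuple[str, Callable[[pd.DataFrame], pd.DataFrame]]


def run_pipeline(df: pd.DataFrame, steps: List[Step]) -> pd.DataFrame:
    """Apply each step in order, printing row counts so every stage can be checked."""
    for name, fn in steps:
        before = len(df)
        df = fn(df)
        print(f"{name}: {before} -> {len(df)} rows")
    return df


# Toy input (made up for illustration): one duplicate value and one missing value.
raw = pd.DataFrame({"x": [1.0, 2.0, 2.0, None, 5.0]})
result = run_pipeline(raw, [
    ("drop_duplicates", lambda d: d.drop_duplicates()),  # prints "drop_duplicates: 5 -> 4 rows"
    ("drop_missing", lambda d: d.dropna()),              # prints "drop_missing: 4 -> 3 rows"
])
```

Keeping every stage to the same DataFrame-in, DataFrame-out shape is what lets stages be reordered or skipped without changing the runner.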

Framework Components

Exploratory Data Analysis
Data Sampling
1. Random Sampling
2. Stratified Sampling
3. Systematic Sampling
4. Cluster Sampling (with SMOTE)
Data Cleansing
Anomaly Detection
Feature Selection
Azure Auto ML Trigger (Azure component encapsulated with all configs and parameters)
Responsible AI Guidelines
1. Error Analysis
2. Model Interpretation and Exploration
3. Fairlearn to assess the fairness of the data and model
4. Identify and remove bias in data
5. SmartNoise to protect the privacy of PII data
Telemetry & DevOps Integration for Pipelining
Sample Runs
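The four strategies listed under Data Sampling can be sketched with pandas and NumPy. This is a minimal, hypothetical illustration rather than the framework's implementation: the function names and parameters are assumptions, and the SMOTE oversampling that the framework pairs with cluster sampling is not shown (SMOTE is typically taken from a separate library such as imbalanced-learn).

```python
import numpy as np
import pandas as pd


def random_sample(df: pd.DataFrame, frac: float, seed: int = 0) -> pd.DataFrame:
    """Random sampling: every row has the same chance of being selected."""
    return df.sample(frac=frac, random_state=seed)


def stratified_sample(df: pd.DataFrame, by: str, frac: float, seed: int = 0) -> pd.DataFrame:
    """Stratified sampling: draw the same fraction from every group in column `by`,
    so the group proportions of the full data are preserved."""
    return df.groupby(by, group_keys=False).sample(frac=frac, random_state=seed)


def systematic_sample(df: pd.DataFrame, step: int, start: int = 0) -> pd.DataFrame:
    """Systematic sampling: take every `step`-th row, beginning at position `start`."""
    return df.iloc[start::step]


def cluster_sample(df: pd.DataFrame, cluster_col: str, n_clusters: int, seed: int = 0) -> pd.DataFrame:
    """Cluster sampling: choose whole clusters at random and keep all of their rows."""
    rng = np.random.default_rng(seed)
    clusters = df[cluster_col].unique()
    chosen = rng.choice(clusters, size=n_clusters, replace=False)
    return df[df[cluster_col].isin(chosen)]


# Toy data (made up for illustration): 100 rows, 4 regions, 80/20 label split.
data = pd.DataFrame({
    "price": np.arange(100, dtype=float),
    "region": ["north", "south", "east", "west"] * 25,
    "label": [0] * 80 + [1] * 20,
})

print(len(random_sample(data, frac=0.1)))      # 10
strat = stratified_sample(data, by="label", frac=0.1)
print(len(strat), int(strat["label"].sum()))   # 10 2  (8 rows of label 0, 2 rows of label 1)
print(len(systematic_sample(data, step=10)))   # 10
print(sorted(cluster_sample(data, "region", n_clusters=2)["region"].unique()))
```

Stratified sampling keeps the 80/20 label ratio intact, which is why it is usually preferred over plain random sampling for imbalanced targets.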
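Data Cleansing and Anomaly Detection can take many forms. One common, simple combination is deduplication with median imputation, followed by the interquartile-range (IQR) rule for outliers. The sketch below uses those techniques as stand-ins; it is not necessarily what the framework does, and `cleanse` and `iqr_outliers` are hypothetical names.

```python
import numpy as np
import pandas as pd


def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing numeric values with each column's median."""
    out = df.drop_duplicates().copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy data (made up for illustration): one duplicate row, one missing value, one extreme value.
toy = pd.DataFrame({
    "area": [50.0, 60.0, np.nan, 55.0, 55.0, 58.0, 52.0, 900.0],
    "rooms": [2, 3, 2, 2, 2, 3, 2, 3],
})

clean = cleanse(toy)
print(len(clean))                                                # 7 (duplicate removed)
print(clean["area"].isna().sum())                                # 0 (NaN filled with median 56.5)
print(clean.loc[iqr_outliers(clean["area"]), "area"].tolist())   # [900.0]
```

The IQR rule is robust to the outlier it is looking for, because quartiles barely move when one extreme value is added; a mean-and-standard-deviation rule would be pulled toward that value.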