
### Authors: Praneet Singh Solanki | Richin Jain

# DevOps for AI

[![Build Status](https://dev.azure.com/customai/DevopsForAI-AML/_apis/build/status/Microsoft.DevOpsForAI?branchName=master)](https://dev.azure.com/customai/DevopsForAI-AML/_build/latest?definitionId=1&branchName=master)


DevOps for AI will help you understand how to build a Continuous Integration and Continuous Delivery pipeline for an ML/AI project. We will use the Azure DevOps Project for the build and release/deployment pipelines, along with Azure ML services for the model retraining pipeline, model management, and operationalization.

This template contains code and pipeline definitions for a machine learning project, demonstrating how to automate an end-to-end ML/AI project. The build pipelines include DevOps tasks for data sanity tests, unit tests, model training on different compute targets, model version management, model evaluation/model selection, model deployment as a real-time web service, staged deployment to QA/prod, and integration testing.
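As a rough illustration of what a data sanity test in the build pipeline might check, here is a minimal sketch in plain Python. It assumes the data arrives as a list of dictionaries using the ten feature names of the scikit-learn diabetes dataset plus a hypothetical `target` column. The function name `check_data_sanity` and the specific checks are illustrative assumptions, not the repo's actual test.

```python
import math

# Assumed schema: the ten scikit-learn diabetes feature names plus a
# hypothetical "target" column. The repo's real check may differ.
EXPECTED_COLUMNS = [
    "age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6", "target",
]


def check_data_sanity(rows):
    """Return a list of human-readable problems found in `rows` (list of dicts)."""
    if not rows:
        return ["dataset is empty"]
    problems = []
    for i, row in enumerate(rows):
        # Every expected column must be present before values are inspected.
        missing = [c for c in EXPECTED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        for col in EXPECTED_COLUMNS:
            value = row[col]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"row {i}: {col} is not numeric")
            elif math.isnan(value):
                problems.append(f"row {i}: {col} is NaN")
    return problems
```

A build task could fail the run whenever the returned list is non-empty, so bad data is caught before any training compute is used.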


## Prerequisites
- Active Azure subscription
- At least Contributor access to the Azure subscription

## Getting Started

To deploy this solution in your subscription, follow the manual instructions in the [getting started](docs/getting_started.md) doc.


## Architecture Diagram

This reference architecture shows how to implement continuous integration (CI), continuous delivery (CD), and a retraining pipeline for an AI application using Azure DevOps and Azure Machine Learning. The solution is built on the scikit-learn diabetes dataset but can be easily adapted to any AI scenario and to other popular build systems such as Jenkins and Travis.

![Architecture](/docs/images/Architecture_DevOps_AI.png)


## Architecture Flow

1. A data scientist writes or updates the code and pushes it to the Git repo. This triggers the Azure DevOps build pipeline (continuous integration).
2. Once the Azure DevOps build pipeline is triggered, it runs the following types of tasks:
    - Run for new code: Every time new code is committed to the repo, the build pipeline runs a data sanity test and unit tests on the new code.

    - One-time run: These tasks run only the first time the build pipeline runs. They programmatically create an [Azure ML Service Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace), provision [Azure ML Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) (used as the model training compute), and publish an [Azure ML Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). This published Azure ML pipeline is the model training/retraining pipeline.

    > Note: The Publish Azure ML Pipeline task currently runs for every code change.

3. The Azure ML retraining pipeline is triggered once the Azure DevOps build pipeline completes. All the tasks in this pipeline run on the Azure ML Compute created earlier. The tasks in this pipeline are:

    - **Train Model** task executes the model training script on Azure ML Compute. It outputs a [model](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#model) file, which is stored in the [run history](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#run).

    - **Evaluate Model** task compares the performance of the newly trained model with that of the model in production. If the new model performs better than the production model, the following steps are executed. If not, they are skipped.

    - **Register Model** task takes the improved model and registers it with the [Azure ML Model registry](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#model-registry). This allows us to version control it.

    - **Package Model** task packages the new model along with the scoring file and its Python dependencies into a [Docker image](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#image) and pushes it to [Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro). This image is used to deploy the model as a [web service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#web-service).
    
4. Once a new model scoring image is pushed to Azure Container Registry, the Azure DevOps Release/Deployment pipeline is triggered. This pipeline deploys the model scoring image into Staging/QA and PROD environments.

    - In the Staging/QA environment, one task creates an [Azure Container Instance](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview) and deploys the scoring image as a [web service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#web-service) on it. 
    
    - The second task tests this web service by calling its REST endpoint with dummy data.

    
5. The deployment to production is a [gated release](https://docs.microsoft.com/en-us/azure/devops/pipelines/release/approvals/gates?view=azure-devops). This means that once the model web service deployment in the Staging/QA environment is successful, a notification is sent to approvers to manually review and approve the release. Once the release is approved, the model scoring web service is deployed to [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/intro-kubernetes) and the deployment is tested.
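The Evaluate Model gate in step 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it takes mean squared error as the comparison metric (lower is better) and treats a missing production model as an automatic win for the new one. The function names `should_register` and `run_retraining_steps` are hypothetical and do not come from the repo or the Azure ML SDK.

```python
def should_register(new_mse, production_mse=None):
    """Decide whether a newly trained model should replace the production one.

    Assumes lower mean squared error is better. `production_mse` is None
    when no model has been deployed yet, in which case the new model wins.
    """
    if production_mse is None:
        return True
    return new_mse < production_mse


def run_retraining_steps(new_mse, production_mse=None):
    """Return the names of the retraining pipeline steps that would run, in order."""
    steps = ["train", "evaluate"]
    # Register and Package only run when the new model beats production.
    if should_register(new_mse, production_mse):
        steps += ["register", "package"]
    return steps
```

With this gate, a run that produces a worse model stops after evaluation, so no new image reaches Azure Container Registry and the release pipeline in step 4 is never triggered.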

### Repo Details

You can find the details of the code and scripts in the repository [here](/docs/code_description.md).

### References
- [Azure Machine Learning (Azure ML) Service Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml)
- [Azure ML Samples](https://docs.microsoft.com/en-us/azure/machine-learning/service/samples-notebooks)
- [Azure ML Python SDK Quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
- [Azure DevOps](https://docs.microsoft.com/en-us/azure/devops/?view=vsts)
- [DevOps for AI template (Old Version)](https://azuredevopsdemogenerator.azurewebsites.net/?name=azure%20machine%20learning)

# Contributing

This project welcomes contributions and suggestions.  Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.