diff --git a/QUICKSTART.md b/QUICKSTART.md deleted file mode 100644 index 3c2bb59..0000000 --- a/QUICKSTART.md +++ /dev/null @@ -1,400 +0,0 @@ -# Quickstart - -## Technical requirements - -- Github as the source control repository -- Azure DevOps or Github Actions as the DevOps orchestration tool -- The [Terraform extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-devlabs.custom-terraform-tasks) if you are using Azure DevOps + Terraform to spin up infrastructure -- Azure service principals to access / create Azure resources from Azure DevOps or Github Actions (or the ability to create them) -- Git bash, WSL or another shell script editor on your local machine; version 2.27 or newer required - - -## Prerequisites ---- - -**Duration: 45min** - -**Note: This demo is based on the beta version for the MLOps Azure Machine Learning Classical ML and CV (Computer Vision) Pattern. Due to ongoing, cli v2 changes and Azure Machine Learning enhencements, the demo can fail. The team is working on keeping the example as up-to-date as possible.** - - -1. Create Service Principal - - For the use of the demo, the creation of one or two service principles is required, depending on how many environments, you want to work on (Dev or Prod or Both). Go into your Azure portal to set those up. - - 1.1. Select Azure Active Directory (AAC) - - ![SP1](./images/SP-setup1.png) - - 1.2. Select App Registrations on the left panel, then select "new registration". - - ![PS2](./images/SP-setup2.png) - - 1.3. Go through the process of creating a Service Principle (SP) selecting "Accounts in any organizational directory (Any Azure AD directory - Multitenant)" and name it "Azure-ARM-Dev-ProjectName". Once created, repeat and create a new SP named "Azure-ARM-Prod-ProjectName". Please replace "ProjectName" with the name of your project so that the service principal can be uniquely identified. - - 1.4. Go to "Certificates & Secrets" and add for each SP "New client secret", then store the value and secret sepperately. - - 1.5. To assign the necessary permissions to these principals, select your respective subscription and go to IAM. Select +Add then select "Add Role Assigment. - - ![PS3](./images/SP-setup3.png) - - 1.6. Select Contributor and add members selecting + Select Members. Add the member "Azure-ARM-Dev-ProjectName" as create before. - - ![SP4](./images/SP-setup4.png) - - 1.7. Repeat step here, if you deploy Dev and Prod into the same subscription, otherwise change to the prod subscription and repeat with "Azure-ARM-Prod-ProjectName". The basic SP setup is successfully finished. - - -2. Set up Github Environment - - 2.1. Go to https://github.com/Azure/mlops-templates/fork to fork the mlops templates repo into your Github org. This repo has reusable mlops code that can be used across multiple projects. - - ![image](./images/gh-fork.png) - - 2.2. Go to https://github.com/Azure/mlops-project-template/generate to create a repository in your Github org using the mlops-project-template. This is the monorepo that you will use to pull example projects from in a later step. - - ![image](./images/gh-generate.png) - - 2.3. Go to your Github organization and create an **empty** repo. This is going to be the repo, into which you'll push your sparse checkout local repo. (more to that later) - - ![Github Use Template](./images/gh-create-empty-mlops-sparse.png) - - - 2.4. Now you should have your own empty Github repository. Let's fill it up! - - 2.5. 
On your local machine create a directory or use an existing one, which is empty (p.ex. mlopsv2root). Use your shell environment (GitBash, Bash or WSL only) and CD into this directory. Now clone the Azure/mlops-v2 repo, which is going to give you the documentation and the sparse_checkout.sh script with 'git clone https://github.com/Azure/mlops-v2.git' (If you get a 404, you might need to login to Github). This creates a new directory mlops-v2 under mlopsv2root. NOTE: This mlops-v2 folder is only used to bootstrap your project. Your project folder will be generated using the sparse checkout and be linked to the blank repository you created in step 2.3. - - 2.6. Now you need to set a few variables depending on your environment in the script /mlops-v2/sparse_checkout.sh. Open this file in an editor and set the following variables: - - ```console - - infrastructure_version=terraform #options: terraform / bicep - project_type=classical #options: classical / cv - mlops_version=aml-cli-v2 #options: python-sdk / aml-cli-v2 - git_folder_location='' #replace with the local root folder location where you want to create the project folder - project_name=Mlops-Test #replace with your project name - github_org_name=orgname #replace with your github org name - project_template_github_url=https://github.com/azure/mlops-project-template #replace with the url for the project template for your organization created in step 2.2 - orchestration=azure-devops #options: github-actions / azure-devops - ``` - Currently we support classical and cv (computer-vision) pipelines. *NLP is currently under development*, though the CV pipeline can be modified to run NLP models. - - > a few pointers here: - * infrastructure_version gives you deployment choices based on your preferred deployment scenario - * project_type defines the AI workload you want to run in your MLOps system - * mlops_version selects your preferred environment - * git_folder_location points to mlopsv2root - * project_name is the same name (case sensitive), that you used when creating the empty repo in step 2.3 - * github_org_name is your github organization, that you used when creating the empty repo - * project_template_github_url is the URL of the repo you created in step 2.2 - * orchestration is the method of deployment - - 2.7. At the end of the sparse_checkout, it pushes the initilized repo into the new, empty created github repository. In order to do that, we need to authenticate against your github organization by SSH. If not already established, please follow the steps below (see: [Key Setup](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) ): - - 2.7.1 Create a local key in your bash shell by entering: ssh-keygen -t ed25519 -C "your_email@example.com" Please adjust your email address aligned with your github organization. - - 2.7.1.1 You will get promted by 3 different messages regarding your key set-up. You can press "enter" in all three cases and do not have to insert anything. E.g.: Enter a file in which to save the key (/home/you/.ssh/algorithm): [Press enter] - - 2.7.2 Add your SSH key to your SSH agent. Start the SSH agent by entering: eval "$(ssh-agent -s)" It will return your process ID of your agent. 
Next, add the private key to the SSH agent by executing: ssh-add ~/.ssh/id_ed25519 - - 2.7.3 Now add your SSH key to your github account ([SSH Key Github](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account)) - - 2.7.3.1 Execute in your shell: cat ~/.ssh/id_ed25519.pub to get to public key. Copy everything including the email adress as the end and store it e.g.: ssh-ed25519 ... our_email@example.com - - 2.7.3.2 Now go to github and open your settings. In settings, select "SSH and GPG Keys". Select "New SSH key" and enter a title to the key. Paste your prior stored public key into the key box. Now select "Add SSH key". - - 2.8. Now it's time to execute this script by running in mlopsv2root (if necessary make sure with pwd, that you're in mlopsv2root) in Git Bash or another terminal by running 'bash mlops-v2/sparse_checkout.sh'. This will use the settings in the variables to create a repo for your project which you can use in subsequent steps. - - In case you face any authentication issues, follow this link to authenticate yourself using an ssh key: https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent. - > watch the output of the script, to make sure, no error(s) were happening. And if so remediate them before continuing. You can always delete the project_name named directory and reexecute the script after fixing any error(s). Be sure though, to reposition the current working directory to be mlopsv2root. - - After this step ran successfully, you'll have an additional folder locally under mlopsv2root named after the project_name variable. This is a locally, fully initialized repo, which had been pushed to your new, empty repo, which is now no more empty.(make sure by refreshing on github) - - 2.9. Next, create an SSO token in github by selecting "Developer settings" in your github account settings. - - ![GH1](./images/GH-setup1.png) - - 2.10. Select "Personal Access Token", then generate new token. Select the check boxes and name your token "MLOpsToken". Select "Generate Token". Copy/Paste token key to a notepad interim. - - ![GH2](./images/GH-setup2.png) - - 2.11. If your organization uses single sign on for Github, then click on "Authorize" the token to have access to the github organization. The screenshot below shows an example of authorization for the token to interact with your repository. Your organization maybe different. - - ![GH3](./images/GH-setup3.png) - - The github setup is successfully finished. - - -3. Set up Azure DevOps - - 3.1. Go to [Azure DevOps](https://dev.azure.com/) to set up your MLOps deployment environment. To deploy the infrastructure via ADO (Azure DevOps), you will have to have an organization and a project, with a service connection to your subscription configured. - - 3.2. Create a new project in Azure Devops. Feel free to name it according to your project practices. - - ![ADO Project](./images/ADO-project.png) - - 3.3. In the project under 'Project Settings (at the bottom of the screen when in the project)' select "Service Connections". - - ![ADO1](./images/ADO-setup1.png) - - **Azure Subscription Connection:** - - 3.3.1 Select "New Service Connection". - - ![ADO2](./images/ADO-setup2.png) - - 3.3.2 Select "Azure Resource Manager", select "Next", select "Service principal (manual)", select "Next", select your subscrption where your Service Principal is stored and name the service connection "Azure-ARM-Dev". 
Fill in the details of the Dev service principal created in step 1. Select "Grant access permission to all pipelines", then select "Save". Repeat this step to create another service connection "Azure-ARM-Prod" using the details of the Prod service principal created in step 1. - - ![ADO3](./images/ado-service-principal-manual.png) - - **Github Connection:** - - 3.3.4 Select "New Service Connection". - - ![ADO4](./images/ADO-setup2.png) - - 3.3.5 Select "Github", select "Next", select "Personal Access Token" and paste your Github SSO Token in the Personal Access token field, in the "Service connection name" field, enter "github-connection", grant pipeline security access, then select "Save". - - Repeat this step, and for the "Service connection name" enter YOUR GITHUB ORGANIZATION NAME or YOUR GITHUB NAME. Finishing this step, your conection should look like this. - - ![ADO6](./images/ADO-setup5.png) - - The Azure DevOps setup is successfully finished. - - -**This finishes the prerequisite section and the deployment of the solution accelerator can happen accordingly. The following sections describe how to setup the appriopriate configuration files and run the inner/outer loops using Azure DevOps or GitHub Actions.** - -This step deploys the training pipeline to the Azure Machine Learning workspace created in the previous steps. - -## Outer Loop: Setting up Infrastructure via GitHub Actions -**IMPORTANT NOTE: Deployment of infrastructure using Terraform or Bicep is not yet enabled yet for Github Actions. It will be available in the next release. This section enables Github Actions to run AML Pipelines.** - 1. Set up your Azure Credentials in a GitHub Secret. - - GitHub Actions need to use an Azure Service Principal to connect to the Azure Machine Learning Service and perform operations. Additional details can be found [here](https://github.com/marketplace/actions/azure-login#configure-deployment-credentials). - - 2. Go to your Github cloned repo and select the "config-infra-prod.yml" file. - - ![ADO Run4](./images/ADO-run4.png) - - Under global, there's two values namespace and postfix. These values should render the names of the artifacts to create unique. Especially the name for the storage account, which has the most rigid constraints, like uniqueness Azure wide and 3-5 lowercase characters and numbers. So please change namespace and/or postfix to a value of your liking and remember to stay within the contraints of a storage account name as mentioned above. Then save, commit, push, pr to get these values into the pipeline. - - If your are running a Deep Learning workload such as CV or NLP, you have to ensure your GPU compute is availible in your deployment zone. Please replace as shown above your location to eastus. Example: - - namespace: [5 max random new letters] - postfix: [4 max random new digits] - location: eastus - - Please repeat this step for "config-infra-dev.yml" and "config-infra-prod.yml"! - -## Outer Loop: Deploying Infrastructure via Azure DevOps ---- - - 1. Go to your Github cloned repo and select the "config-infra-prod.yml" file. - - ![ADO Run4](./images/ADO-run4.png) - - Under global, there's two values namespace and postfix. These values should render the names of the artifacts to create unique. Especially the name for the storage account, which has the most rigid constraints, like uniqueness Azure wide and 3-5 lowercase characters and numbers. 
So please change namespace and/or postfix to a value of your liking and remember to stay within the contraints of a storage account name as mentioned above. Then save, commit, push, pr to get these values into the pipeline. - - If your are running a Deep Learning workload such as CV or NLP, you have to ensure your GPU compute is availible in your deployment zone. Please replace as shown above your location to eastus. Example: - - namespace: [5 max random new letters] - postfix: [4 max random new digits] - location: eastus - - Please repeat this step for "config-infra-dev.yml" and "config-infra-prod.yml"! - - 2. Go to ADO pipelines - - ![ADO Pipelines](./images/ADO-pipelines.png) - - 3. Select "New Pipeline". - - ![ADO Run1](./images/ADO-run1.png) - - 4. Select "Github". - - ![ADO Where's your code](./images/ado-wheresyourcode.png) - - 5. Select your /MLOps-Test repository. ("Empty" repository you created in 2.3) - - ![ADO Run2](./images/ADO-run2.png) - - If your new repository is not visible, then click on the "provide access" link and on the next screen, click on the "grant" button next to the organization name to grant access to your organization. - - 6. Select "Existing Azure Pipeline YAML File" - - ![ADO Run3](./images/ADO-run3.png) - - - 7. Select "main" as a branch and choose based on your deployment method your preferred yml path. For a terraform schenario choose: 'infrastructure/pipelines/tf-ado-deploy-infra.yml', then select "Continue". For a bicep schenario choose: 'infrastructure/pipelines/bicep-ado-deploy-infra.yml', then select "Continue". - - ![Select Infrastructure Pipeline](./images/ado-select-pipeline-yaml-file.png) - - - - 8. Run the pipeline. This will take a few minutes to finish. The pipeline should create the following artifacts: - * Resource Group for your Workspace including Storage Account, Container Registry, Application Insights, Keyvault and the Azure Machine Learning Workspace itself. - * In the workspace there's also a compute cluster created. - - ![ADO Run5](./images/ADO-run5.png) - - Now the Outer Loop of the MLOps Architecture is deployed. - - ![ADO Run6](./images/ADO-run-infra-pipeline.png) - -## Inner Loop: Deploying Classical ML Model Development / Moving to Test Environment - GitHub Actions - 1. Go to the GitHub Actions tab. - - ![GHA Tab](./images/GHATab.png) - - 2. Select the "deploy-model-training-pipeline" from the Actions listed on the left and the click "Run Workflow" to execute the model training workflow. This will take several minutes to run, depending on the compute size. - - ![Pipeline Run](./images/PipelineRun.png) - - Once completed a successful run will train the model in the Azure Machine Learning Workspace. - >**IMPORTANT: If you want to check the output of each individual step, for example to view output of a failed run, click a job output and then click each step in the job to view any output of that step. - - ![Output](./images/expandedElement.png) - > - -## Inner Loop: Deploying Classical ML Model Development / Moving to Test Environment - Azure DevOps ---- - - 1. Go to ADO pipelines - - ![ADO Pipelines](./images/ADO-pipelines.png) - - 2. Select "New Pipeline". - - ![ADO Run1](./images/ADO-run1.png) - - 3. Select "Github". - - ![ADO Where's your code](./images/ado-wheresyourcode.png) - - 4. Select your /MLOps-Test repository! ("Empty" repository you created in 2.3) - - ![ADO Run2](./images/ADO-run2.png) - - 5. Select "Existing Azure Pipeline YAML File" - - ![ADO Run3](./images/ADO-run3.png) - - 6. 
Select "main" as a branch and choose '/mlops/devops-pipelines/deploy-model-training-pipeline.yml', then select "Continue". - - ![ADO Run9](./images/ADO-run9.png) - - >**IMPORTANT: This pipeline needs an additional connection to the Github repo yourorgname/mlops-templates, where all the templates are stored and maintained, which, like legos, encapsulate certain functionality. That's why you see in the pipeline itself a lot of calls to '-template: template in mlops-templates'. These functionalities are install the azure cli, or ml extension or run a pipeline etc. Therefore we created the connection 'github-connection' in the beginning currenly hard-coded.** - -## Inner Loop: Checkpoint - - At this point, the infrastructure is configured and the Inner Loop of the MLOps Architecture is deployed. We are ready to move to our trained model to production. - - -## Inner / Outer Loop: Moving to Production - Introduction ---- - - >**NOTE: This is an end-to-end machine learning pipeline which runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving different functions, which can be registered with the workspace, versioned, and reused with various inputs and outputs.** - - >**Prepare Data - This component takes multiple taxi datasets (yellow and green) and merges/filters the data, and prepare the train/val and evaluation datasets. - Input: Local data under ./data/ (multiple .csv files) - Output: Single prepared dataset (.csv) and train/val/test datasets.** - - >**Train Model - This component trains a Linear Regressor with the training set. - Input: Training dataset - Output: Trained model (pickle format)** - - >**Evaluate Model - This component uses the trained model to predict taxi fares on the test set. - Input: ML model and Test dataset - Output: Performance of model and a deploy flag whether to deploy or not. - This component compares the performance of the model with all previous deployed models on the new test dataset and decides whether to promote or not model into production. Promoting model into production happens by registering the model in AML workspace.** - - >**Register Model - This component scores the model based on how accurate the predictions are in the test set. - Input: Trained model and the deploy flag. - Output: Registered model in Azure Machine Learning.** - -## Inner / Outer Loop: Moving to Production - GitHub Actions ---- - - 1. Go to the GitHub Actions tab. - - ![GHA Tab](./images/GHATab.png) - - 2. Select either the "deploy-batch-endpoint-pipeline" or the "deploy-online-endpoint-pipeline" from the Actions listed on the left and the click "Run Workflow" to execute the model training workflow. This will take several minutes to run, depending on the compute size. - - ![GHA Tab](./images/onlineEndpoint.png) - - Once completed, a successful run will deploy the model trained in the previous step to either a batch or online endpoint, depending on which workflow is run. - -## Inner / Outer Loop: Moving to Production - Azure DevOps ---- - - 1. Go to ADO pipelines - - ![ADO Pipelines](./images/ADO-pipelines.png) - - 2. Select "New Pipeline". - - ![ADO Run1](./images/ADO-run1.png) - - 3. Select "Github". - - ![ADO Where's your code](./images/ado-wheresyourcode.png) - - 4. Select your /MLOps-Test repository! ("Empty" repository you created in 2.3) - - ![ADO Run2](./images/ADO-run2.png) - - 5. Select "Existing Azure Pipeline YAML File" - - ![ADO Run3](./images/ADO-run3.png) - - 6. 
Select "main" as a branch and choose: - For Classical Machine Learning: - Managed Batch Endpoint '/mlops/devops-pipelines/deploy-batch-endpoint-pipeline.yml' - Managed Online Endpoint '/mlops/devops-pipelines/deploy-online-endpoint-pipeline.yml' - For Computer Vision: - Managed Online Endpoint '/mlops/devops-pipelines/deploy-batch-endpoint-pipeline.yml' - - Then select "Continue". - - ![ADO Run10](./images/ADO-run10.png) - - 7. Batch/Online endpoint names need to be unique, so please change [your endpointname] to another unique name and then select "Run". - - ![ADO Run11](./images/ADO-batch-pipeline.png) - - **IMPORTANT: If the run fails due to an existing online endpoint name, recreate the pipeline as discribed above and change [your endpointname] to [your endpointname [random number]]"** - - 8. When the run completes, you will see: - - ![ADO Run12](./images/ADO-batch-pipeline-run.png) - - Now the Inner Loop is connected to the Outer of the MLOps Architecture and inference has been run. - - - -## Next Steps ---- - -This finishes the demo according to the architectual patters: Azure Machine Learning Classical Machine Learning, Azure Machine Learning Computer Vision. Next you can dive into your Azure Machine Learning service in the Azure Portal and see the inference results of this example model. - -As the CLI v2 is still in development, the following components are not part of this demo: -- Model Monitoring for Data/Model Drift -- Automated Retraining -- Model and Infrastructure triggers - -As the development team builds according to the Product Groups release plan, no custom components are going to be developed rather it is intended to wait for full GA release of the cli v2 to address those components. - -Interim it is recommended to schedule the deployment pipeline for development for complete model retraining on a timed trigger. - -For questions, please hand in an issue or reach out to the development team at Microsoft. - - - - - - diff --git a/README.md b/README.md index 2cb4f49..9e5490e 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ -# Azure MLOps (v2) solution accelerator +# Azure MLOps (v2) Solution Accelerator -![Header](documentation/repositoryfiles/mlopsheader.jpg) +![Header](media/mlopsheader.jpg) -Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as *the* starting point for MLOps implementation in Azure. +Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as the starting point for MLOps implementation in Azure. MLOps is a set of repeatable, automated, and collaborative workflows with best practices that empower teams of ML professionals to quickly and easily get their machine learning models deployed into production. You can learn more about MLOps here: @@ -10,12 +10,6 @@ MLOps is a set of repeatable, automated, and collaborative workflows with best p - [Cloud Adoption Framework Guidance](https://docs.microsoft.com/azure/cloud-adoption-framework/ready/azure-best-practices/ai-machine-learning-mlops) - [How: Machine Learning Operations](https://docs.microsoft.com/azure/machine-learning/concept-model-management-and-deployment) -## Prerequisites - -1. An Azure subscription. If you don't have an Azure subscription, [create a free account](https://aka.ms/AzureMLFree) before you begin. -2. The [Terraform extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-devlabs.custom-terraform-tasks) if you are using Terraform to spin up infrastructure -3. 
Git bash, WSL or another shell script editor on your local machine - ## Project overview The solution accelerator provides a modular end-to-end approach for MLOps in Azure based on pattern architectures. As each organization is unique, solutions will often need to be customized to fit the organization's needs. @@ -30,54 +24,23 @@ The solution accelerator goals are: It accomplishes these goals with a template-based approach for end-to-end data science, driving operational efficiency at each stage. You should be able to get up and running with the solution accelerator in a few hours. -## 👤 Getting started: Azure Machine Learning Pattern Demo - -The demo follows the classical machine learning or computer vision pattern with Azure Machine Learning. +## Prerequisites -Azure Machine Learning - Classical Machine Learning Architecture: -![AzureML CML](/documentation/architecturepattern/AzureML_CML_Architecture.png) +1. An Azure subscription. If you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/en-us/free/machine-learning/search/?OCID=AIDcmm5edswduu_SEM_822a7351b5b21e0f1ffe102c9ca9e99f:G:s&ef_id=822a7351b5b21e0f1ffe102c9ca9e99f:G:s&msclkid=822a7351b5b21e0f1ffe102c9ca9e99f) before you begin. +2. For Azure DevOps-based deployments and projects: + * [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) with `azure-devops` extension. + * [Terraform extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-devlabs.custom-terraform-tasks) if you are using Terraform to spin up infrastructure +3. For GitHub-based deployments and projects: + * [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) + * [GitHub client](https://cli.github.com/) +3. Git bash, WSL, or another shell script editor on your local machine -Azure Machine Learning - Computer Vision Architecture: -![AzureML CV](/documentation/architecturepattern/AzureML_SupervisedCV_Architecture.png) - -‼️ **Please follow the instructions to execute the demo accordingly: [Quickstart](https://github.com/Azure/mlops-v2/blob/main/QUICKSTART.md)** ‼️ +## Documentation -‼️ **Please submit any issues here: [Issues](https://github.com/Azure/mlops-v2/issues)** ‼️ - -## 📐 Pattern Architectures: Key concepts - -| Link | AI Pattern | -| ------------------------------------------------------- | ----------------------------------------------------------------------- | -| [Pattern AzureML CML](https://github.com/Azure/mlops-v2/blob/main/documentation/architecturepattern/AzureML_CML_Architecture.png) | Azure Machine Learning - Classical Machine Learning | -| [Pattern AzureML CV](https://github.com/Azure/mlops-v2/blob/main/documentation/architecturepattern/AzureML_SupervisedCV_Architecture.png) | Azure Machine Learning - Computer Vision | -| [Pattern AzureML NLP](https://github.com/Azure/mlops-v2/blob/main/documentation/architecturepattern/AzureML_NLP_Classification_Architecture.png) | Azure Machine Learning - Natural Language Processing | -| [TBD] | Azure Machine Learning / Azure Databricks - Classical Machine Learning | -| [TBD] | Azure Machine Learning / Azure Databricks - Computer Vision | -| [TBD] | Azure Machine Learning / Azure Databricks - Natural Language Processing | -| [TBD] | Azure Machine Learning - Edge AI | - -## 📯 (Coming Soon) One-click deployments - -## 📯 MLOps infrastructure deployment - -| Name | Description | Try it out | -| ------------------------------------------------------------ | ---------------------------------------------------------- 
| --------------- | -| [Outer Loop](https://github.com/Azure/mlops-templates) | Default Azure Machine Learning outer infrastructure setup | [DEPLOY BUTTON] | -| [TBD] | Default Responsible AI for Classical Machine Learning | [DEPLOY BUTTON] | -| [Feature Store FEAST](https://github.com/Azure/feast-azure) | Default Feature Store using FEAST | [DEPLOY BUTTON] | -| [Feature Store Feathr](https://github.com/linkedin/feathr) | Feature Store Pattern using Feathr | [DEPLOY BUTTON] | - -## 📯 MLOps use case deployment - -| Name | AI Workload Type | Services | Try it out | -|-------------------------------------------------------------------- | -----------------------------------| ---------------------------------------- | --------------- | -| [classical-ml](https://github.com/Azure/mlops-project-template/tree/main/classical) | Classical machine learning | Azure Machine Learning | [DEPLOY BUTTON] | -| [CV](https://github.com/Azure/mlops-project-template/tree/main/cv) | Computer Vision | Azure Machine Learning | [DEPLOY BUTTON] | -| [TBD] | Natural Language Processing | Azure Machine Learning | [DEPLOY BUTTON] | -| [TBD] | Classical machine learning | Azure Machine Learning, Azure Databricks | [DEPLOY BUTTON] | -| [TBD] | Computer Vision | Azure Machine Learning, Azure Databricks | [DEPLOY BUTTON] | -| [TBD] | Natural Language Processing | Azure Machine Learning, Azure Databricks | [DEPLOY BUTTON] | -| [TBD] | Edge AI | Azure Machine Learning | [DEPLOY BUTTON] | +1. [Solution Accelerator Concepts and Structure](documentation/structure/README.md) - Philosophy and organization +2. [Architectural Patterns](documentation/architecture/README.md) - Supported Machine Learning patterns +3. [Accelerator Deployment Guides](documentation/deploymentguides/README.md) - How to deploy and use the soluation accelerator with Azure DevOps or GitHub +4. **Coming soon** Quickstarts - Precreated project scenarios for demos/POCs ## Contributing diff --git a/documentation/README.md b/documentation/README.md deleted file mode 100644 index 8b407e8..0000000 --- a/documentation/README.md +++ /dev/null @@ -1,179 +0,0 @@ -# MLOps v2 Architectures - -The MLOps v2 architectural pattern is made up of four main modular elements representing phases of the MLOps lifecycle. - -- Data Estate - -- Administration & Setup - -- Model Development (Inner Loop) - -- Model Deployment (Outer Loop) - -These elements, the relationships between them, and the personas typically associated with these elements are common for all MLOps v2 scenario architectures though there may some variations in the details of each depending on the scenario. - -The base architecture for MLOps v2 for Azure Machine Learning is the Classical Machine Learning scenario on tabular data. Other scenarios like Computer Vision (CV) and Natural Language Processing (NLP) build on or modify this base architecture as appropriate. - -## Current Architectures - -- [Azure Machine Learning Classical ML Architecture](#azure-machine-learning-classical-ml-architecture) -- [Azure Machine Learning Computer Vision Architecture](#azure-machine-learning-computer-vision-architecture) -- [Azure Machine Learning Natural Language Processing Architecture](#azure-machine-learning-natural-language-processing-architecture) - -### Azure Machine Learning Classical ML Architecture - -Below is the MLOps v2 architecture for a Classical Machine Learning scenario on tabular data along with explanation of the main elements and details. 
- -![Azure Machine Learning Classical Machine Learning Architecture](architecturepattern/AzureML_CML_Architecture.png) - -1. **Data Estate** - - This element illustrates the organization data estate and potential data sources and targets for a data science project. Data Engineers would be the primary owners of this element of the MLOps v2 lifecycle. The Azure data platforms in this diagram are neither exhaustive nor prescriptive. However, data sources and targets that represent recommended best practices based on customer use case are indicated by the green check. - -2. **Administration & Setup** - - This element is the first step in the MLOps v2 Accelerator deployment. It consists of all tasks related to creation and management of resources and roles associated with the project. These can include but may not be limited to: - - - Creation of project source code repositories. - - - Creation of Azure Machine Learning Workspaces for the project using Bicep, ARM, or Terraform. - - - Creation/modification of Data Sets and Compute Resources used for model development and deployment. - - - Definition of project team users, their roles, and access controls to other resources. - - - Creation of CI/CD (Continuous Integration and Continuous Delivery) pipelines - - - Creation of Monitors for collection and notification of model and infrastructure metrics. - - Personas associated with this phase may be primarily Infrastructure Team but may also include all of Data Engineers, Machine Learning Engineers, and Data Scientists. - -3. **Model Development (Inner Loop)** - - The inner loop element consists of your iterative data science workflow performed within a dedicated, secure Azure Machine Learning Workspace. A typical workflow is illustrated here from data ingestion, EDA (Exploratory Data Analysis), experimentation, model development and evaluation, to registration of a candidate model for production. This modular element as implemented in the MLOps v2 accelerator is agnostic and adaptable to the process your data science team may use to develop models. - - Personas associated with this phase include Data Scientists and ML Engineers. - -4. **Azure Machine Learning Registries** - - When the Data Science team has developed a model that is a candidate for deploying to production, the model can be registered in the Azure Machine Learning workspace registry. Continuous Integration (CI) pipelines triggered either automatically by model registration and/or gated human-in-the-loop approval promote the model and any other model dependencies to the model Deployment phase. - - Personas associated with this stage are typically ML Engineers. - -5. **Model Deployment (Outer Loop)** - - The Model Deployment or Outer Loop phase consists of pre-production staging and testing, production deployment, and monitoring of both model/data and infrastructure. Continuous Deployment (VD) pipelines manage the promotion of the model and related assets through production, monitoring, and potential retraining as criteria appropriate to your organization and use case are satisfied. - - Personas associated with this phase are primarily ML Engineers. - -6. **Staging & Test** - - The Staging & Test phase can vary with customer practices but typically includes operations such as retraining and testing of the model candidate on production data, test deployments for endpoint performance, data quality checks, unit testing, and Responsible AI checks for model and data bias. 
This phase takes place in one or more dedicated, secure Azure Machine Learning Workspaces. - -7. **Production Deployment** - - After a model passes the Staging & Test phase, the model can be promoted to production via a human-in-the-loop gated approvals. Model deployment options include a Batch Managed Endpoint for batch scenarios or, for online, near-realtime scenarios, either an Online Managed Endpoint or to Kubernetes using Azure Arc. Production typically takes place in one or more dedicated, secure Azure Machine Learning Workspaces. - -8. **Monitoring** - - Monitoring in staging/test and production enables you to collect metrics for and act on changes in performance of the model, data, and infrastructure. Model and data monitoring may include checking for model and data drift, model performance on new data, and Responsible AI issues. Infrastructure monitoring can watch for issues with endpoint response time, problems with deployment compute capacity, or network issues. - -9. **Data & Model Monitoring - Events and Actions** - - Based on criteria for model and data monitors of concern such as metric thresholds or schedules, automated triggers and notifications can implement appropriate actions to take. This may be regularly scheduled automated retraining of the model on newer production data and a loop back to Staging & Test for pre-production evaluation or it may be due to triggers on model or data issues that require a loop back to the Model Development phase where Data Scientists can investigate and potentially develop a new model. - -10. **Infrastructure Monitoring - Events and Actions** - - Based on criteria for infrastructure monitors of concern such as endpoint response lag or insufficient compute for the deployment, automated triggers and notifications can implement appropriate actions to take. This triggers a loop back to the Setup & Administration phase where the Infrastructure Team can investigate and potentially reconfigure environment compute and network resources. - -### Azure Machine Learning Computer Vision Architecture - -![Azure Machine Learning Computer Vision Architecture](architecturepattern/AzureML_SupervisedCV_Architecture.png) - -The Azure Machine Learning Computer Vision Architecture is based on the Classical Machine Learning Architecture with some modifications particular to supervised CV scenarios. - -1. **Data Estate** - - This element illustrates the organization data estate and potential data sources and targets for a data science project. Data Engineers would be the primary owners of this element of the MLOps v2 lifecycle. The Azure data platforms in this diagram are neither exhaustive nor prescriptive. Images for Computer Vision scenarios may come from many different data sources. For efficiency when developing and deploying CV models with Azure Machine Learning, recommended Azure data sources for images are Azure Blob Storage and Azure Data Lake Storage. - -2. **Administration & Setup** - - This element is the first step in the MLOps v2 Accelerator deployment. It consists of all tasks related to creation and management of resources and roles associated with the project. For CV scenarios, Administration & Setup of the MLOps v2 environment is largely the same as for Classical Machine Learning with the addition of creation of Image Labeling and Annotation projects that can use the Labeling feature of Azure Machine Learning or other tools. - -3. 
**Model Development (Inner Loop)** - - The inner loop element consists of your iterative data science workflow performed within a dedicated, secure Azure Machine Learning Workspace. The primary difference between this workflow and the Classical Machine Learning scenario in that Image Labeling/Annotation is a key element of this development loop. - -4. **Azure Machine Learning Registries** - - When the Data Science team has developed a model that is a candidate for deploying to production, the model can be registered in the Azure Machine Learning workspace registry. Continuous Integration (CI) pipelines triggered either automatically by model registration and/or gated human-in-the-loop approval promote the model and any other model dependencies to the model Deployment phase. - -5. **Model Deployment (Outer Loop)** - - The Model Deployment or Outer Loop phase consists of pre-production staging and testing, production deployment, and monitoring of both model/data and infrastructure. Continuous Deployment (CD) pipelines manage the promotion of the model and related assets through production, monitoring, and potential retraining as criteria appropriate to your organization and use case are satisfied. - -6. **Staging & Test** - - The Staging & Test phase can vary with customer practices but typically includes operations such as test deployments for endpoint performance, data quality checks, unit testing, and Responsible AI checks for model and data bias. For CV scenarios, retraining of the model candidate on production data may not be done due to resource and time constraints. Rather the data science team may have access to production data for model development and the candidate model registered from the development loop is the "final" model to be evaluated for production. This phase takes place in one or more dedicated, secure Azure Machine Learning Workspaces. - -7. **Production Deployment** - - After a model passes the Staging & Test phase, the model can be promoted to production via human-in-the-loop gated approvals. Model deployment options include a Batch Managed Endpoint for batch scenarios or, for online, near-realtime scenarios, either an Online Managed Endpoint or to Kubernetes using Azure Arc. Production typically takes place in one or more dedicated, secure Azure Machine Learning Workspaces. - -8. **Monitoring** - - Monitoring in staging/test and production enables you to collect metrics for and act on changes in performance of the model, data, and infrastructure. Model and data monitoring may include checking for model performance on new images. Infrastructure monitoring can watch for issues with endpoint response time, problems with deployment compute capacity, or network issues. - -9. **Data & Model Monitoring - Events and Actions** - - The Data & Model monitoring and event/action phase of MLOps for Computer Vision is the key difference from Classical Machine Learning. Automated retraining is typically not done in CV scenarios when model performance degradation on new images is detected. In this case, new images for which the model performs poorly must be reviewed and annotated by a human-in-the-loop and often the next action goes back to the Model Development loop for updating the model with the new images. - -10. **Infrastructure Monitoring - Events and Actions** - - Based on criteria for infrastructure monitors of concern such as endpoint response lag or insufficient compute for the deployment, automated triggers and notifications can implement appropriate actions to take. 
This triggers a loop back to the Setup & Administration phase where the Infrastructure Team can investigate and potentially reconfigure environment compute and network resources. - -### Azure Machine Learning Natural Language Processing Architecture - -![Azure Machine Learning Natural Language Processing Architecture](architecturepattern/AzureML_NLP_Classification_Architecture.png) - -The Azure Machine Learning Natural Language Processing Architecture is based on the Classical Machine Learning Architecture with some modifications particular to NLP scenarios. - -1. **Data Estate** - - This element illustrates the organization data estate and potential data sources and targets for a data science project. Data Engineers would be the primary owners of this element of the MLOps v2 lifecycle. The Azure data platforms in this diagram are neither exhaustive nor prescriptive. However, data sources and targets that represent recommended best practices based on customer use case are indicated by the green check - -2. **Administration & Setup** - - This element is the first step in the MLOps v2 Accelerator deployment. It consists of all tasks related to creation and management of resources and roles associated with the project. For NLP scenarios, Administration & Setup of the MLOps v2 environment is largely the same as for Classical Machine Learning with the addition of creation of Image Labeling and Annotation projects that can use the Labeling feature of Azure Machine Learning or other tools. - -3. **Model Development (Inner Loop)** - - The inner loop element consists of your iterative data science workflow performed within a dedicated, secure Azure Machine Learning Workspace. The typical NLP model development loop can be significantly different from the Classical Machine Learning scenario in that Annotators for Sentences and Tokenization, Normalization, and Embeddings for text data are the typical development steps for this scenario. - -4. **Azure Machine Learning Registries** - - When the Data Science team has developed a model that is a candidate for deploying to production, the model can be registered in the Azure Machine Learning workspace registry. Continuous Integration (CI) pipelines triggered either automatically by model registration and/or gated human-in-the-loop approval promote the model and any other model dependencies to the model Deployment phase. - -5. **Model Deployment (Outer Loop)** - - The Model Deployment or Outer Loop phase consists of pre-production staging and testing, production deployment, and monitoring of both model/data and infrastructure. Continuous Deployment (CD) pipelines manage the promotion of the model and related assets through production, monitoring, and potential retraining as criteria appropriate to your organization and use case are satisfied. - -6. **Staging & Test** - - The Staging & Test phase can vary with customer practices but typically includes operations such as retraining and testing of the model candidate on production data, test deployments for endpoint performance, data quality checks, unit testing, and Responsible AI checks for model and data bias. This phase takes place in one or more dedicated, secure Azure Machine Learning Workspaces. - -7. **Production Deployment** - - After a model passes the Staging & Test phase, the model can be promoted to production via a human-in-the-loop gated approvals. 
Model deployment options include a Batch Managed Endpoint for batch scenarios or, for online, near-realtime scenarios, either an Online Managed Endpoint or to Kubernetes using Azure Arc. Production typically takes place in one or more dedicated, secure Azure Machine Learning Workspaces. - -8. **Monitoring** - - Monitoring in staging/test and production enables you to collect and act on changes in performance of the model, data, and infrastructure. Model and data monitoring may include checking for model and data drift, model performance on new text data, and Responsible AI issues. Infrastructure monitoring can watch for issues with endpoint response time, problems with deployment compute capacity, or network issues. - -9. **Data & Model Monitoring - Events and Actions** - - As with the Computer Vision architecture, the Data & Model monitoring and event/action phase of MLOps for Natural Language Processing is the key difference from Classical Machine Learning. Automated retraining is typically not done in NLP scenarios when model performance degradation on new text is detected. In this case, new text data for which the model performs poorly must be reviewed and annotated by a human-in-the-loop and often the next action goes back to the Model Development loop for updating the model with the new text data. - -10. **Infrastructure Monitoring - Events and Actions** - - Based on criteria for infrastructure monitors of concern such as endpoint response lag or insufficient compute for the deployment, automated triggers and notifications can implement appropriate actions to take. This triggers a loop back to the Setup & Administration phase where the Infrastructure Team can investigate and potentially reconfigure environment compute and network resources. diff --git a/documentation/architecture/README.md b/documentation/architecture/README.md new file mode 100644 index 0000000..71670d0 --- /dev/null +++ b/documentation/architecture/README.md @@ -0,0 +1,21 @@ +# MLOps v2 Architectures + +The MLOps v2 architectural pattern is made up of four main modular elements representing phases of the MLOps lifecycle. + +- Data Estate + +- Administration & Setup + +- Model Development (Inner Loop) + +- Model Deployment (Outer Loop) + +These elements, the relationships between them, and the personas typically associated with these elements are common for all MLOps v2 scenario architectures though there may be some variations in the details of each depending on the scenario. + +The base architecture for MLOps v2 for Azure Machine Learning is the Classical Machine Learning scenario on tabular data. Other scenarios like Computer Vision (CV) and Natural Language Processing (NLP) build on or modify this base architecture as appropriate. + +## Current Architectures + +- [Azure Machine Learning Classical ML Architecture](classical.md) +- [Azure Machine Learning Computer Vision Architecture](vision.md) +- [Azure Machine Learning Natural Language Processing Architecture](nlp.md) \ No newline at end of file diff --git a/documentation/architecture/classical.md b/documentation/architecture/classical.md new file mode 100644 index 0000000..2c51bea --- /dev/null +++ b/documentation/architecture/classical.md @@ -0,0 +1,65 @@ +# Azure Machine Learning Classical ML Architecture + +Below is the MLOps v2 architecture for a Classical Machine Learning scenario on tabular data using Azure Machine Learning along with an explanation of the main elements and details.
+ +![Azure Machine Learning Classical Machine Learning Architecture](media/AzureML_CML_Architecture.png) + +1. **Data Estate** + + This element illustrates the organization data estate and potential data sources and targets for a data science project. Data Engineers would be the primary owners of this element of the MLOps v2 lifecycle. The Azure data platforms in this diagram are neither exhaustive nor prescriptive. However, data sources and targets that represent recommended best practices based on customer use case are indicated by the green check. + +2. **Administration & Setup** + + This element is the first step in the MLOps v2 Accelerator deployment. It consists of all tasks related to creation and management of resources and roles associated with the project. These can include, but are not limited to: + + - Creation of project source code repositories. + + - Creation of Azure Machine Learning Workspaces for the project using Bicep, ARM, or Terraform. + + - Creation/modification of Data Sets and Compute Resources used for model development and deployment. + + - Definition of project team users, their roles, and access controls to other resources. + + - Creation of CI/CD (Continuous Integration and Continuous Delivery) pipelines. + + - Creation of Monitors for collection and notification of model and infrastructure metrics. + + Personas associated with this phase may be primarily Infrastructure Team but may also include all of Data Engineers, Machine Learning Engineers, and Data Scientists. + +3. **Model Development (Inner Loop)** + + The inner loop element consists of your iterative data science workflow performed within a dedicated, secure Azure Machine Learning Workspace. A typical workflow is illustrated here from data ingestion, EDA (Exploratory Data Analysis), experimentation, model development and evaluation, to registration of a candidate model for production. This modular element as implemented in the MLOps v2 accelerator is agnostic and adaptable to the process your data science team may use to develop models. + + Personas associated with this phase include Data Scientists and ML Engineers. + +4. **Azure Machine Learning Registries** + + When the Data Science team has developed a model that is a candidate for deploying to production, the model can be registered in the Azure Machine Learning workspace registry. Continuous Integration (CI) pipelines triggered either automatically by model registration and/or gated human-in-the-loop approval promote the model and any other model dependencies to the model Deployment phase. + + Personas associated with this stage are typically ML Engineers. + +5. **Model Deployment (Outer Loop)** + + The Model Deployment or Outer Loop phase consists of pre-production staging and testing, production deployment, and monitoring of both model/data and infrastructure. Continuous Deployment (CD) pipelines manage the promotion of the model and related assets through production, monitoring, and potential retraining as criteria appropriate to your organization and use case are satisfied. + + Personas associated with this phase are primarily ML Engineers. + +6. **Staging & Test** + + The Staging & Test phase can vary with customer practices but typically includes operations such as retraining and testing of the model candidate on production data, test deployments for endpoint performance, data quality checks, unit testing, and Responsible AI checks for model and data bias.
This phase takes place in one or more dedicated, secure Azure Machine Learning Workspaces. + +7. **Production Deployment** + + After a model passes the Staging & Test phase, the model can be promoted to production via human-in-the-loop gated approvals. Model deployment options include a Batch Managed Endpoint for batch scenarios or, for online, near-realtime scenarios, either an Online Managed Endpoint or to Kubernetes using Azure Arc. Production typically takes place in one or more dedicated, secure Azure Machine Learning Workspaces. + +8. **Monitoring** + + Monitoring in staging/test and production enables you to collect metrics for and act on changes in performance of the model, data, and infrastructure. Model and data monitoring may include checking for model and data drift, model performance on new data, and Responsible AI issues. Infrastructure monitoring can watch for issues with endpoint response time, problems with deployment compute capacity, or network issues. + +9. **Data & Model Monitoring - Events and Actions** + + Based on criteria for model and data monitors of concern such as metric thresholds or schedules, automated triggers and notifications can implement appropriate actions to take. This may be regularly scheduled automated retraining of the model on newer production data and a loop back to Staging & Test for pre-production evaluation, or it may be due to triggers on model or data issues that require a loop back to the Model Development phase where Data Scientists can investigate and potentially develop a new model. + +10. **Infrastructure Monitoring - Events and Actions** + + Based on criteria for infrastructure monitors of concern such as endpoint response lag or insufficient compute for the deployment, automated triggers and notifications can implement appropriate actions to take. This triggers a loop back to the Setup & Administration phase where the Infrastructure Team can investigate and potentially reconfigure environment compute and network resources.
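The registration and managed-endpoint steps described above can be exercised directly with the Azure CLI v2 `ml` extension. The sketch below is illustrative only: it assumes the extension is installed, and the resource group, workspace, model name, paths, and YAML file names shown are placeholders to replace with your own values rather than names defined by the accelerator.

```bash
# Minimal sketch (assumed names/paths): register a candidate model, then stand up
# a managed online endpoint and a deployment that serves it.
az extension add --name ml --upgrade       # ensure the CLI v2 'ml' extension is present

RG=my-resource-group                       # placeholder resource group
WS=my-aml-workspace                        # placeholder Azure ML workspace

# Register the candidate model produced by the inner loop (path is a placeholder)
az ml model create --name taxi-fare-model --version 1 \
  --path ./model --type mlflow_model \
  --resource-group "$RG" --workspace-name "$WS"

# Create a managed online endpoint and route all traffic to a deployment of the model
# (endpoint.yml and deployment.yml are YAML specs you author for your project)
az ml online-endpoint create --file endpoint.yml \
  --resource-group "$RG" --workspace-name "$WS"
az ml online-deployment create --file deployment.yml --all-traffic \
  --resource-group "$RG" --workspace-name "$WS"
```

In the accelerator itself these same operations are driven by the CI/CD pipelines rather than run by hand; the commands are shown only to make the flow from registration to endpoint concrete.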
diff --git a/documentation/architecturepattern/AzureML_CML_Architecture.png b/documentation/architecture/media/AzureML_CML_Architecture.png similarity index 100% rename from documentation/architecturepattern/AzureML_CML_Architecture.png rename to documentation/architecture/media/AzureML_CML_Architecture.png diff --git a/documentation/architecturepattern/AzureML_CML_Architecture.vsdx b/documentation/architecture/media/AzureML_CML_Architecture.vsdx similarity index 100% rename from documentation/architecturepattern/AzureML_CML_Architecture.vsdx rename to documentation/architecture/media/AzureML_CML_Architecture.vsdx diff --git a/documentation/architecturepattern/AzureML_NLP_Classification_Architecture.png b/documentation/architecture/media/AzureML_NLP_Classification_Architecture.png similarity index 100% rename from documentation/architecturepattern/AzureML_NLP_Classification_Architecture.png rename to documentation/architecture/media/AzureML_NLP_Classification_Architecture.png diff --git a/documentation/architecturepattern/AzureML_NLP_Classification_Architecture.vsdx b/documentation/architecture/media/AzureML_NLP_Classification_Architecture.vsdx similarity index 100% rename from documentation/architecturepattern/AzureML_NLP_Classification_Architecture.vsdx rename to documentation/architecture/media/AzureML_NLP_Classification_Architecture.vsdx diff --git a/documentation/architecturepattern/AzureML_SupervisedCV_Architecture.png b/documentation/architecture/media/AzureML_SupervisedCV_Architecture.png similarity index 100% rename from documentation/architecturepattern/AzureML_SupervisedCV_Architecture.png rename to documentation/architecture/media/AzureML_SupervisedCV_Architecture.png diff --git a/documentation/architecturepattern/AzureML_SupervisedCV_Architecture.vsdx b/documentation/architecture/media/AzureML_SupervisedCV_Architecture.vsdx similarity index 100% rename from documentation/architecturepattern/AzureML_SupervisedCV_Architecture.vsdx rename to documentation/architecture/media/AzureML_SupervisedCV_Architecture.vsdx diff --git a/documentation/architecture/nlp.md b/documentation/architecture/nlp.md new file mode 100644 index 0000000..493300e --- /dev/null +++ b/documentation/architecture/nlp.md @@ -0,0 +1,46 @@ +# Azure Machine Learning Natural Language Processing Architecture + +The Azure Machine Learning Natural Language Processing Architecture is based on the Classical Machine Learning Architecture with some modifications particular to NLP scenarios. + +![Azure Machine Learning Natural Language Processing Architecture](media/AzureML_NLP_Classification_Architecture.png) + + +1. **Data Estate** + + This element illustrates the organization data estate and potential data sources and targets for a data science project. Data Engineers would be the primary owners of this element of the MLOps v2 lifecycle. The Azure data platforms in this diagram are neither exhaustive nor prescriptive. However, data sources and targets that represent recommended best practices based on customer use case are indicated by the green check + +2. **Administration & Setup** + + This element is the first step in the MLOps v2 Accelerator deployment. It consists of all tasks related to creation and management of resources and roles associated with the project. 
For NLP scenarios, Administration & Setup of the MLOps v2 environment is largely the same as for Classical Machine Learning with the addition of creation of Image Labeling and Annotation projects that can use the Labeling feature of Azure Machine Learning or other tools. + +3. **Model Development (Inner Loop)** + + The inner loop element consists of your iterative data science workflow performed within a dedicated, secure Azure Machine Learning Workspace. The typical NLP model development loop can be significantly different from the Classical Machine Learning scenario in that Annotators for Sentences and Tokenization, Normalization, and Embeddings for text data are the typical development steps for this scenario. + +4. **Azure Machine Learning Registries** + + When the Data Science team has developed a model that is a candidate for deploying to production, the model can be registered in the Azure Machine Learning workspace registry. Continuous Integration (CI) pipelines triggered either automatically by model registration and/or gated human-in-the-loop approval promote the model and any other model dependencies to the model Deployment phase. + +5. **Model Deployment (Outer Loop)** + + The Model Deployment or Outer Loop phase consists of pre-production staging and testing, production deployment, and monitoring of both model/data and infrastructure. Continuous Deployment (CD) pipelines manage the promotion of the model and related assets through production, monitoring, and potential retraining as criteria appropriate to your organization and use case are satisfied. + +6. **Staging & Test** + + The Staging & Test phase can vary with customer practices but typically includes operations such as retraining and testing of the model candidate on production data, test deployments for endpoint performance, data quality checks, unit testing, and Responsible AI checks for model and data bias. This phase takes place in one or more dedicated, secure Azure Machine Learning Workspaces. + +7. **Production Deployment** + + After a model passes the Staging & Test phase, the model can be promoted to production via a human-in-the-loop gated approvals. Model deployment options include a Batch Managed Endpoint for batch scenarios or, for online, near-realtime scenarios, either an Online Managed Endpoint or to Kubernetes using Azure Arc. Production typically takes place in one or more dedicated, secure Azure Machine Learning Workspaces. + +8. **Monitoring** + + Monitoring in staging/test and production enables you to collect and act on changes in performance of the model, data, and infrastructure. Model and data monitoring may include checking for model and data drift, model performance on new text data, and Responsible AI issues. Infrastructure monitoring can watch for issues with endpoint response time, problems with deployment compute capacity, or network issues. + +9. **Data & Model Monitoring - Events and Actions** + + As with the Computer Vision architecture, the Data & Model monitoring and event/action phase of MLOps for Natural Language Processing is the key difference from Classical Machine Learning. Automated retraining is typically not done in NLP scenarios when model performance degradation on new text is detected. In this case, new text data for which the model performs poorly must be reviewed and annotated by a human-in-the-loop and often the next action goes back to the Model Development loop for updating the model with the new text data. + +10. 
**Infrastructure Monitoring - Events and Actions** + + Based on criteria for infrastructure monitors of concern such as endpoint response lag or insufficient compute for the deployment, automated triggers and notifications can implement appropriate actions to take. This triggers a loop back to the Setup & Administration phase where the Infrastructure Team can investigate and potentially reconfigure environment compute and network resources. diff --git a/documentation/architecture/vision.md b/documentation/architecture/vision.md new file mode 100644 index 0000000..1b2d092 --- /dev/null +++ b/documentation/architecture/vision.md @@ -0,0 +1,46 @@ +# Azure Machine Learning Computer Vision Architecture + +The Azure Machine Learning Computer Vision Architecture is based on the Classical Machine Learning Architecture with some modifications particular to supervised CV scenarios. + +![Azure Machine Learning Computer Vision Architecture](media/AzureML_SupervisedCV_Architecture.png) + + +1. **Data Estate** + + This element illustrates the organization data estate and potential data sources and targets for a data science project. Data Engineers would be the primary owners of this element of the MLOps v2 lifecycle. The Azure data platforms in this diagram are neither exhaustive nor prescriptive. Images for Computer Vision scenarios may come from many different data sources. For efficiency when developing and deploying CV models with Azure Machine Learning, recommended Azure data sources for images are Azure Blob Storage and Azure Data Lake Storage. + +2. **Administration & Setup** + + This element is the first step in the MLOps v2 Accelerator deployment. It consists of all tasks related to creation and management of resources and roles associated with the project. For CV scenarios, Administration & Setup of the MLOps v2 environment is largely the same as for Classical Machine Learning with the addition of creation of Image Labeling and Annotation projects that can use the Labeling feature of Azure Machine Learning or other tools. + +3. **Model Development (Inner Loop)** + + The inner loop element consists of your iterative data science workflow performed within a dedicated, secure Azure Machine Learning Workspace. The primary difference between this workflow and the Classical Machine Learning scenario in that Image Labeling/Annotation is a key element of this development loop. + +4. **Azure Machine Learning Registries** + + When the Data Science team has developed a model that is a candidate for deploying to production, the model can be registered in the Azure Machine Learning workspace registry. Continuous Integration (CI) pipelines triggered either automatically by model registration and/or gated human-in-the-loop approval promote the model and any other model dependencies to the model Deployment phase. + +5. **Model Deployment (Outer Loop)** + + The Model Deployment or Outer Loop phase consists of pre-production staging and testing, production deployment, and monitoring of both model/data and infrastructure. Continuous Deployment (CD) pipelines manage the promotion of the model and related assets through production, monitoring, and potential retraining as criteria appropriate to your organization and use case are satisfied. + +6. **Staging & Test** + + The Staging & Test phase can vary with customer practices but typically includes operations such as test deployments for endpoint performance, data quality checks, unit testing, and Responsible AI checks for model and data bias. 
For CV scenarios, retraining of the model candidate on production data may not be done due to resource and time constraints. Rather the data science team may have access to production data for model development and the candidate model registered from the development loop is the "final" model to be evaluated for production. This phase takes place in one or more dedicated, secure Azure Machine Learning Workspaces. + +7. **Production Deployment** + + After a model passes the Staging & Test phase, the model can be promoted to production via human-in-the-loop gated approvals. Model deployment options include a Batch Managed Endpoint for batch scenarios or, for online, near-realtime scenarios, either an Online Managed Endpoint or to Kubernetes using Azure Arc. Production typically takes place in one or more dedicated, secure Azure Machine Learning Workspaces. + +8. **Monitoring** + + Monitoring in staging/test and production enables you to collect metrics for and act on changes in performance of the model, data, and infrastructure. Model and data monitoring may include checking for model performance on new images. Infrastructure monitoring can watch for issues with endpoint response time, problems with deployment compute capacity, or network issues. + +9. **Data & Model Monitoring - Events and Actions** + + The Data & Model monitoring and event/action phase of MLOps for Computer Vision is the key difference from Classical Machine Learning. Automated retraining is typically not done in CV scenarios when model performance degradation on new images is detected. In this case, new images for which the model performs poorly must be reviewed and annotated by a human-in-the-loop and often the next action goes back to the Model Development loop for updating the model with the new images. + +10. **Infrastructure Monitoring - Events and Actions** + + Based on criteria for infrastructure monitors of concern such as endpoint response lag or insufficient compute for the deployment, automated triggers and notifications can implement appropriate actions to take. This triggers a loop back to the Setup & Administration phase where the Infrastructure Team can investigate and potentially reconfigure environment compute and network resources. diff --git a/documentation/deployguides/README.md b/documentation/deployguides/README.md new file mode 100644 index 0000000..6b950a9 --- /dev/null +++ b/documentation/deployguides/README.md @@ -0,0 +1,8 @@ +# Getting Started + +## Deploying the Solution Accelerator + +The following guides walk you through deploying the solution accelerator using either Azure DevOps or GitHub Workflows for source code management and orchestration. + +* [Azure DevOps ](/documentation/quickstart/quickstart_ado.md) +* [Github Workflows](/documentation/quickstart/quickstart_gha.md) diff --git a/documentation/deployguides/deployguide_ado.md b/documentation/deployguides/deployguide_ado.md new file mode 100644 index 0000000..069139c --- /dev/null +++ b/documentation/deployguides/deployguide_ado.md @@ -0,0 +1,419 @@ +# Deployment Guide - Azure DevOps Repositories and Pipelines + +This document will guide you through deploying the MLOps V2 project generator and projects using only Azure DevOps to host source repositories and pipelines. + +**Requirements:** +- If using Terraform to create and manage infrastructure from Azure DevOps, install the [Terraform extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-devlabs.custom-terraform-tasks). 
+- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) with `azure-devops` extension. +- Azure subscription(s) based on if you are deploying Prod only or Prod and Dev environments +- Ability to create Azure service principals to access / create Azure resources from Azure DevOps +- Git bash, WSL, or another shell script editor on your local machine + + +## Setup MLOps V2 and a New MLOps Project in Azure DevOps +--- + +1.
+ Create Service Principals + For this demo, the creation of one or two service principals is required, depending on how many environments you want to work in (Dev, Prod, or both). These principals can be created using one of the methods below: +
+ Create from Azure Cloud Shell + 1.1 Launch the Azure Cloud Shell. (If this is the first time you have launched the Cloud Shell, you will be required to create a storage account for it.) + + 1.2 If prompted, choose **Bash** as the environment used in the Cloud Shell. You can also change environments in the drop-down on the top navigation bar. + + ![PS_CLI_1](./images/PS_CLI1_1.png) + + 1.3 Copy the bash commands below to your computer and update the **projectName**, **subscriptionId**, and **environment** variables with the values for your project. If you are creating both a Dev and Prod environment, you will need to run this script once for each environment, creating a service principal for each. This command will also grant the **Contributor** role to the service principal in the subscription provided. This is required for Azure DevOps to properly deploy resources to that subscription. + + ``` bash + projectName="" + roleName="Contributor" + subscriptionId="" + environment="" #First letter should be capitalized + servicePrincipalName="Azure-ARM-${environment}-${projectName}" + # Verify the ID of the active subscription + echo "Using subscription ID $subscriptionId" + echo "Creating SP for RBAC with name $servicePrincipalName, with role $roleName and in scopes /subscriptions/$subscriptionId" + az ad sp create-for-rbac --name $servicePrincipalName --role $roleName --scopes /subscriptions/$subscriptionId + echo "Please ensure that the information created here is properly saved for future use." + ``` + + 1.4 Copy your edited commands into the Azure Cloud Shell and run them (Ctrl + Shift + V). + + ![PS_CLI_1_4](./images/PS_CLI1_4.png) + + + 1.5 After running these commands, you will be presented with information related to the service principal. Save this information to a safe location; it will be used later in the demo to configure Azure DevOps. + + ``` + { + "appId": "", + "displayName": "Azure-ARM-dev-Sample_Project_Name", + "password": "", + "tenant": "" + } + ``` + + 1.6 Repeat step 1.3 if you are creating service principals for both Dev and Prod environments. + + 1.7 Close the Cloud Shell once the service principals are created. + +
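Before closing the Cloud Shell, you can optionally confirm that the service principals exist. A quick check (the "ProjectName" suffix is a placeholder; use the names you chose above):

```bash
# Optional: list the service principals created above to confirm they exist.
az ad sp list --display-name "Azure-ARM-Dev-ProjectName" \
  --query "[].{name:displayName, appId:appId}" -o table
az ad sp list --display-name "Azure-ARM-Prod-ProjectName" \
  --query "[].{name:displayName, appId:appId}" -o table
```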
+
+ Create from Azure Portal + 1.1. Navigate to Azure App Registrations + + 1.2. Select "new registration". + + ![PS2](./images/SP-setup2.png) + + 1.3. Go through the process of creating a Service Principal (SP), selecting "Accounts in any organizational directory (Any Azure AD directory - Multitenant)", and name it "Azure-ARM-Dev-ProjectName". Once created, repeat and create a new SP named "Azure-ARM-Prod-ProjectName". Please replace "ProjectName" with the name of your project so that the service principal can be uniquely identified. + + 1.4. Go to "Certificates & Secrets" and add a "New client secret" for each SP, then store the value and secret separately. + + 1.5. To assign the necessary permissions to these principals, select your respective subscription and go to IAM. Select +Add, then select "Add Role Assignment". + + ![PS3](./images/SP-setup3.png) + + 1.6. Select Contributor and add members by selecting + Select Members. Add the member "Azure-ARM-Dev-ProjectName" as created before. + + ![SP4](./images/SP-setup4.png) + + 1.7. Repeat this step if you deploy Dev and Prod into the same subscription; otherwise, change to the Prod subscription and repeat with "Azure-ARM-Prod-ProjectName". The basic SP setup is successfully finished. +
+
+ +2.
+ Set up Azure DevOps + + ### Requirements: + - An Organization in Azure DevOps (Create your Organization) + + 2.1. Navigate to [Azure DevOps](https://go.microsoft.com/fwlink/?LinkId=2014676&githubsi=true&clcid=0x409&WebUserId=2ecdcbf9a1ae497d934540f4edce2b7d). + + 2.2. Create a new project. + + ![ADO Project](./images/ado-create-project.png) + + 2.3. In the project, under **Project Settings** (at the bottom left of the project page) select **Service Connections**. + + ![ADO1](./images/ADO-setup1.png) + + **Azure Subscription Connection:** + + 2.3.1 Select "New Service Connection". + + ![ADO2](./images/ADO-setup2.png) + + 2.3.2 Select "Azure Resource Manager", select "Next", select "Service principal (manual)", select "Next", select the subscription where your Service Principal is stored, and name the service connection "Azure-ARM-Dev". Fill in the details of the Dev service principal created in step 1. Select "Grant access permission to all pipelines", then select "Save". Repeat this step to create another service connection "Azure-ARM-Prod" using the details of the Prod service principal created in step 1. + + ![ADO3](./images/ado-service-principal-manual.png) + + The Azure DevOps setup is successfully finished. +
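If you prefer to script the service connections rather than create them in the portal, the `azure-devops` CLI extension can create them as well. A minimal sketch with placeholder values taken from the service principal output in step 1 (assumes `az extension add --name azure-devops` and that you are logged in with `az login`):

```bash
# The service principal secret is passed via this environment variable (placeholder value).
export AZURE_DEVOPS_EXT_AZURE_RM_SERVICE_PRINCIPAL_KEY="<service principal password>"

# Create the Dev service connection; repeat with the Prod values for "Azure-ARM-Prod".
az devops service-endpoint azurerm create \
  --name "Azure-ARM-Dev" \
  --azure-rm-service-principal-id "<appId>" \
  --azure-rm-tenant-id "<tenant>" \
  --azure-rm-subscription-id "<subscriptionId>" \
  --azure-rm-subscription-name "<subscription name>" \
  --organization "https://dev.azure.com/<your-organization>" \
  --project "<your-project>"
```

You may still need to grant the connection access permission to all pipelines in the Azure DevOps UI, as described above.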
+3.
+ Set up source repository with Azure DevOps + + 3.1 Open the project you created in [Azure DevOps](https://dev.azure.com/) + + 3.2 Open the Repos section. Click on the default repo name at the top of the screen and select **Import Repository** + + ![image](./images/ado-import-repo.png) + + 3.3 Enter https://github.com/Azure/mlops-templates into the Clone URL field. Click import at the bottom of the page + + ![image](./images/ado-import-mlops-templates.png) + + 3.4 Open the Repos section again and import the following repositories (a scripted alternative to these imports is sketched at the end of this section): + - https://github.com/Azure/mlops-project-template + - https://github.com/Azure/mlops-v2 + + + 3.5.1 Open the Repos section. Click on the default repo name at the top of the screen and select **New Repository** + + ![image](./images/ado-add-demoproject.png) + + 3.5.2 Enter a name for the repository. This will be used to store the files for the project type you choose. Click **Create** + + ![image](./images/ado-create-demoprojectrepo.png) + + 3.5.3 Open the **project settings** at the bottom of the left hand navigation pane + + ![image](./images/ado-open-projectSettings.png) + + 3.5.4 Under the Repos section, click **Repositories**. Select the repository you created in step 3.5.2. Select the **Security** tab + + 3.5.5 Under the User permissions section, select the \ Build Service user + + 3.5.6 Change the permissions for **Contribute** and **Create branch** to **Allow** + ![image](./images/ado-permissions-repo.png) + + + + 3.6 Open the Pipelines section and click on the 3 vertical dots next to the **Create Pipelines** button. Select **Manage Security** + + ![image](./images/ado-open-pipelinesSecurity.png) + + 3.6.1 Select the \ Build Service account for your project under the Users section. Change the permission **Edit build pipeline** to **Allow** + + ![image](./images/ado-add-pipelinesSecurity.png) + + + 3.7 Open the Pipelines section and create a new pipeline + + ![image](./images/ado-pipeline-sparsecheckout.png) + + 3.7.1 + - Select Azure Repos Git + - Select the mlops-v2 repository + - Select existing Azure Pipelines YAML file + - Ensure the selected branch is **main** + - Select the /.azuredevops/initialise-project.yml file in the path drop-down + - Click Continue + + On the pipeline review page, choose to **save** the pipeline before running it. + + ![image](./images/ado-save-sparepipeline.png) + + 3.8 Click run pipeline + + ![image](./images/ado-run-sparepipeline.png) + + You will need to complete the required parameters to configure your project + + ![image](./images/ado-parameters-sparepipeline.png) + + - **ADO project name**: This is the name of the Azure DevOps project you created + - **Project repo name**: This is the name of the mlops-v2 accelerator project you imported from GitHub (Default is **mlops-v2** unless you changed it during import) + - **MLOps Project Template name**: Name of the shared templates you imported previously (Default is **mlops-project-template**) + - ML Project type: + - Choose **classical** for a regression or classification project. + - Choose **cv** for a computer vision project + - Choose **nlp** for natural language projects + - MLOps version + - Choose **python-sdk** to use the Python SDK for training and deployment of your model + - Choose **aml-cli-v2** to use the CLI tools for training and deployment of your model + - Infrastructure Version: + - Choose **Bicep** to deploy using Azure ARM-based templates + - Choose **terraform** to use Terraform-based templates.
+ + 3.8.1 The first run of the pipeline will require you to grant access to the repositories you created. Click **View** + + ![ADO_Pipeline_permissions](./images/ado-pipeline-permissions.png) + + 3.8.2 Click **Permit** for all repositories waiting for review + + ![ADO_Pipeline_permissionsReview](./images/ado-pipeline-permissionsPermit.png) + + 3.9 The pipeline will run the following actions: + - Your project repository will be populated with the files needed to create the Azure Machine Learning project and resources. + ![ADO_view_repoSparseCheckout](./images/ado-view-repoSparseCheckout.png) + - Pipelines for the creation of infrastructure and the training and deployment of machine learning models. + ![ADO_view_allPipelines](./images/ado-view-allPipelines.png) + + 3.10 Under Pipelines, select Environments and ensure both "Prod" and "Dev" environments are created. Create the "Dev" environment manually, if necessary. + + **This finishes the prerequisite section and the deployment of the solution accelerator can happen accordingly.** +
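The repository imports in steps 3.2–3.4 can also be scripted with the Azure DevOps CLI. A minimal sketch, assuming the `azure-devops` extension is installed and your organization and project defaults are configured:

```bash
# Configure defaults once (placeholder values):
#   az devops configure --defaults organization=https://dev.azure.com/<your-organization> project=<your-project>

# Create an empty Azure Repos repository for each template repo, then import it from GitHub.
for repo in mlops-templates mlops-project-template mlops-v2; do
  az repos create --name "$repo" --output none
  az repos import create --git-source-url "https://github.com/Azure/$repo" --repository "$repo"
done
```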
+ + +## Outer Loop: Deploying Infrastructure via Azure DevOps +--- +This step deploys the Azure infrastructure (resource group, Azure Machine Learning workspace, and supporting resources) that the training and deployment pipelines will use in the following steps. + +
+ Run Azure Infrastructure pipeline + 1. Go to your project repository and select the "config-infra-prod.yml" file. + + ![ADO Run4](./images/ADO-run4.png) + + Under global, there are two values, namespace and postfix. These values are used to make the names of the created artifacts unique. This matters especially for the storage account name, which has the most rigid constraints: it must be unique across Azure and consist only of lowercase letters and numbers (24 characters maximum). Change namespace and/or postfix to values of your liking, remembering to stay within the storage account name constraints mentioned above. Then save, commit, push, and create a pull request to get these values into the pipeline. + + If you are running a Deep Learning workload such as CV or NLP, you have to ensure that GPU compute is available in your deployment location. If necessary, change the location value, for example to eastus, as shown below: + + namespace: [max 5 random lowercase letters] + postfix: [max 4 random digits] + location: eastus + + Please repeat this step for both "config-infra-dev.yml" and "config-infra-prod.yml"! + + 2. Go to ADO pipelines + + ![ADO Pipelines](./images/ADO-pipelines.png) + + 3. Select "New Pipeline". + + ![ADO Run1](./images/ADO-run1.png) + + 4. Select "Azure Repos Git". + + ![ADO Where's your code](./images/ado-wheresyourcode.png) + + 5. Select your /MLOps-Test repository. + + ![ADO Run2](./images/ADO-run2.png) + + If your new repository is not visible, then click on the "provide access" link and, on the next screen, click on the "grant" button next to the organization name to grant access to your organization. + + 6. Select "Existing Azure Pipeline YAML File" + + ![ADO Run3](./images/ADO-run3.png) + + + 7. Select "main" as the branch and choose your preferred yml path based on your deployment method. For a Terraform scenario choose 'infrastructure/pipelines/tf-ado-deploy-infra.yml', then select "Continue". For a Bicep scenario choose 'infrastructure/pipelines/bicep-ado-deploy-infra.yml', then select "Continue". + + ![Select Infrastructure Pipeline](./images/ado-select-pipeline-yaml-file.png) + + + + 8. Run the pipeline. This will take a few minutes to finish. The pipeline should create the following artifacts: + * Resource Group for your Workspace, including Storage Account, Container Registry, Application Insights, Key Vault and the Azure Machine Learning Workspace itself. + * A compute cluster is also created in the workspace. + + ![ADO Run5](./images/ADO-run5.png) + + Now the Outer Loop of the MLOps Architecture is deployed. + + ![ADO Run6](./images/ADO-run-infra-pipeline.png) + +> Note: the "Unable move and reuse existing repository to required location" warnings may be ignored. +
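Once the infrastructure pipeline completes, you can verify the new workspace and compute from your shell. A quick sketch with placeholder names (assumes the Azure ML CLI v2 extension, `az extension add --name ml`; the actual resource names are derived from your namespace and postfix values):

```bash
# Show the workspace created by the infrastructure pipeline (placeholder names).
az ml workspace show --name "<your-workspace-name>" --resource-group "<your-resource-group>"

# List the compute created in the workspace.
az ml compute list --workspace-name "<your-workspace-name>" --resource-group "<your-resource-group>" -o table
```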
+ + +## Inner Loop: Deploying Classical ML Model Development / Moving to Test Environment - Azure DevOps +--- +
+ Deploy Classical ML Model + 1. Go to ADO pipelines + + ![ADO Pipelines](./images/ADO-pipelines.png) + + 2. Select "New Pipeline". + + ![ADO Run1](./images/ADO-run1.png) + + 3. Select "Azure Repos Git". + + ![ADO Where's your code](./images/ado-wheresyourcode.png) + + 4. Select your /MLOps-Test repository + + ![ADO Run2](./images/ADO-run2.png) + + 5. Select "Existing Azure Pipeline YAML File" + + ![ADO Run3](./images/ADO-run3.png) + + 6. Select "main" as a branch and choose '/mlops/devops-pipelines/deploy-model-training-pipeline.yml', then select "Continue". + + ![ADO Run9](./images/ADO-run9.png) + + 7. Before running the pipeline, the repository location for the mlops-templates will need to be updated. Modify the **resources** section of the pipeline to match the image below + + ![resourceRepoADO](./images/ado-pipeline-resourcesRepoADO.png) + +
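After saving and running this pipeline, you can confirm the training job and the resulting registered model from your shell. A quick sketch with placeholder names (assumes the Azure ML CLI v2 extension is installed):

```bash
# List recent jobs and registered models in the workspace (placeholder names).
az ml job list --resource-group "<your-resource-group>" --workspace-name "<your-workspace-name>" -o table
az ml model list --resource-group "<your-resource-group>" --workspace-name "<your-workspace-name>" -o table
```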
+ + +## Inner Loop: Checkpoint + + At this point, the infrastructure is configured and the Inner Loop of the MLOps Architecture is deployed. We are ready to move our trained model to production. + + +## Inner / Outer Loop: Moving to Production - Introduction +--- + + >**NOTE: This is an end-to-end machine learning pipeline which runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving different functions, which can be registered with the workspace, versioned, and reused with various inputs and outputs.** + + >**Prepare Data + This component takes multiple taxi datasets (yellow and green), merges/filters the data, and prepares the train/val and evaluation datasets. + Input: Local data under ./data/ (multiple .csv files) + Output: Single prepared dataset (.csv) and train/val/test datasets.** + + >**Train Model + This component trains a Linear Regressor with the training set. + Input: Training dataset + Output: Trained model (pickle format)** + + >**Evaluate Model + This component uses the trained model to predict taxi fares on the test set. + Input: ML model and Test dataset + Output: Model performance metrics and a deploy flag indicating whether or not to deploy. + This component compares the performance of the model with all previously deployed models on the new test dataset and decides whether or not to promote the model into production. Promoting the model into production happens by registering the model in the AML workspace.** + + >**Register Model + This component scores the model based on how accurate the predictions are on the test set. + Input: Trained model and the deploy flag. + Output: Registered model in Azure Machine Learning.** + + +## Inner / Outer Loop: Moving to Production - Azure DevOps +--- +
+ Deploy ML model endpoint + 1. Go to ADO pipelines + + ![ADO Pipelines](./images/ADO-pipelines.png) + + 2. Select "New Pipeline". + + ![ADO Run1](./images/ADO-run1.png) + + 3. Select "Azure Repos Git". + + ![ADO Where's your code](./images/ado-wheresyourcode.png) + + 4. Select your /MLOps-Test repository (the project repository you created earlier). + + ![ADO Run2](./images/ADO-run2.png) + + 5. Select "Existing Azure Pipeline YAML File" + + ![ADO Run3](./images/ADO-run3.png) + + 6. Select "main" as the branch and choose: + For Classical Machine Learning: + Managed Batch Endpoint '/mlops/devops-pipelines/deploy-batch-endpoint-pipeline.yml' + Managed Online Endpoint '/mlops/devops-pipelines/deploy-online-endpoint-pipeline.yml' + For Computer Vision: + Managed Online Endpoint '/mlops/devops-pipelines/deploy-batch-endpoint-pipeline.yml' + + Then select "Continue". + + ![ADO Run10](./images/ADO-run10.png) + + 7. The pipeline's resource repository will need to be modified to reference the correct repository from your project. Modify the Repository section as shown below. + + ![resourceRepoADO](./images/ado-pipeline-resourcesRepoADO.png) + + 8. Batch/Online endpoint names need to be unique, so please change [your endpointname] to another unique name and then select "Run". + + ![ADO Run11](./images/ADO-batch-pipeline.png) + + **IMPORTANT: If the run fails due to an existing online endpoint name, recreate the pipeline as described above and change [your endpointname] to [your endpointname [random number]]** + + 9. When the run completes, you will see: + + ![ADO Run12](./images/ADO-batch-pipeline-run.png) + + Now the Inner Loop is connected to the Outer Loop of the MLOps Architecture, and inference has been run. +
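After the deployment pipeline completes, you can smoke-test an online endpoint directly from your shell. A quick sketch with placeholder names and request file (assumes the Azure ML CLI v2 extension is installed):

```bash
# Invoke the deployed online endpoint with a sample request (placeholder names and path).
az ml online-endpoint invoke \
  --name "<your-endpointname>" \
  --request-file "<path-to-sample-request.json>" \
  --resource-group "<your-resource-group>" \
  --workspace-name "<your-workspace-name>"
```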
+ + + +## Next Steps +--- + +This finishes the demo according to the architectural pattern: Azure Machine Learning Classical Machine Learning. Next you can dive into your Azure Machine Learning service in the Azure Portal and see the inference results of this example model. + +As elements of Azure Machine Learning are still in development, the following components are not part of this demo: +- Secure Workspaces +- Model Monitoring for Data/Model Drift +- Automated Retraining +- Model and Infrastructure triggers + +In the interim, it is recommended to schedule the development deployment pipeline for complete model retraining on a timed trigger. + +For questions, please [submit an issue](https://github.com/Azure/mlops-v2/issues) or reach out to the development team at Microsoft. \ No newline at end of file diff --git a/documentation/deployguides/deployguide_gha.md b/documentation/deployguides/deployguide_gha.md new file mode 100644 index 0000000..4c5eac9 --- /dev/null +++ b/documentation/deployguides/deployguide_gha.md @@ -0,0 +1,296 @@ +# Deployment Guide using Github Repositories Workflows + +## Technical requirements + +- Github as the source control repository +- Github Actions as the DevOps orchestration tool +- [GitHub client](https://cli.github.com/) +- [Azure CLI ](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) +- The [Terraform extension for Azure DevOps](https://marketplace.visualstudio.com/items?itemName=ms-devlabs.custom-terraform-tasks) if you are using Azure DevOps + Terraform to spin up infrastructure +- Azure service principals to access / create Azure resources from Azure DevOps or Github Actions (or the ability to create them) +- Git bash, WSL or another shell script editor on your local machine + +>**Note:** +> +>**Git version 2.27 or newer is required. See [these instructions](https://github.com/cli/cli/blob/trunk/docs/install_linux.md#debian-ubuntu-linux-raspberry-pi-os-apt) to upgrade.** + + +## Configure The GitHub Environment +--- + + +1. **Replicate MLOps-V2 Template Repositories in your GitHub organization** + Go to https://github.com/Azure/mlops-templates/fork to fork the mlops templates repo into your Github org. This repo has reusable mlops code that can be used across multiple projects. + + ![image](./images/gh-fork.png) + + Go to https://github.com/Azure/mlops-project-template/generate to create a repository in your Github org using the mlops-project-template. This is the monorepo that you will use to pull example projects from in a later step. + + ![image](./images/gh-generate.png) + +2. **Clone the mlops-v2 repository to local system** + On your local machine, select or create a root directory (ex: 'mlprojects') to hold your project repository as well as the mlops-v2 repository. Change to this directory. + + Clone the mlops-v2 repository to this directory. This provides the documentation and the `sparse_checkout.sh` script. This repository and folder will be used to bootstrap your projects: + `# git clone https://github.com/Azure/mlops-v2.git` + +3. **Configure and run sparse checkout** + From your local project root directory, open the `/mlops-v2/sparse_checkout.sh` file for editing and set the following variables as needed to select the infrastructure management tool used by your organization, the type of machine learning project, and the other options for your environment: + + * **infrastructure_version** selects the tool that will be used to deploy cloud resources.
+ * **project_type** selects the AI workload type for your project (classical ML, computer vision, or NLP) + * **mlops_version** selects your preferred interaction approach with Azure Machine Learning + * **git_folder_location** points to the root project directory to which you cloned mlops-v2 in step 2 + * **project_name** is the name (case sensitive) of your project. A GitHub repository will be created with this name + * **github_org_name** is your GitHub organization + * **project_template_github_url** is the URL to the original or your generated clone of the mlops_project_template repository from step 1 + * **orchestration** specifies the CI/CD orchestration to use +

+ A sparse_checkout.sh example is below: + + ```bash + #options: terraform / bicep + infrastructure_version=terraform + + #options: classical / cv / nlp + project_type=classical + + #options: python-sdk / aml-cli-v2 + mlops_version=aml-cli-v2 + + #replace with the local root folder location where you want + git_folder_location='/home//mlprojects' + + #replace with your project name + project_name=taxi-fare-regression + + #replace with your github org name + github_org_name= + + #replace with the url for the project template for your organization created in step 2.2 + project_template_github_url=https://github.com/azure/mlops-project-template + + #options: github-actions / azure-devops + orchestration=github-actions + ``` + Currently, classical, cv (computer-vision), and nlp (natural language processing) pipelines are supported. + +4. **Run sparse checkout** + The `sparse_checkout.sh` script will use ssh to authenticate to your GitHub organization. If this is not yet configured in your environment, follow the steps below or refer to the documentation at [GitHub Key Setup](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent). + + > **GitHub Key Setup** + > + > On your local machine, create a new ssh key: + > `# ssh-keygen -t ed25519 -C ""` + > You may press enter to all three prompts to create a new key in `/home//.ssh/id_ed25519` + > + > Add your SSH key to your SSH agent: + > `# eval "$(ssh-agent -s)" ` + > `# ssh-add ~/.ssh/id_ed25519` + > + > Get your public key to add to GitHub: + > `# cat ~/.ssh/id_ed25519.pub` + > It will be a string of the format '`ssh-ed25519 ... your_email@example.com`'. Copy this string. + > + > [Add your SSH key to Github](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account). Under your account menu, select "Settings", then "SSH and GPG Keys". Select "New SSH key" and enter a title. Paste your public key into the key box and click "Add SSH key" + + From your root project directory (ex: mlprojects/), execute the `sparse_checkout.sh` script: + > `# bash mlops-v2/sparse_checkout.sh` + + This will run the script, using git sparse checkout to build a local copy of your project repository based on your choices configured in the script. It will then create the GitHub repository and push the project code to it. + + Monitor the script execution for any errors. If there are errors, you can safely remove the local copy of the repository (ex: taxi_fare_regression/) as well as delete the GitHub project repository. After addressing the errors, run the script again. + + After the script runs successfully, the GitHub project will be initialized with your project files. + +5. **Configure GitHub Actions Secrets** + + This step creates a service principal and GitHub secrets to allow the GitHub action workflows to create and interact with Azure Machine Learning Workspace resources. 
+ + From the command line, execute the following Azure CLI command with your choice of a service principal name: + > `# az ad sp create-for-rbac --name --role contributor --scopes /subscriptions/ --sdk-auth` + + You will get output similar to below: + + >`{` + > `"clientId": "",` + > `"clientSecret": "",` + > `"subscriptionId": "",` + > `"tenantId": "",` + > `"activeDirectoryEndpointUrl": "https://login.microsoftonline.com",` + > `"resourceManagerEndpointUrl": "https://management.azure.com/",` + > `"activeDirectoryGraphResourceId": "https://graph.windows.net/",` + > `"sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",` + > `"galleryEndpointUrl": "https://gallery.azure.com/",` + > `"managementEndpointUrl": "https://management.core.windows.net/"` + > `}` + + Copy all of this output, braces included. + + From your GitHub project, select **Settings**: + + ![GitHub Settings](./images/gh-settings.png) + + Then select **Secrets**, then **Actions**: + + ![GitHub Secrets](./images/gh-secrets.png) + + Select **New repository secret**. Name this secret **AZURE_CREDENTIALS** and paste the service principal output as the content of the secret. Select **Add secret**. + + > **Note:** + > If deploying the infrastructure using terraform, add the following additional GitHub secrets using the corresponding values from the service principal output as the content of the secret: + > + > **ARM_CLIENT_ID** + > **ARM_CLIENT_SECRET** + > **ARM_SUBSCRIPTION_ID** + > **ARM_TENANT_ID** + + The GitHub configuration is complete. + +## Deploy Machine Learning Project Infrastructure Using GitHub Actions + +1. **Configure Azure ML Environment Parameters** + + In your Github project repository (ex: taxi-fare-regression), there are two configuration files in the root, `config-infra-dev.yml` and `config-infra-prod.yml`. These files are used to define and deploy Dev and Prod Azure Machine Learning environments. With the default deployment, `config-infra-prod.yml` will be used when working with the main branch or your project and `config-infra-dev.yml` will be used when working with any non-main branch. + + It is recommended to first create a dev branch from main and deploy this environment first. + + Edit each file to configure a namespace, postfix string, Azure location, and environment for deploying your Dev and Prod Azure ML environments. Default values and settings in the files are show below: + + > ```bash + > namespace: mlopsv2 #Note: A namespace with many characters will cause storage account creation to fail due to storage account names having a limit of 24 characters. + > postfix: 0001 + > location: eastus + > environment: dev + > enable_aml_computecluster: true + > enable_monitoring: false + >``` + + The first four values are used to create globally unique names for your Azure environment and contained resources. Edit these values to your liking then save, commit, push, or pr to update these files in the project repository. + + If you are running a Deep Learning workload such as CV or NLP, ensure your subscription and Azure location has available GPU compute. + + > Note: + > + > The enable_monitoring flag in these files defaults to False. Enabling this flag will add additional elements to the deployment to support Azure ML monitoring based on https://github.com/microsoft/AzureML-Observability. This will include an ADX cluster and increase the deployment time and cost of the MLOps solution. + +2. 
**Deploy Azure Machine Learning Infrastructure** + + In your GitHub project repository (ex: taxi-fare-regression), select **Actions** + + ![GH-actions](./images/gh-actions.png) + + This will display the pre-defined GitHub workflows associated with your project. For a classical machine learning project, the available workflows will look similar to this: + + ![GH-workflows](./images/gh-workflows.png) + + Depending on the use case, available workflows may vary. Select the 'deploy-infra' workflow. In this scenario, the workflow to select would be **tf-gha-deploy-infra.yml**. This would deploy the Azure ML infrastructure using GitHub Actions and Terraform. + + ![GH-deploy-infra](./images/gh-deploy-infra.png) + + On the right side of the page, select **Run workflow** and select the branch to run the workflow on. This may deploy Dev infrastructure if you've created a dev branch or Prod infrastructure if deploying from main. Monitor the pipeline for successful completion. + + ![GH-infra-pipeline](./images/gh-infra-pipeline.png) + + When the pipeline has completed successfully, you can find your Azure ML Workspace and associated resources by logging in to the Azure Portal. + + Next, model training and scoring pipelines will be deployed into the new Azure Machine Learning environment. + +## Sample Training and Deployment Scenario + +The solution accelerator includes code and data for a sample end-to-end machine learning pipeline which runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving different functions, which can be registered with the workspace, versioned, and reused with various inputs and outputs. Sample pipelines and workflows for the Computer Vision and NLP scenarios will have different training and deployment steps. + +This training pipeline contains the following steps: + +**Prepare Data** +This component takes multiple taxi datasets (yellow and green), merges/filters the data, and prepares the train/val and evaluation datasets. +Input: Local data under ./data/ (multiple .csv files) +Output: Single prepared dataset (.csv) and train/val/test datasets. + +**Train Model** +This component trains a Linear Regressor with the training set. +Input: Training dataset +Output: Trained model (pickle format) + +**Evaluate Model** + This component uses the trained model to predict taxi fares on the test set. + Input: ML model and Test dataset + Output: Model performance metrics and a deploy flag indicating whether or not to deploy. + This component compares the performance of the model with all previously deployed models on the new test dataset and decides whether or not to promote the model into production. Promoting the model into production happens by registering the model in the AML workspace. + +**Register Model** + This component scores the model based on how accurate the predictions are on the test set. + Input: Trained model and the deploy flag. + Output: Registered model in Azure Machine Learning. + +## Deploying the Model Training Pipeline to the Test Environment + +Next, you will deploy the model training pipeline to your new Azure Machine Learning workspace. This pipeline will create a compute cluster instance, register a training environment defining the necessary Docker image and Python packages, register a training dataset, then start the training pipeline described in the last section. When the job is complete, the trained model will be registered in the Azure ML workspace and be available for deployment.
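The steps below use the GitHub web UI. If you prefer the command line, the same workflows can be triggered with the GitHub CLI; a minimal sketch (the workflow file name is assumed to match the workflow shown in the Actions tab, and the branch is your dev branch):

```bash
# Trigger the training workflow on the dev branch (assumed workflow file name).
gh workflow run deploy-model-training-pipeline.yml --ref dev

# Optionally follow the run from the terminal.
gh run watch
```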
+ +In your GitHub project repository (ex: taxi-fare-regression), select **Actions** + + ![GH-actions](./images/gh-actions.png) + +Select the **deploy-model-training-pipeline** from the workflows listed on the left and the click **Run Workflow** to execute the model training workflow. This will take several minutes to run, depending on the compute size. + + ![Pipeline Run](./images/gh-training-pipeline.png) + + Once completed, a successful run will register the model in the Azure Machine Learning workspace. + + >**Note**: If you want to check the output of each individual step, for example to view output of a failed run, click a job output, and then click each step in the job to view any output of that step. + + ![Training Step](./images/gh-training-step.png) + +With the trained model registered in the Azure Machine learning workspace, you are ready to deploy the model for scoring. + +## Deploying the Trained Model in Dev + +This scenario includes prebuilt workflows for two approaches to deploying a trained model, batch scoring or a deploying a model to an endpoint for real-time scoring. You may run either or both of these workflows in your dev branch to test the performance of the model in your Dev Azure ML workspace. + +In your GitHub project repository (ex: taxi-fare-regression), select **Actions** + + ![GH-actions](./images/gh-actions.png) + + ### Online Endpoint + +Select the **deploy-online-endpoint-pipeline** from the workflows listed on the left and click **Run workflow** to execute the online endpoint deployment pipeline workflow. The steps in this pipeline will create an online endpoint in your Azure Machine Learning workspace, create a deployment of your model to this endpoint, then allocate traffic to the endpoint. + + ![gh online endpoint](./images/gh-online-endpoint.png) + + Once completed, you will find the online endpoint deployed in the Azure ML workspace and available for testing. + + ![aml-taxi-oep](./images/aml-taxi-oep.png) + +### Batch Endpoint + +Select the **deploy-batch-endpoint-pipeline** from the workflows and click **Run workflow** to execute the batch endpoint deployment pipeline workflow. The steps in this pipeline will create a new AmlCompute cluster on which to execute batch scoring, create the batch endpoint in your Azure Machine Learning workspace, then create a deployment of your model to this endpoint. + +![gh batch endpoint](./images/gh-batch-endpoint.png) + +Once completed, you will find the batch endpoint deployed in the Azure ML workspace and available for testing. + +![aml-taxi-bep](./images/aml-taxi-bep.png) + + +## Moving to Production + +Example scenarios can be trained and deployed both for Dev and Prod branches and environments. When you are satisfied with the performance of the model training pipeline, model, and deployment in Testing, Dev pipelines and models can be replicated and deployed in the Production environment. + +The sample training and deployment Azure ML pipelines and GitHub workflows can be used as a starting point to adapt your own modeling code and data. + + +## Next Steps +--- + +This finishes the demo according to the architectual pattern: Azure Machine Learning Classical Machine Learning. Next you can dive into your Azure Machine Learning service in the Azure Portal and see the inference results of this example model. 
+ +As elements of Azure Machine Learning are still in development, the following components are not part of this demo: +- Model and pipeline promotion from Dev to Prod +- Secure Workspaces +- Model Monitoring for Data/Model Drift +- Automated Retraining +- Model and Infrastructure triggers + +Interim it is recommended to schedule the deployment pipeline for development for complete model retraining on a timed trigger. + +For questions, please [submit an issue](https://github.com/Azure/mlops-v2/issues) or reach out to the development team at Microsoft. \ No newline at end of file diff --git a/images/ADO-Infrapipelinesuccess.png b/documentation/deployguides/images/ADO-Infrapipelinesuccess.png similarity index 100% rename from images/ADO-Infrapipelinesuccess.png rename to documentation/deployguides/images/ADO-Infrapipelinesuccess.png diff --git a/images/ADO-batch-pipeline-run.png b/documentation/deployguides/images/ADO-batch-pipeline-run.png similarity index 100% rename from images/ADO-batch-pipeline-run.png rename to documentation/deployguides/images/ADO-batch-pipeline-run.png diff --git a/images/ADO-batch-pipeline.png b/documentation/deployguides/images/ADO-batch-pipeline.png similarity index 100% rename from images/ADO-batch-pipeline.png rename to documentation/deployguides/images/ADO-batch-pipeline.png diff --git a/images/ADO-newpipeline.png b/documentation/deployguides/images/ADO-newpipeline.png similarity index 100% rename from images/ADO-newpipeline.png rename to documentation/deployguides/images/ADO-newpipeline.png diff --git a/images/ADO-pipelines.png b/documentation/deployguides/images/ADO-pipelines.png similarity index 100% rename from images/ADO-pipelines.png rename to documentation/deployguides/images/ADO-pipelines.png diff --git a/images/ADO-project.png b/documentation/deployguides/images/ADO-project.png similarity index 100% rename from images/ADO-project.png rename to documentation/deployguides/images/ADO-project.png diff --git a/images/ADO-run-infra-pipeline.png b/documentation/deployguides/images/ADO-run-infra-pipeline.png similarity index 100% rename from images/ADO-run-infra-pipeline.png rename to documentation/deployguides/images/ADO-run-infra-pipeline.png diff --git a/images/ADO-run1.png b/documentation/deployguides/images/ADO-run1.png similarity index 100% rename from images/ADO-run1.png rename to documentation/deployguides/images/ADO-run1.png diff --git a/images/ADO-run10.png b/documentation/deployguides/images/ADO-run10.png similarity index 100% rename from images/ADO-run10.png rename to documentation/deployguides/images/ADO-run10.png diff --git a/images/ADO-run11.png b/documentation/deployguides/images/ADO-run11.png similarity index 100% rename from images/ADO-run11.png rename to documentation/deployguides/images/ADO-run11.png diff --git a/images/ADO-run12.png b/documentation/deployguides/images/ADO-run12.png similarity index 100% rename from images/ADO-run12.png rename to documentation/deployguides/images/ADO-run12.png diff --git a/images/ADO-run2.png b/documentation/deployguides/images/ADO-run2.png similarity index 100% rename from images/ADO-run2.png rename to documentation/deployguides/images/ADO-run2.png diff --git a/images/ADO-run3.png b/documentation/deployguides/images/ADO-run3.png similarity index 100% rename from images/ADO-run3.png rename to documentation/deployguides/images/ADO-run3.png diff --git a/images/ADO-run4.png b/documentation/deployguides/images/ADO-run4.png similarity index 100% rename from images/ADO-run4.png rename to 
documentation/deployguides/images/ADO-run4.png diff --git a/images/ADO-run5.png b/documentation/deployguides/images/ADO-run5.png similarity index 100% rename from images/ADO-run5.png rename to documentation/deployguides/images/ADO-run5.png diff --git a/images/ADO-run6.png b/documentation/deployguides/images/ADO-run6.png similarity index 100% rename from images/ADO-run6.png rename to documentation/deployguides/images/ADO-run6.png diff --git a/images/ADO-run7.png b/documentation/deployguides/images/ADO-run7.png similarity index 100% rename from images/ADO-run7.png rename to documentation/deployguides/images/ADO-run7.png diff --git a/images/ADO-run8.png b/documentation/deployguides/images/ADO-run8.png similarity index 100% rename from images/ADO-run8.png rename to documentation/deployguides/images/ADO-run8.png diff --git a/images/ADO-run9.png b/documentation/deployguides/images/ADO-run9.png similarity index 100% rename from images/ADO-run9.png rename to documentation/deployguides/images/ADO-run9.png diff --git a/images/ADO-selectinfrapipeline.png b/documentation/deployguides/images/ADO-selectinfrapipeline.png similarity index 100% rename from images/ADO-selectinfrapipeline.png rename to documentation/deployguides/images/ADO-selectinfrapipeline.png diff --git a/images/ADO-setup1.png b/documentation/deployguides/images/ADO-setup1.png similarity index 100% rename from images/ADO-setup1.png rename to documentation/deployguides/images/ADO-setup1.png diff --git a/images/ADO-setup2.png b/documentation/deployguides/images/ADO-setup2.png similarity index 100% rename from images/ADO-setup2.png rename to documentation/deployguides/images/ADO-setup2.png diff --git a/images/ADO-setup3.png b/documentation/deployguides/images/ADO-setup3.png similarity index 100% rename from images/ADO-setup3.png rename to documentation/deployguides/images/ADO-setup3.png diff --git a/images/ADO-setup4.png b/documentation/deployguides/images/ADO-setup4.png similarity index 100% rename from images/ADO-setup4.png rename to documentation/deployguides/images/ADO-setup4.png diff --git a/images/ADO-setup5.png b/documentation/deployguides/images/ADO-setup5.png similarity index 100% rename from images/ADO-setup5.png rename to documentation/deployguides/images/ADO-setup5.png diff --git a/images/GH-setup1.png b/documentation/deployguides/images/GH-setup1.png similarity index 100% rename from images/GH-setup1.png rename to documentation/deployguides/images/GH-setup1.png diff --git a/images/GH-setup2.png b/documentation/deployguides/images/GH-setup2.png similarity index 100% rename from images/GH-setup2.png rename to documentation/deployguides/images/GH-setup2.png diff --git a/images/GH-setup3.png b/documentation/deployguides/images/GH-setup3.png similarity index 100% rename from images/GH-setup3.png rename to documentation/deployguides/images/GH-setup3.png diff --git a/images/GHATab.png b/documentation/deployguides/images/GHATab.png similarity index 100% rename from images/GHATab.png rename to documentation/deployguides/images/GHATab.png diff --git a/documentation/deployguides/images/PS_CLI1_1.png b/documentation/deployguides/images/PS_CLI1_1.png new file mode 100644 index 0000000..6beb4b1 Binary files /dev/null and b/documentation/deployguides/images/PS_CLI1_1.png differ diff --git a/documentation/deployguides/images/PS_CLI1_4.png b/documentation/deployguides/images/PS_CLI1_4.png new file mode 100644 index 0000000..438ba2c Binary files /dev/null and b/documentation/deployguides/images/PS_CLI1_4.png differ diff --git 
a/images/PipelineRun.png b/documentation/deployguides/images/PipelineRun.png similarity index 100% rename from images/PipelineRun.png rename to documentation/deployguides/images/PipelineRun.png diff --git a/images/SP-setup1.png b/documentation/deployguides/images/SP-setup1.png similarity index 100% rename from images/SP-setup1.png rename to documentation/deployguides/images/SP-setup1.png diff --git a/images/SP-setup2.png b/documentation/deployguides/images/SP-setup2.png similarity index 100% rename from images/SP-setup2.png rename to documentation/deployguides/images/SP-setup2.png diff --git a/images/SP-setup3.png b/documentation/deployguides/images/SP-setup3.png similarity index 100% rename from images/SP-setup3.png rename to documentation/deployguides/images/SP-setup3.png diff --git a/images/SP-setup4.png b/documentation/deployguides/images/SP-setup4.png similarity index 100% rename from images/SP-setup4.png rename to documentation/deployguides/images/SP-setup4.png diff --git a/documentation/deployguides/images/ado-add-demoproject.png b/documentation/deployguides/images/ado-add-demoproject.png new file mode 100644 index 0000000..c1099d4 Binary files /dev/null and b/documentation/deployguides/images/ado-add-demoproject.png differ diff --git a/documentation/deployguides/images/ado-add-pipelinesSecurity.png b/documentation/deployguides/images/ado-add-pipelinesSecurity.png new file mode 100644 index 0000000..fb7f0ae Binary files /dev/null and b/documentation/deployguides/images/ado-add-pipelinesSecurity.png differ diff --git a/images/ado-chooserepository.png b/documentation/deployguides/images/ado-chooserepository.png similarity index 100% rename from images/ado-chooserepository.png rename to documentation/deployguides/images/ado-chooserepository.png diff --git a/images/ado-configureyourpipeline.png b/documentation/deployguides/images/ado-configureyourpipeline.png similarity index 100% rename from images/ado-configureyourpipeline.png rename to documentation/deployguides/images/ado-configureyourpipeline.png diff --git a/documentation/deployguides/images/ado-create-demoprojectrepo.png b/documentation/deployguides/images/ado-create-demoprojectrepo.png new file mode 100644 index 0000000..af6f052 Binary files /dev/null and b/documentation/deployguides/images/ado-create-demoprojectrepo.png differ diff --git a/documentation/deployguides/images/ado-create-project.png b/documentation/deployguides/images/ado-create-project.png new file mode 100644 index 0000000..4f5d431 Binary files /dev/null and b/documentation/deployguides/images/ado-create-project.png differ diff --git a/images/ado-ghserviceconnection.png b/documentation/deployguides/images/ado-ghserviceconnection.png similarity index 100% rename from images/ado-ghserviceconnection.png rename to documentation/deployguides/images/ado-ghserviceconnection.png diff --git a/documentation/deployguides/images/ado-import-mlops-templates.png b/documentation/deployguides/images/ado-import-mlops-templates.png new file mode 100644 index 0000000..1b8d629 Binary files /dev/null and b/documentation/deployguides/images/ado-import-mlops-templates.png differ diff --git a/documentation/deployguides/images/ado-import-repo.png b/documentation/deployguides/images/ado-import-repo.png new file mode 100644 index 0000000..d4d7e70 Binary files /dev/null and b/documentation/deployguides/images/ado-import-repo.png differ diff --git a/images/ado-install-azure-pipelines.png b/documentation/deployguides/images/ado-install-azure-pipelines.png similarity index 100% rename from 
images/ado-install-azure-pipelines.png rename to documentation/deployguides/images/ado-install-azure-pipelines.png diff --git a/documentation/deployguides/images/ado-open-pipelinesSecurity.png b/documentation/deployguides/images/ado-open-pipelinesSecurity.png new file mode 100644 index 0000000..e28eeb5 Binary files /dev/null and b/documentation/deployguides/images/ado-open-pipelinesSecurity.png differ diff --git a/documentation/deployguides/images/ado-open-projectSettings.png b/documentation/deployguides/images/ado-open-projectSettings.png new file mode 100644 index 0000000..c6c4bcb Binary files /dev/null and b/documentation/deployguides/images/ado-open-projectSettings.png differ diff --git a/images/ado-org-access.png b/documentation/deployguides/images/ado-org-access.png similarity index 100% rename from images/ado-org-access.png rename to documentation/deployguides/images/ado-org-access.png diff --git a/documentation/deployguides/images/ado-parameters-sparepipeline.png b/documentation/deployguides/images/ado-parameters-sparepipeline.png new file mode 100644 index 0000000..1e4793d Binary files /dev/null and b/documentation/deployguides/images/ado-parameters-sparepipeline.png differ diff --git a/documentation/deployguides/images/ado-permissions-repo.png b/documentation/deployguides/images/ado-permissions-repo.png new file mode 100644 index 0000000..a47c887 Binary files /dev/null and b/documentation/deployguides/images/ado-permissions-repo.png differ diff --git a/documentation/deployguides/images/ado-pipeline-permissions.png b/documentation/deployguides/images/ado-pipeline-permissions.png new file mode 100644 index 0000000..ee12017 Binary files /dev/null and b/documentation/deployguides/images/ado-pipeline-permissions.png differ diff --git a/documentation/deployguides/images/ado-pipeline-permissionsPermit.png b/documentation/deployguides/images/ado-pipeline-permissionsPermit.png new file mode 100644 index 0000000..9b27d89 Binary files /dev/null and b/documentation/deployguides/images/ado-pipeline-permissionsPermit.png differ diff --git a/documentation/deployguides/images/ado-pipeline-resourcesRepoADO.png b/documentation/deployguides/images/ado-pipeline-resourcesRepoADO.png new file mode 100644 index 0000000..578af73 Binary files /dev/null and b/documentation/deployguides/images/ado-pipeline-resourcesRepoADO.png differ diff --git a/documentation/deployguides/images/ado-pipeline-resourcesRepoGH.png b/documentation/deployguides/images/ado-pipeline-resourcesRepoGH.png new file mode 100644 index 0000000..179e569 Binary files /dev/null and b/documentation/deployguides/images/ado-pipeline-resourcesRepoGH.png differ diff --git a/documentation/deployguides/images/ado-pipeline-sparsecheckout.png b/documentation/deployguides/images/ado-pipeline-sparsecheckout.png new file mode 100644 index 0000000..e0e47fe Binary files /dev/null and b/documentation/deployguides/images/ado-pipeline-sparsecheckout.png differ diff --git a/images/ado-project-settings.png b/documentation/deployguides/images/ado-project-settings.png similarity index 100% rename from images/ado-project-settings.png rename to documentation/deployguides/images/ado-project-settings.png diff --git a/documentation/deployguides/images/ado-run-sparepipeline.png b/documentation/deployguides/images/ado-run-sparepipeline.png new file mode 100644 index 0000000..1e96af4 Binary files /dev/null and b/documentation/deployguides/images/ado-run-sparepipeline.png differ diff --git a/documentation/deployguides/images/ado-save-sparepipeline.png 
b/documentation/deployguides/images/ado-save-sparepipeline.png new file mode 100644 index 0000000..b178511 Binary files /dev/null and b/documentation/deployguides/images/ado-save-sparepipeline.png differ diff --git a/images/ado-select-pipeline-yaml-file.png b/documentation/deployguides/images/ado-select-pipeline-yaml-file.png similarity index 100% rename from images/ado-select-pipeline-yaml-file.png rename to documentation/deployguides/images/ado-select-pipeline-yaml-file.png diff --git a/images/ado-select-repository.png b/documentation/deployguides/images/ado-select-repository.png similarity index 100% rename from images/ado-select-repository.png rename to documentation/deployguides/images/ado-select-repository.png diff --git a/images/ado-service-principal-manual.png b/documentation/deployguides/images/ado-service-principal-manual.png similarity index 100% rename from images/ado-service-principal-manual.png rename to documentation/deployguides/images/ado-service-principal-manual.png diff --git a/images/ado-trainingpipeline.png b/documentation/deployguides/images/ado-trainingpipeline.png similarity index 100% rename from images/ado-trainingpipeline.png rename to documentation/deployguides/images/ado-trainingpipeline.png diff --git a/documentation/deployguides/images/ado-view-allPipelines.png b/documentation/deployguides/images/ado-view-allPipelines.png new file mode 100644 index 0000000..9fbc5a9 Binary files /dev/null and b/documentation/deployguides/images/ado-view-allPipelines.png differ diff --git a/documentation/deployguides/images/ado-view-repoSparseCheckout.png b/documentation/deployguides/images/ado-view-repoSparseCheckout.png new file mode 100644 index 0000000..a44ac4f Binary files /dev/null and b/documentation/deployguides/images/ado-view-repoSparseCheckout.png differ diff --git a/images/ado-wheresyourcode.png b/documentation/deployguides/images/ado-wheresyourcode.png similarity index 100% rename from images/ado-wheresyourcode.png rename to documentation/deployguides/images/ado-wheresyourcode.png diff --git a/documentation/deployguides/images/aml-taxi-bep.png b/documentation/deployguides/images/aml-taxi-bep.png new file mode 100644 index 0000000..c36fdfa Binary files /dev/null and b/documentation/deployguides/images/aml-taxi-bep.png differ diff --git a/documentation/deployguides/images/aml-taxi-oep.png b/documentation/deployguides/images/aml-taxi-oep.png new file mode 100644 index 0000000..b6e281f Binary files /dev/null and b/documentation/deployguides/images/aml-taxi-oep.png differ diff --git a/images/batchendpointpipeline.png b/documentation/deployguides/images/batchendpointpipeline.png similarity index 100% rename from images/batchendpointpipeline.png rename to documentation/deployguides/images/batchendpointpipeline.png diff --git a/images/expandedElement.png b/documentation/deployguides/images/expandedElement.png similarity index 100% rename from images/expandedElement.png rename to documentation/deployguides/images/expandedElement.png diff --git a/documentation/deployguides/images/gh-actions.png b/documentation/deployguides/images/gh-actions.png new file mode 100644 index 0000000..b65c82f Binary files /dev/null and b/documentation/deployguides/images/gh-actions.png differ diff --git a/documentation/deployguides/images/gh-batch-endpoint.png b/documentation/deployguides/images/gh-batch-endpoint.png new file mode 100644 index 0000000..34fb248 Binary files /dev/null and b/documentation/deployguides/images/gh-batch-endpoint.png differ diff --git 
a/documentation/deployguides/images/gh-create-empty-mlops-sparse.png b/documentation/deployguides/images/gh-create-empty-mlops-sparse.png new file mode 100644 index 0000000..957bfc8 Binary files /dev/null and b/documentation/deployguides/images/gh-create-empty-mlops-sparse.png differ diff --git a/images/gh-createnewrepo.png b/documentation/deployguides/images/gh-createnewrepo.png similarity index 100% rename from images/gh-createnewrepo.png rename to documentation/deployguides/images/gh-createnewrepo.png diff --git a/documentation/deployguides/images/gh-deploy-infra.png b/documentation/deployguides/images/gh-deploy-infra.png new file mode 100644 index 0000000..0749a19 Binary files /dev/null and b/documentation/deployguides/images/gh-deploy-infra.png differ diff --git a/documentation/deployguides/images/gh-fork.png b/documentation/deployguides/images/gh-fork.png new file mode 100644 index 0000000..471ccbb Binary files /dev/null and b/documentation/deployguides/images/gh-fork.png differ diff --git a/documentation/deployguides/images/gh-generate.png b/documentation/deployguides/images/gh-generate.png new file mode 100644 index 0000000..f508d90 Binary files /dev/null and b/documentation/deployguides/images/gh-generate.png differ diff --git a/documentation/deployguides/images/gh-infra-pipeline.png b/documentation/deployguides/images/gh-infra-pipeline.png new file mode 100644 index 0000000..8697c1a Binary files /dev/null and b/documentation/deployguides/images/gh-infra-pipeline.png differ diff --git a/documentation/deployguides/images/gh-online-endpoint.png b/documentation/deployguides/images/gh-online-endpoint.png new file mode 100644 index 0000000..2383cf7 Binary files /dev/null and b/documentation/deployguides/images/gh-online-endpoint.png differ diff --git a/documentation/deployguides/images/gh-secrets.png b/documentation/deployguides/images/gh-secrets.png new file mode 100644 index 0000000..64010ab Binary files /dev/null and b/documentation/deployguides/images/gh-secrets.png differ diff --git a/documentation/deployguides/images/gh-settings.png b/documentation/deployguides/images/gh-settings.png new file mode 100644 index 0000000..379492a Binary files /dev/null and b/documentation/deployguides/images/gh-settings.png differ diff --git a/documentation/deployguides/images/gh-setup1.png b/documentation/deployguides/images/gh-setup1.png new file mode 100644 index 0000000..ebd0bfa Binary files /dev/null and b/documentation/deployguides/images/gh-setup1.png differ diff --git a/documentation/deployguides/images/gh-setup2.png b/documentation/deployguides/images/gh-setup2.png new file mode 100644 index 0000000..562aefc Binary files /dev/null and b/documentation/deployguides/images/gh-setup2.png differ diff --git a/documentation/deployguides/images/gh-setup3.png b/documentation/deployguides/images/gh-setup3.png new file mode 100644 index 0000000..0e59f13 Binary files /dev/null and b/documentation/deployguides/images/gh-setup3.png differ diff --git a/documentation/deployguides/images/gh-training-pipeline.png b/documentation/deployguides/images/gh-training-pipeline.png new file mode 100644 index 0000000..fd1cab7 Binary files /dev/null and b/documentation/deployguides/images/gh-training-pipeline.png differ diff --git a/documentation/deployguides/images/gh-training-step.png b/documentation/deployguides/images/gh-training-step.png new file mode 100644 index 0000000..55de149 Binary files /dev/null and b/documentation/deployguides/images/gh-training-step.png differ diff --git a/images/gh-usethistemplate.png 
b/documentation/deployguides/images/gh-usethistemplate.png similarity index 100% rename from images/gh-usethistemplate.png rename to documentation/deployguides/images/gh-usethistemplate.png diff --git a/documentation/deployguides/images/gh-workflows.png b/documentation/deployguides/images/gh-workflows.png new file mode 100644 index 0000000..1db966a Binary files /dev/null and b/documentation/deployguides/images/gh-workflows.png differ diff --git a/images/iacpipelineresult.png b/documentation/deployguides/images/iacpipelineresult.png similarity index 100% rename from images/iacpipelineresult.png rename to documentation/deployguides/images/iacpipelineresult.png diff --git a/images/onlineEndpoint.png b/documentation/deployguides/images/onlineEndpoint.png similarity index 100% rename from images/onlineEndpoint.png rename to documentation/deployguides/images/onlineEndpoint.png diff --git a/images/onlineendpointpipieline.png b/documentation/deployguides/images/onlineendpointpipieline.png similarity index 100% rename from images/onlineendpointpipieline.png rename to documentation/deployguides/images/onlineendpointpipieline.png diff --git a/documentation/structure/README.md b/documentation/structure/README.md new file mode 100644 index 0000000..fbde54b --- /dev/null +++ b/documentation/structure/README.md @@ -0,0 +1,59 @@
+# Solution Accelerator Structure and Implementation
+
+The solution accelerator is not a product but rather an adaptable framework for bootstrapping end-to-end machine learning projects based on defined patterns, using the tools your organization already uses.
+
+The MLOps pattern the solution accelerator deploys is broadly organized into two loops, an inner loop and an outer loop:
+* **Inner loop**: Data Scientists iterate over data wrangling, model development, and experimentation.
+* **Outer loop**: Infrastructure and ML Engineers implement CI/CD patterns to orchestrate the model through testing, staging, production, and monitoring.
+
+> Note that this solution accelerator focuses on implementing end-to-end MLOps from model development through deployment. Beyond the light data wrangling and feature engineering that may occur within the inner loop, the accelerator does not address DataOps or larger-scale data engineering.
+
+## Repositories
+
+The solution accelerator comprises three code repositories with templates that allow you to bootstrap a new machine learning project based on your choices of infrastructure management, MLOps orchestration, and ML project use case:
+1. [Azure/mlops-v2](https://github.com/Azure/mlops-v2): This repository, the deployment starting point and "project factory" for repeatable MLOps projects. It is cloned to give you a local copy of the documentation and to let you customize the project deployment script.
+
+2. [Azure/mlops-templates](https://github.com/Azure/mlops-templates): defines templates for MLOps pipelines and actions such as training, model registration, and deployment, using either the CLI or the SDK. This repository is forked into your organization to provide MLOps pipelines that can be modified and reused across multiple projects, or kept in sync with the parent repository as updates are made to accommodate new functionality in Azure Machine Learning.
+
+3. [Azure/mlops-project-template](https://github.com/Azure/mlops-project-template): defines templates for deploying infrastructure based on Bicep or Terraform, as well as project spaces appropriate to each project type ([classical-ml](https://github.com/Azure/mlops-project-template/tree/main/classical), [computer vision](https://github.com/Azure/mlops-project-template/tree/main/cv), [natural language processing](https://github.com/Azure/mlops-project-template/tree/main/nlp)). A copy of this repository is generated from a template of base infrastructure deployment patterns and can be modified to suit the requirements of your organization.
+
+A diagram of the repositories and their relationships is below:
+
+![](media/repository_arch.png)
+
+## Defining a New ML Project
+
+A new MLOps project is bootstrapped by configuring and running the [sparse_checkout.sh](/sparse_checkout.sh) script in the main repository.
+
+Configuration options at the top of the sparse_checkout.sh script are:
+
+```bash
+infrastructure_version=terraform #options: terraform / bicep
+project_type=classical #options: classical / cv / nlp
+mlops_version=aml-cli-v2 #options: aml-cli-v2 / python-sdk-v1 / python-sdk-v2 / rai-aml-cli-v2
+orchestration=azure-devops #options: github-actions / azure-devops
+git_folder_location='' #replace with the local root folder location where you want to create the project folder
+project_name=test-project #replace with your project name
+github_org_name=orgname #replace with your github org name
+project_template_github_url=https://github.com/azure/mlops-project-template #replace with the url for the project template for your organization created in step 2.2, or leave for demo purposes
+```
+
+**infrastructure_version** - choose the tooling, Terraform or Bicep, that you want to use to manage infrastructure deployment.
+
+**project_type** - choose the project type: classical ML on tabular data, computer vision, or natural language processing. A workspace with typical inner-loop steps and MLOps pipelines appropriate to the chosen use case will be created for you.
+
+**mlops_version** - choose the implementation approach, CLI or SDK, used to interact with the workspace and define the MLOps pipelines, depending on your needs or on whether you are migrating legacy Azure ML code.
+
+**orchestration** - choose the MLOps orchestration method, either Azure DevOps or GitHub Actions.
+
+**git_folder_location** - the local directory in which your project will be created before it is pushed to your project repository.
+
+**project_name** - the name of your project; it is also used as the name of the GitHub repository that the script creates to host it.
+
+**github_org_name** - the GitHub organization that will host the project source.
+
+**project_template_github_url** - URL of the mlops-project-template repository to build your project from. You can leave this as https://github.com/azure/mlops-project-template to use the base templates, or point to a fork of that repository and then modify or define your own templates for your organization.
+
+## Creating the new project through sparse checkout
+
+Once the [sparse_checkout.sh](/sparse_checkout.sh) script is configured, running it performs a git sparse checkout from the template repositories, checking out only the code relevant to your selections in the script. This code is placed into your git_folder_location, and the new, customized project is then pushed to your source code repository.
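+
+As a concrete illustration (the folder, project, and organization names below are placeholders, not values from this repository), a configuration for a computer vision project deployed with Bicep and orchestrated by GitHub Actions might look like this:
+
+```bash
+infrastructure_version=bicep            #use Bicep rather than Terraform for infrastructure
+project_type=cv                         #bootstrap the computer vision project space
+mlops_version=aml-cli-v2                #interact with the workspace through the Azure ML CLI v2
+orchestration=github-actions            #generate GitHub Actions workflows instead of Azure DevOps pipelines
+git_folder_location='/home/me/src'      #placeholder local folder for the generated project
+project_name=cv-demo-project            #placeholder project and repository name
+github_org_name=contoso-ml              #placeholder GitHub organization
+project_template_github_url=https://github.com/azure/mlops-project-template
+```
+
+With the variables set, the script is run from the root of the cloned mlops-v2 repository, for example with `bash sparse_checkout.sh`. Because the final step creates the project repository with the GitHub CLI (`gh repo create`) and pushes over SSH, an authenticated `gh` session and an SSH key registered with your GitHub organization are assumed to be in place.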
\ No newline at end of file diff --git a/documentation/structure/media/repository_arch.png b/documentation/structure/media/repository_arch.png new file mode 100644 index 0000000..2bbdd2a Binary files /dev/null and b/documentation/structure/media/repository_arch.png differ diff --git a/documentation/structure/media/repository_arch.png:Zone.Identifier b/documentation/structure/media/repository_arch.png:Zone.Identifier new file mode 100644 index 0000000..4cf1b71 --- /dev/null +++ b/documentation/structure/media/repository_arch.png:Zone.Identifier @@ -0,0 +1,3 @@ +[ZoneTransfer] +ZoneId=3 +HostUrl=https://microsoft-my.sharepoint.com/ diff --git a/images/gh-create-empty-mlops-sparse.png b/images/gh-create-empty-mlops-sparse.png deleted file mode 100644 index 20d8990..0000000 Binary files a/images/gh-create-empty-mlops-sparse.png and /dev/null differ diff --git a/images/gh-fork.png b/images/gh-fork.png deleted file mode 100644 index 64f7483..0000000 Binary files a/images/gh-fork.png and /dev/null differ diff --git a/images/gh-generate.png b/images/gh-generate.png deleted file mode 100644 index 8eee54a..0000000 Binary files a/images/gh-generate.png and /dev/null differ diff --git a/documentation/repositoryfiles/mlopsheader.jpg b/media/mlopsheader.jpg similarity index 100% rename from documentation/repositoryfiles/mlopsheader.jpg rename to media/mlopsheader.jpg diff --git a/sparse_checkout.sh b/sparse_checkout.sh index 285098b..7872ec0 100644 --- a/sparse_checkout.sh +++ b/sparse_checkout.sh @@ -1,11 +1,11 @@ infrastructure_version=terraform #options: terraform / bicep -project_type=classical #options: classical / cv -mlops_version=aml-cli-v2 #options: python-sdk / aml-cli-v2 +project_type=classical #options: classical / cv / nlp +mlops_version=aml-cli-v2 #options: aml-cli-v2 / python-sdk-v1 / python-sdk-v2 / rai-aml-cli-v2 +orchestration=azure-devops #options: github-actions / azure-devops git_folder_location='' #replace with the local root folder location where you want to create the project folder project_name=Mlops-Test #replace with your project name github_org_name=orgname #replace with your github org name project_template_github_url=https://github.com/azure/mlops-project-template #replace with the url for the project template for your organization created in step 2.2, or leave for demo purposes -orchestration=azure-devops #options: github-actions / azure-devops cd $git_folder_location @@ -27,13 +27,17 @@ mv $project_type/$mlops_version/data-science data-science mv $project_type/$mlops_version/mlops mlops mv $project_type/$mlops_version/data data -if [[ "$mlops_version" == "python-sdk" ]] +if [[ "$mlops_version" == "python-sdk-v1" ]] then - echo "python-sdk" + echo "mlops_version=python-sdk-v1" mv $project_type/$mlops_version/config-aml.yml config-aml.yml fi rm -rf $project_type +mv infrastructure/$infrastructure_version $infrastructure_version +rm -rf infrastructure +mv $infrastructure_version infrastructure + if [[ "$orchestration" == "github-actions" ]] then echo "github-actions" @@ -41,21 +45,24 @@ then mkdir -p .github/workflows/ mv mlops/github-actions/* .github/workflows/ rm -rf mlops/github-actions + mv infrastructure/github-actions/* .github/workflows/ + rm -rf infrastructure/devops-pipelines + rm -rf infrastructure/github-actions fi if [[ "$orchestration" == "azure-devops" ]] then echo "azure-devops" rm -rf mlops/github-actions + rm -rf infrastructure/github-actions fi -mv infrastructure/$infrastructure_version $infrastructure_version -rm -rf infrastructure -mv 
$infrastructure_version infrastructure - # Upload to custom repo in Github rm -rf .git git init -b main + +gh repo create $project_name --private + git remote add origin git@github.com:$github_org_name/$project_name.git git add . && git commit -m 'initial commit' git push --set-upstream origin main