Merging the complete build with online and batch endpoints. (#15)

* Removed dummy files and added actual files for training pipeline.
* Organizing artifactstore
* Set up CI with Azure Pipelines
* Updated the service connection name for the template to run.
* Update deploy-model-training-pipeline-v2.yml for Azure Pipelines


Co-authored-by: cindyweng <weng.cindy@gmail.com>
Co-authored-by: Cindy Weng <8880364+cindyweng@users.noreply.github.com>
Co-authored-by: murggu <amurguzur@gmail.com>
Co-authored-by: Maggie Mhanna <maggiemhanna@gmail.com>
Co-authored-by: Christoph Muller-Reyes <chrey@microsoft.com>
Co-authored-by: chrey-gh <58181624+chrey-gh@users.noreply.github.com>
This commit is contained in:
Setu Chokshi 2022-04-13 18:56:04 +05:30 committed via GitHub
Parent 9ad529df73
Commit ab1fe15054
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
87 changed files: 15377 additions and 75 deletions

8
.gitignore vendored

@ -130,3 +130,11 @@ dmypy.json
# Pyre type checker
.pyre/
# Terraform
.terraform.lock.hcl
terraform.tfstate
terraform.tfstate.backup
.terraform.tfstate.lock.info
.terraform
terraform.tfvars


@ -1,9 +0,0 @@
# Microsoft Open Source Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
Resources:
- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns

40
QUICKSTART.md Normal file

@ -0,0 +1,40 @@
# Quickstart
## Setting Variables
---
For a quickstart, the only variables that need to be set are in 'config-infra-dev.yml' (see the sketch after this list):
* If your location (Azure region) is different from 'northeurope', adjust it to the desired one, e.g. 'westus', like so: 'location: westus'.
* The purpose of 'namespace' is to make all the artifacts you are going to deploy unique. Since a Storage Account will be deployed, the value has to adhere to Storage Account naming limitations (3-24 characters, all lowercase letters or numbers).
* As of 2022-04-05, 'ado_service_connection_rg' needs subscription-wide Contributor permission, since two resource groups are created: one for the Terraform state and a second containing the artifacts for the Machine Learning workspace (Storage Account, Key Vault, Application Insights, Container Registry). You then have to create a service connection in your ADO project with the same name, or adjust the variable here accordingly.
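As a sketch (the 'namespace' value below is a placeholder; pick your own), the edited entries in 'config-infra-dev.yml' might look like:
```
location: westus
namespace: mymlops        # 3-24 chars, lowercase letters/numbers only
ado_service_connection_rg: Azure-ARM-Dev   # must match the service connection name in your ADO project
```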
## Deploying Infrastructure via ADO (Azure DevOps)
---
To deploy the infrastructure in ADO (Azure DevOps), you need an organization and a project with the above-mentioned service connection configured.
Then, under Pipelines, create a new pipeline and choose 'infrastructure\terraform\pipelines\tf-ado-deploy-infra.yml' as the source.
You can then run the pipeline, which should create the following artifacts:
* Resource Group for Terraform State including Storage Account
* Resource Group for your workspace, including Storage Account, Container Registry, Application Insights, Key Vault, and the Azure Machine Learning workspace itself.
> If you didn't change the variable 'enable_aml_computecluster' from 'true' to 'false', a compute cluster is created as defined in 'infrastructure\terraform\modules\aml-workspace\main.tf'
As of 2022-04-10, the Terraform infrastructure pipeline creates a new pair of Terraform-state and Machine Learning workspace Resource Groups every time it runs, with a slightly different name (number 10x).
A successful pipeline run should look like this:
![IaC image](./images/iacpipelineresult.png)
## Deploying Training Pipeline via ADO (Azure DevOps)
---


@ -1,41 +0,0 @@
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.5 BLOCK -->
## Security
Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.
## Reporting Security Issues
**Please do not report security vulnerabilities through public GitHub issues.**
Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).
If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).
Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue
This information will help us triage your report more quickly.
If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.
## Preferred Languages
We prefer all communications to be in English.
## Policy
Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).
<!-- END MICROSOFT SECURITY.MD BLOCK -->


@ -1,25 +0,0 @@
# TODO: The maintainer of this repo has not yet edited this file
**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/spot](https://aka.ms/spot). CSS will work with/help you to determine next steps. More details also available at [aka.ms/onboardsupport](https://aka.ms/onboardsupport).
- **Not sure?** Fill out a SPOT intake as though the answer were "Yes". CSS will help you decide.
*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
# Support
## How to file issues and get help
This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.
For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
## Microsoft Support Policy
Support for this **PROJECT or PRODUCT** is limited to the resources listed above.

75
config-aml.yml Normal file

@ -0,0 +1,75 @@
variables:
ap_vm_image: ubuntu-20.04
# Training pipeline settings
# Training dataset settings
training_dataset_name: uci-credit
training_dataset_description: uci_credit
training_dataset_local_path: data/training/
training_dataset_path_on_datastore: data/training/
training_dataset_type: local
training_dataset_storage_url: 'https://azureaidemostorage.blob.core.windows.net/data/'
# Training AzureML Environment name
training_env_name: credit-training
# Training AzureML Environment conda yaml
training_env_conda_yaml: mlops/environments/train.yml
# Name for the training pipeline
training_pipeline_name: credit-training
# Compute target for pipeline
training_target: cpu-cluster
training_target_sku: STANDARD_D2_V2
training_target_min_nodes: 0
training_target_max_nodes: 4
# Training arguments specification; use azureml:dataset_name:version to reference an AML Dataset for --data_path
training_arguments: --data_path azureml:uci-credit:1
# Name under which the model will be registered
model_name: credit-ci
# Batch pipeline settings
# Batch scoring dataset settings
scoring_dataset_name: credit-batch-input
scoring_dataset_description: credit-batch-input
scoring_dataset_local_path: data/scoring/
scoring_dataset_path_on_datastore: data/scoring/
scoring_dataset_type: local
scoring_dataset_storage_url: 'https://azureaidemostorage.blob.core.windows.net/data/'
# Batch AzureML Environment name
batch_env_name: credit-batch
# Batch AzureML Environment conda yaml
batch_env_conda_yaml: mlops/environments/batch.yml
# Name for the batch scoring pipeline
batch_pipeline_name: credit-batch-scoring
# Compute target for pipeline
batch_target: cpu-cluster
#not needed because batch uses the same target as training
# batch_target_sku: STANDARD_D2_V2
# batch_target_min_nodes: 0
# batch_target_max_nodes: 4
# Input batch dataset
batch_input_dataset_name: credit-batch-input
# Output dataset with results
batch_output_dataset_name: credit-batch-output
batch_output_path_on_datastore: credit-batch-scoring-results/{run-id}
batch_output_filename: results.csv
# Parallelization settings
batch_mini_batch_size: 8
batch_error_threshold: 1
batch_process_count_per_node: 1
batch_node_count: 1

36
config-infra-dev.yml Normal file

@ -0,0 +1,36 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
variables:
# Global
namespace: mlopstab
postfix: 441
location: northeurope
environment: dev
enable_aml_computecluster: true
# Azure DevOps
ado_service_connection_rg: Azure-ARM-Dev #-murggu
ado_service_connection_aml_ws: Azure-ARM-Dev
# Github
gh_service_endpoint: mlops-v2-tabular #this isn't allowed to be a variable in the devops yaml, so needs to be hardcoded in devops pipelines
gh_org_name:
gh_org_url:
# IaC
resource_group: azureml-examples-rg #rg-$(namespace)-$(postfix)
aml_workspace: main #mlw-$(namespace)-$(postfix)
application_insights: mlw-$(namespace)-$(postfix)
key_vault: kv-$(namespace)-$(postfix)
container_registry: cr$(namespace)$(postfix)
storage_account: st$(namespace)$(postfix)
# Terraform
terraform_version: 0.14.7
terraform_workingdir: infrastructure/terraform
terraform_st_resource_group: rg-$(namespace)-$(postfix)-tf-state
terraform_st_storage_account: st$(namespace)$(postfix)tfstate
terraform_st_container_name: default
terraform_st_key: mlops-tab

20
config-infra-prod.yml Normal file

@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# Definition of production infra-related environment variables
variables:
# Prod Environment
environment: prod
resource_group: azureml-examples-rg #rg-mlops-template-prod-001
location: westeurope
namespace: mlopsprodtmpl
aml_workspace: main #aml$(namespace)
storage_account: sa$(namespace)
key_vault: kv$(namespace)
application_insights: ai$(namespace)
container_registry: cr$(namespace)
service_connection_rg: conn-mlops-sub-infra
service_connection_aml_ws: conn-mlops-aml-bsc-prod-tmpl
gh_service_connection: mlops-v2-tabular


@ -0,0 +1,193 @@
# NYC Taxi Data Regression
### This is an end-to-end machine learning pipeline which runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving different functions, which can be registered with the workspace, versioned, and reused with various inputs and outputs. You can learn more about creating reusable components for your pipeline [here](https://github.com/Azure/azureml_run_specification/blob/master/specs/pipeline-component.md).
* Merge Taxi Data
* This component takes multiple taxi datasets (yellow and green) and merges/filters the data.
* Input: Local data under samples/nyc_taxi_data_regression/data (multiple .csv files)
* Output: Single filtered dataset (.csv)
* Taxi Feature Engineering
* This component creates features out of the taxi data to be used in training.
* Input: Filtered dataset from previous step (.csv)
* Output: Dataset with 20+ features (.csv)
* Train Linear Regression Model
* This component splits the dataset into train/test sets and trains an sklearn Linear Regressor with the training set.
* Input: Data with feature set
* Output: Trained model (pickle format) and data subset for test (.csv)
* Predict Taxi Fares
* This component uses the trained model to predict taxi fares on the test set.
* Input: Linear regression model and test data from previous step
* Output: Test data with predictions added as a column (.csv)
* Score Model
* This component scores the model based on how accurate the predictions are in the test set.
* Input: Test data with predictions and model
* Output: Report with model coefficients and evaluation scores (.txt)
#### 1. Make sure you are in the `nyc_taxi_data_regression` directory for this sample.
#### 2. Submit the Pipeline Job.
Make sure the compute cluster referenced in pipeline.yml is one that is actually available in your workspace, then submit the pipeline job:
```
az ml job create --file pipeline.yml
```
Once you submit the job, you will find the URL to the Studio UI, where you can view the job graph and logs, in the `services` -> `Studio` -> `endpoint` section of the output.
Sample output:
```
(cliv2-dev) PS D:\azureml-examples-lochen\cli\jobs\pipelines-with-components\nyc_taxi_data_regression> az ml job create -f pipeline.yml
Command group 'ml job' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Asset labels are still in preview and may resolve to an incorrect asset version.
{
"creation_context": {
"created_at": "2022-03-15T11:25:38.323397+00:00",
"created_by": "Long Chen",
"created_by_type": "User"
},
"experiment_name": "nyc_taxi_data_regression",
"id": "azureml:/subscriptions/ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9/resourceGroups/pipeline-pm/providers/Microsoft.MachineLearningServices/workspaces/pm-dev/jobs/6cef8ff4-2bd3-4101-adf2-11e0b62e6f6d",
"inputs": {
"pipeline_job_input": {
"mode": "ro_mount",
"path": "azureml:azureml://datastores/workspaceblobstore/paths/LocalUpload/aa784b6f4b0d0d3090bcd00415290f39/data",
"type": "uri_folder"
}
},
"jobs": {
"predict-job": {
"$schema": "{}",
"command": "",
"component": "azureml:49fa5eab-ad35-e3eb-27bc-5568fd2dcd74:1",
"environment_variables": {},
"inputs": {
"model_input": "${{parent.jobs.train-job.outputs.model_output}}",
"test_data": "${{parent.jobs.train-job.outputs.test_data}}"
},
"outputs": {
"predictions": "${{parent.outputs.pipeline_job_predictions}}"
},
"type": "command"
},
"prep-job": {
"$schema": "{}",
"command": "",
"component": "azureml:526bfb0e-aba5-36f3-ab06-2b4df9ec1554:1",
"environment_variables": {},
"inputs": {
"raw_data": "${{parent.inputs.pipeline_job_input}}"
},
"outputs": {
"prep_data": "${{parent.outputs.pipeline_job_prepped_data}}"
},
"type": "command"
},
"score-job": {
"$schema": "{}",
"command": "",
"component": "azureml:f0ae472c-7639-1b4a-47ff-3155384584cf:1",
"environment_variables": {},
"inputs": {
"model": "${{parent.jobs.train-job.outputs.model_output}}",
"predictions": "${{parent.jobs.predict-job.outputs.predictions}}"
},
"outputs": {
"score_report": "${{parent.outputs.pipeline_job_score_report}}"
},
"type": "command"
},
"train-job": {
"$schema": "{}",
"command": "",
"component": "azureml:df45efbf-8373-82fd-7d5e-56fa3cd31c05:1",
"environment_variables": {},
"inputs": {
"training_data": "${{parent.jobs.transform-job.outputs.transformed_data}}"
},
"outputs": {
"model_output": "${{parent.outputs.pipeline_job_trained_model}}",
"test_data": "${{parent.outputs.pipeline_job_test_data}}"
},
"type": "command"
},
"transform-job": {
"$schema": "{}",
"command": "",
"component": "azureml:107ae7d3-7813-1399-34b1-17335735496c:1",
"environment_variables": {},
"inputs": {
"clean_data": "${{parent.jobs.prep-job.outputs.prep_data}}"
},
"outputs": {
"transformed_data": "${{parent.outputs.pipeline_job_transformed_data}}"
},
"type": "command"
}
},
"name": "6cef8ff4-2bd3-4101-adf2-11e0b62e6f6d",
"outputs": {
"pipeline_job_predictions": {
"mode": "upload",
"type": "uri_folder"
},
"pipeline_job_prepped_data": {
"mode": "upload",
"type": "uri_folder"
},
"pipeline_job_score_report": {
"mode": "upload",
"type": "uri_folder"
},
"pipeline_job_test_data": {
"mode": "upload",
"type": "uri_folder"
},
"pipeline_job_trained_model": {
"mode": "upload",
"type": "uri_folder"
},
"pipeline_job_transformed_data": {
"mode": "upload",
"type": "uri_folder"
}
},
"properties": {
"azureml.continue_on_step_failure": "False",
"azureml.git.dirty": "True",
"azureml.parameters": "{}",
"azureml.pipelineComponent": "pipelinerun",
"azureml.runsource": "azureml.PipelineRun",
"mlflow.source.git.branch": "march-cli-preview",
"mlflow.source.git.commit": "8e28ab743fd680a95d71a50e456c68757669ccc7",
"mlflow.source.git.repoURL": "https://github.com/Azure/azureml-examples.git",
"runSource": "MFE",
"runType": "HTTP"
},
"resourceGroup": "pipeline-pm",
"services": {
"Studio": {
"endpoint": "https://ml.azure.com/runs/6cef8ff4-2bd3-4101-adf2-11e0b62e6f6d?wsid=/subscriptions/ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9/resourcegroups/pipeline-pm/workspaces/pm-dev&tid=72f988bf-86f1-41af-91ab-2d7cd011db47",
"job_service_type": "Studio"
},
"Tracking": {
"endpoint": "azureml://eastus.api.azureml.ms/mlflow/v1.0/subscriptions/ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9/resourceGroups/pipeline-pm/providers/Microsoft.MachineLearningServices/workspaces/pm-dev?",
"job_service_type": "Tracking"
}
},
"settings": {
"continue_on_step_failure": false,
"default_compute": "cpu-cluster",
"default_datastore": "workspaceblobstore"
},
"status": "Preparing",
"tags": {
"azureml.Designer": "true"
},
"type": "pipeline"
}
```


@ -0,0 +1,23 @@
artifact_path: model
flavors:
python_function:
env: conda.yml
loader_module: mlflow.sklearn
model_path: model.pkl
python_version: 3.7.7
sklearn:
pickled_model: model.pkl
serialization_format: cloudpickle
sklearn_version: 0.22.2.post1
run_id: e07f67c7-f37b-4534-8ac9-9984deb45e2f
saved_input_example_info:
artifact_path: input_example.json
pandas_orient: split
type: dataframe
signature:
inputs: '[{"name": "fareAmount", "type": "float"}, {"name": "paymentType", "type":
"integer"}, {"name": "passengerCount", "type": "integer"}, {"name": "tripDistance",
"type": "float"}, {"name": "tripTimeSecs", "type": "integer"}, {"name": "pickupTimeBin",
"type": "string"}]'
outputs: '[{"type": "integer"}]'
utc_time_created: '2020-11-05 03:39:25.470901'
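Given the flavors and signature recorded above, a minimal sketch of loading this model with MLflow and scoring one row (assuming the 'model' artifact folder has been downloaded to the working directory; the feature values are illustrative):
```
import mlflow.sklearn
import pandas as pd

# Load the pickled sklearn model via the flavor declared in MLmodel
model = mlflow.sklearn.load_model("model")

# One row matching the input signature recorded above
sample = pd.DataFrame([{
    "fareAmount": 12.5, "paymentType": 1, "passengerCount": 2,
    "tripDistance": 3.4, "tripTimeSecs": 780, "pickupTimeBin": "evening",
}])
print(model.predict(sample))
```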


@ -0,0 +1,11 @@
channels:
- defaults
- conda-forge
dependencies:
- python=3.7.7
- scikit-learn=0.22.2.post1
- pip
- pip:
- mlflow
- cloudpickle==1.6.0
name: mlflow-env

Binary file not shown.


@ -0,0 +1,6 @@
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: mlflowdp
endpoint_name: mybatchedp
model:
path: ./autolog_nyc_taxi
compute: azureml:batch-cluster


@ -0,0 +1,13 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
path: model/sklearn_regression_model.pkl
code_configuration:
code: src/
scoring_script: score.py
environment:
conda_file: environment/conda.yml
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1
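Assuming this file is saved as blue-deployment.yml (name illustrative) and the endpoint 'my-endpoint' already exists, the deployment would be created with the v2 CLI roughly like this:
```
az ml online-deployment create --file blue-deployment.yml --all-traffic
```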


@ -0,0 +1,13 @@
name: model-env
channels:
- conda-forge
dependencies:
- python=3.7
- numpy=1.21.2
- pip=21.2.4
- scikit-learn=0.24.2
- scipy=1.7.1
- pip:
- azureml-defaults==1.38.0
- inference-schema[numpy-support]==1.3.0
- joblib==1.0.1

Binary file not shown.


@ -0,0 +1,4 @@
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}
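With the deployment in place, this sample payload can be sent to the endpoint (endpoint and deployment names taken from the deployment YAML above; the request-file path is illustrative):
```
az ml online-endpoint invoke --name my-endpoint --deployment-name blue --request-file sample-request.json
```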


@ -0,0 +1,35 @@
import os
import logging
import json
import numpy
import joblib
def init():
"""
This function is called when the container is initialized/started, typically after create/update of the deployment.
You can write the logic here to perform init operations like caching the model in memory
"""
global model
# AZUREML_MODEL_DIR is an environment variable created during deployment.
# It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
model_path = os.path.join(
os.getenv("AZUREML_MODEL_DIR"), "sklearn_regression_model.pkl"
)
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
logging.info("Init complete")
def run(raw_data):
"""
This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
In the example we extract the data from the json input and call the scikit-learn model's predict()
method and return the result back
"""
logging.info("Request received")
data = json.loads(raw_data)["data"]
data = numpy.array(data)
result = model.predict(data)
logging.info("Request processed")
return result.tolist()
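For a quick local smoke test of the scoring script above (a sketch: it assumes the script is saved as score.py and that sklearn_regression_model.pkl sits in ./model), init() and run() can be exercised directly by pointing AZUREML_MODEL_DIR at a local model folder:
```
import json
import os

# Point the script at a local copy of the model folder
os.environ["AZUREML_MODEL_DIR"] = "./model"

import score  # the scoring script above, saved as score.py

score.init()
payload = json.dumps({"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]})
print(score.run(payload))
```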


@ -0,0 +1,13 @@
name: model-env
channels:
- conda-forge
dependencies:
- python=3.7
- numpy=1.21.2
- pip=21.2.4
- scikit-learn=0.24.2
- scipy=1.7.1
- pip:
- azureml-defaults==1.38.0
- inference-schema[numpy-support]==1.3.0
- joblib==1.0.1


@ -0,0 +1,13 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: green
endpoint_name: my-endpoint
model:
path: model/sklearn_regression_model.pkl
code_configuration:
code: src/
scoring_script: score.py
environment:
conda_file: environment/conda.yml
image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1

Binary file not shown.


@ -0,0 +1,4 @@
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}


@ -0,0 +1,36 @@
import os
import logging
import json
import numpy
import joblib
def init():
"""
This function is called when the container is initialized/started, typically after create/update of the deployment.
You can write the logic here to perform init operations like caching the model in memory
"""
global model
# AZUREML_MODEL_DIR is an environment variable created during deployment.
# It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
model_path = os.path.join(
os.getenv("AZUREML_MODEL_DIR"), "sklearn_regression_model.pkl"
)
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
logging.info("Init complete")
def run(raw_data):
"""
This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
In the example we extract the data from the json input and call the scikit-learn model's predict()
method and return the result back
"""
logging.info("Request received")
data = json.loads(raw_data)["data"]
data = numpy.array(data)
result = model.predict(data)
logging.info("Request processed")
return result.tolist()


@ -0,0 +1,32 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: evaluate_model
version: 1
display_name: evaluate-model
type: command
inputs:
model_name:
type: string
default: "taxi-model"
model_input:
type: uri_folder
test_data:
type: uri_folder
outputs:
predictions:
type: uri_folder
score_report:
type: uri_folder
deploy_flag:
type: uri_folder
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
code: ./src
command: >-
python evaluate.py
--model_name ${{inputs.model_name}}
--model_input ${{inputs.model_input}}
--test_data ${{inputs.test_data}}
--predictions ${{outputs.predictions}}
--score_report ${{outputs.score_report}}
--deploy_flag ${{outputs.deploy_flag}}
# </component>


@ -0,0 +1,142 @@
import argparse
import pandas as pd
import os
from pathlib import Path
from sklearn.linear_model import LinearRegression
import pickle
from sklearn.metrics import mean_squared_error, r2_score
from azureml.core import Run, Experiment, Model
# current run
run = Run.get_context()
ws = run.experiment.workspace
parser = argparse.ArgumentParser("predict")
parser.add_argument("--model_name", type=str, help="Name of registered model")
parser.add_argument("--model_input", type=str, help="Path of input model")
parser.add_argument("--test_data", type=str, help="Path to test data")
parser.add_argument("--predictions", type=str, help="Path of predictions")
parser.add_argument("--score_report", type=str, help="Path to score report")
parser.add_argument('--deploy_flag', type=str, help='A deploy flag indicating whether to deploy the model or not')
# ---------------- Model Evaluation ---------------- #
args = parser.parse_args()
lines = [
f"Model path: {args.model_input}",
f"Test data path: {args.test_data}",
f"Predictions path: {args.predictions}",
f"Scoring output path: {args.score_report}",
]
for line in lines:
print(line)
# Load the test data
print("mounted_path files: ")
arr = os.listdir(args.test_data)
print(arr)
test_data = pd.read_csv((Path(args.test_data) / "test.csv"))
print(test_data.columns)
testy = test_data["cost"]
# testX = test_data.drop(['cost'], axis=1)
testX = test_data[
[
"distance",
"dropoff_latitude",
"dropoff_longitude",
"passengers",
"pickup_latitude",
"pickup_longitude",
"store_forward",
"vendor",
"pickup_weekday",
"pickup_month",
"pickup_monthday",
"pickup_hour",
"pickup_minute",
"pickup_second",
"dropoff_weekday",
"dropoff_month",
"dropoff_monthday",
"dropoff_hour",
"dropoff_minute",
"dropoff_second",
]
]
print(testX.shape)
print(testX.columns)
# Load the model from input port
model = pickle.load(open((Path(args.model_input) / "model.sav"), "rb"))
# model = (Path(args.model_input) / 'model.txt').read_text()
# print('Model: ', model)
# Compare predictions to actuals (testy)
output_data = testX.copy()
output_data["actual_cost"] = testy
output_data["predicted_cost"] = model.predict(testX)
# Save the output data with feature columns, predicted cost, and actual cost in csv file
output_data.to_csv((Path(args.predictions) / "predictions.csv"))
# Print the results of scoring the predictions against actual values in the test data
# The coefficients
print("Coefficients: \n", model.coef_)
actuals = output_data["actual_cost"]
predictions = output_data["predicted_cost"]
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(actuals, predictions))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(actuals, predictions))
print("Model: ", model)
# Print score report to a text file
(Path(args.score_report) / "score.txt").write_text(
"Scored with the following model:\n{}".format(model)
)
with open((Path(args.score_report) / "score.txt"), "a") as f:
f.write("\n Coefficients: \n %s \n" % str(model.coef_))
f.write("Mean squared error: %.2f \n" % mean_squared_error(actuals, predictions))
f.write("Coefficient of determination: %.2f \n" % r2_score(actuals, predictions))
# -------------------- Promotion ------------------- #
test_scores = {}
test_predictions = {}
test_score = r2_score(actuals, predictions) # current model
for model_run in Model.list(ws):
if model_run.name == args.model_name:
model_path = Model.download(model_run, exist_ok=True)
mdl = pickle.load(open((Path(model_path)), "rb"))
test_predictions[model_run.id] = mdl.predict(testX)
test_scores[model_run.id] = r2_score(actuals, test_predictions[model_run.id])
print(test_scores)
if test_scores:
if test_score >= max(list(test_scores.values())):
deploy_flag = 1
else:
deploy_flag = 0
else:
deploy_flag = 1
with open((Path(args.deploy_flag) / "deploy_flag"), 'w') as f:
f.write('%d' % int(deploy_flag))
run.log('deploy flag', bool(deploy_flag))
run.parent.log('deploy flag', bool(deploy_flag))
test_scores["current model"] = test_score
model_runs_metrics_plot = pd.DataFrame(test_scores, index=["r2 score"]).plot(kind='bar', figsize=(15, 10))
model_runs_metrics_plot.figure.savefig("model_runs_metrics_plot.png")
model_runs_metrics_plot.figure.savefig(Path(args.score_report) / "model_runs_metrics_plot.png")
run.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")
run.parent.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")


@ -0,0 +1,19 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: prep-data
version: 1
type: command
inputs:
raw_data:
type: uri_folder
outputs:
transformed_data:
type: uri_folder
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
python prep.py
--raw_data ${{inputs.raw_data}}
--transformed_data ${{outputs.transformed_data}}
# </component>


@ -0,0 +1,255 @@
import argparse
from pathlib import Path
from uuid import uuid4
from datetime import datetime
import os
import numpy as np
import pandas as pd
from azureml.core import Run, Model
run = Run.get_context()
ws = run.experiment.workspace
parser = argparse.ArgumentParser("prep")
parser.add_argument("--raw_data", type=str, help="Path to raw data")
parser.add_argument("--transformed_data", type=str, help="Path of prepped data")
args = parser.parse_args()
print("hello training world...")
lines = [
f"Raw data path: {args.raw_data}",
f"Data output path: {args.transformed_data}",
]
for line in lines:
print(line)
# ------------ Reading Data ------------ #
# -------------------------------------- #
print("mounted_path files: ")
arr = os.listdir(args.raw_data)
print(arr)
df_list = []
for filename in arr:
print("reading file: %s ..." % filename)
with open(os.path.join(args.raw_data, filename), "r") as handle:
# print (handle.read())
# ('input_df_%s' % filename) = pd.read_csv((Path(args.training_data) / filename))
input_df = pd.read_csv((Path(args.raw_data) / filename))
df_list.append(input_df)
# Prep the green and yellow taxi data
green_data = df_list[0]
yellow_data = df_list[1]
# ------------ Cleanse Data ------------ #
# -------------------------------------- #
# Define useful columns needed
useful_columns = str(
[
"cost",
"distance",
"dropoff_datetime",
"dropoff_latitude",
"dropoff_longitude",
"passengers",
"pickup_datetime",
"pickup_latitude",
"pickup_longitude",
"store_forward",
"vendor",
]
).replace(",", ";")
print(useful_columns)
# Rename green taxi columns
green_columns = str(
{
"vendorID": "vendor",
"lpepPickupDatetime": "pickup_datetime",
"lpepDropoffDatetime": "dropoff_datetime",
"storeAndFwdFlag": "store_forward",
"pickupLongitude": "pickup_longitude",
"pickupLatitude": "pickup_latitude",
"dropoffLongitude": "dropoff_longitude",
"dropoffLatitude": "dropoff_latitude",
"passengerCount": "passengers",
"fareAmount": "cost",
"tripDistance": "distance",
}
).replace(",", ";")
# Rename yellow taxi columns
yellow_columns = str(
{
"vendorID": "vendor",
"tpepPickupDateTime": "pickup_datetime",
"tpepDropoffDateTime": "dropoff_datetime",
"storeAndFwdFlag": "store_forward",
"startLon": "pickup_longitude",
"startLat": "pickup_latitude",
"endLon": "dropoff_longitude",
"endLat": "dropoff_latitude",
"passengerCount": "passengers",
"fareAmount": "cost",
"tripDistance": "distance",
}
).replace(",", ";")
print("green_columns: " + green_columns)
print("yellow_columns: " + yellow_columns)
# Helper to parse the stringified column-rename mappings back into a dict
def get_dict(dict_str):
pairs = dict_str.strip("{}").split(";")
new_dict = {}
for pair in pairs:
print(pair)
key, value = pair.strip().split(":")
new_dict[key.strip().strip("'")] = value.strip().strip("'")
return new_dict
# Remove fully-null rows, rename columns, and keep only the useful columns
def cleanseData(data, columns, useful_columns):
useful_columns = [
s.strip().strip("'") for s in useful_columns.strip("[]").split(";")
]
new_columns = get_dict(columns)
new_df = (data.dropna(how="all").rename(columns=new_columns))[useful_columns]
new_df.reset_index(inplace=True, drop=True)
return new_df
green_data_clean = cleanseData(green_data, green_columns, useful_columns)
yellow_data_clean = cleanseData(yellow_data, yellow_columns, useful_columns)
# Append yellow data to green data
combined_df = green_data_clean.append(yellow_data_clean, ignore_index=True)
combined_df.reset_index(inplace=True, drop=True)
output_green = green_data_clean.to_csv((Path(args.transformed_data) / "green_prep_data.csv"))
output_yellow = yellow_data_clean.to_csv((Path(args.transformed_data) / "yellow_prep_data.csv"))
merged_data = combined_df.to_csv((Path(args.transformed_data) / "merged_data.csv"))
# ------------ Filter Data ------------ #
# ------------------------------------- #
# Filter out coordinates for locations that are outside the city border.
combined_df = combined_df.astype(
{
"pickup_longitude": "float64",
"pickup_latitude": "float64",
"dropoff_longitude": "float64",
"dropoff_latitude": "float64",
}
)
latlong_filtered_df = combined_df[
(combined_df.pickup_longitude <= -73.72)
& (combined_df.pickup_longitude >= -74.09)
& (combined_df.pickup_latitude <= 40.88)
& (combined_df.pickup_latitude >= 40.53)
& (combined_df.dropoff_longitude <= -73.72)
& (combined_df.dropoff_longitude >= -74.72)
& (combined_df.dropoff_latitude <= 40.88)
& (combined_df.dropoff_latitude >= 40.53)
]
latlong_filtered_df.reset_index(inplace=True, drop=True)
# These functions replace undefined values and rename to use meaningful names.
replaced_stfor_vals_df = latlong_filtered_df.replace(
{"store_forward": "0"}, {"store_forward": "N"}
).fillna({"store_forward": "N"})
replaced_distance_vals_df = replaced_stfor_vals_df.replace(
{"distance": ".00"}, {"distance": 0}
).fillna({"distance": 0})
normalized_df = replaced_distance_vals_df.astype({"distance": "float64"})
# Split the pickup and dropoff date further into the day of the week, day of the month, and month values.
temp = pd.DatetimeIndex(normalized_df["pickup_datetime"], dtype="datetime64[ns]")
normalized_df["pickup_date"] = temp.date
normalized_df["pickup_weekday"] = temp.dayofweek
normalized_df["pickup_month"] = temp.month
normalized_df["pickup_monthday"] = temp.day
normalized_df["pickup_time"] = temp.time
normalized_df["pickup_hour"] = temp.hour
normalized_df["pickup_minute"] = temp.minute
normalized_df["pickup_second"] = temp.second
temp = pd.DatetimeIndex(normalized_df["dropoff_datetime"], dtype="datetime64[ns]")
normalized_df["dropoff_date"] = temp.date
normalized_df["dropoff_weekday"] = temp.dayofweek
normalized_df["dropoff_month"] = temp.month
normalized_df["dropoff_monthday"] = temp.day
normalized_df["dropoff_time"] = temp.time
normalized_df["dropoff_hour"] = temp.hour
normalized_df["dropoff_minute"] = temp.minute
normalized_df["dropoff_second"] = temp.second
del normalized_df["pickup_datetime"]
del normalized_df["dropoff_datetime"]
normalized_df.reset_index(inplace=True, drop=True)
print(normalized_df.head)
print(normalized_df.dtypes)
# Drop the pickup_date, dropoff_date, pickup_time, dropoff_time columns because they're
# no longer needed (granular time features like hour,
# minute and second are more useful for model training).
del normalized_df["pickup_date"]
del normalized_df["dropoff_date"]
del normalized_df["pickup_time"]
del normalized_df["dropoff_time"]
# Change the store_forward column to binary values
normalized_df["store_forward"] = np.where((normalized_df.store_forward == "N"), 0, 1)
# Before you package the dataset, run two final filters on the dataset.
# To eliminate incorrectly captured data points,
# filter the dataset on records where both the cost and distance variable values are greater than zero.
final_df = normalized_df[(normalized_df.distance > 0) & (normalized_df.cost > 0)]
final_df.reset_index(inplace=True, drop=True)
print(final_df.head)
# Output data
transformed_data = final_df.to_csv((Path(args.transformed_data) / "transformed_data.csv"))
# Split data into train, val and test datasets
random_data = np.random.rand(len(final_df))
msk_train = random_data < 0.7
msk_val = (random_data >= 0.7) & (random_data < 0.85)
msk_test = random_data >= 0.85
train = final_df[msk_train]
val = final_df[msk_val]
test = final_df[msk_test]
run.log('train size', train.shape[0])
run.log('val size', val.shape[0])
run.log('test size', test.shape[0])
run.parent.log('train size', train.shape[0])
run.parent.log('val size', val.shape[0])
run.parent.log('test size', test.shape[0])
train_data = train.to_csv((Path(args.transformed_data) / "train.csv"))
val_data = val.to_csv((Path(args.transformed_data) / "val.csv"))
test_data = test.to_csv((Path(args.transformed_data) / "test.csv"))


@ -0,0 +1,22 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: register_model
version: 1
display_name: register-model
type: command
inputs:
model_name:
type: string
default: "taxi-model"
model_path:
type: uri_folder
deploy_flag:
type: uri_folder
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
code: ./src
command: >-
python register.py
--model_name ${{inputs.model_name}}
--model_path ${{inputs.model_path}}
--deploy_flag ${{inputs.deploy_flag}}
# </component>


@ -0,0 +1,33 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import os
import argparse
from pathlib import Path
from azureml.core import Run, Experiment, Model
parser = argparse.ArgumentParser()
parser.add_argument('--model_name', type=str, help='Name under which model will be registered')
parser.add_argument('--model_path', type=str, help='Model directory')
parser.add_argument('--deploy_flag', type=str, help='A deploy flag indicating whether to deploy the model or not')
args, _ = parser.parse_known_args()
print(f'Arguments: {args}')
model_name = args.model_name
model_path = args.model_path
with open((Path(args.deploy_flag) / "deploy_flag"), 'r') as f:
deploy_flag = int(f.read())
# current run is the registration step
run = Run.get_context()
ws = run.experiment.workspace
if deploy_flag==1:
print("Registering ", args.model_name)
registered_model = Model.register(model_path=args.model_path,
model_name=args.model_name,
workspace=ws)
print("Registered ", registered_model.id)
else:
print("Model will not be registered!")


@ -0,0 +1,75 @@
import argparse
from pathlib import Path
from uuid import uuid4
from datetime import datetime
import os
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pickle
import mlflow
import mlflow.sklearn
parser = argparse.ArgumentParser("train")
parser.add_argument("--training_data", type=str, help="Path to training data")
parser.add_argument("--model_output", type=str, help="Path of output model")
args = parser.parse_args()
# Enable auto logging
mlflow.sklearn.autolog()
lines = [
f"Training data path: {args.training_data}",
f"Model output path: {args.model_output}",
]
for line in lines:
print(line)
print("mounted_path files: ")
arr = os.listdir(args.training_data)
print(arr)
train_data = pd.read_csv((Path(args.training_data) / "train.csv"))
print(train_data.columns)
# Split the data into input(X) and output(y)
trainy = train_data["cost"]
# X = train_data.drop(['cost'], axis=1)
trainX = train_data[
[
"distance",
"dropoff_latitude",
"dropoff_longitude",
"passengers",
"pickup_latitude",
"pickup_longitude",
"store_forward",
"vendor",
"pickup_weekday",
"pickup_month",
"pickup_monthday",
"pickup_hour",
"pickup_minute",
"pickup_second",
"dropoff_weekday",
"dropoff_month",
"dropoff_monthday",
"dropoff_hour",
"dropoff_minute",
"dropoff_second",
]
]
print(trainX.shape)
print(trainX.columns)
# Train a Linear Regression Model with the train set
model = LinearRegression().fit(trainX, trainy)
perf = model.score(trainX, trainy)
print(perf)
# Output the model and test data
pickle.dump(model, open((Path(args.model_output) / "model.sav"), "wb"))


@ -0,0 +1,19 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_model
display_name: train-model
version: 1
type: command
inputs:
training_data:
type: uri_folder
outputs:
model_output:
type: uri_folder
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
python train.py
--training_data ${{inputs.training_data}}
--model_output ${{outputs.model_output}}
# </component>

The diff for this file is not shown because it is too large.

The diff for this file is not shown because it is too large.


@ -0,0 +1,7 @@
# <data>
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: greendata
version: 3
description: sample green taxi dataset
path: ./data
# </data>


@ -0,0 +1,66 @@
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
# <inputs_and_outputs>
inputs:
  pipeline_job_input: # using local data; will create an anonymous data asset
type: uri_folder
path: ./data/
outputs:
pipeline_job_transformed_data:
mode: rw_mount
pipeline_job_trained_model:
mode: rw_mount
pipeline_job_predictions:
mode: rw_mount
pipeline_job_score_report:
mode: rw_mount
pipeline_job_deploy_flag:
type: uri_folder
# </inputs_and_outputs>
# <jobs>
settings:
default_datastore: azureml:workspaceblobstore
default_compute: azureml:cpu-cluster
continue_on_step_failure: false
jobs:
prep-job:
type: command
component: file:./components/prep/prep.yml
inputs:
raw_data: ${{parent.inputs.pipeline_job_input}}
outputs:
transformed_data: ${{parent.outputs.pipeline_job_transformed_data}}
train-job:
type: command
component: file:./components/train/train.yml
inputs:
training_data: ${{parent.jobs.prep-job.outputs.transformed_data}}
outputs:
model_output: ${{parent.outputs.pipeline_job_trained_model}}
evaluate-job:
type: command
component: file:./components/evaluate/evaluate.yml
inputs:
model_name: "taxi-model"
model_input: ${{parent.jobs.train-job.outputs.model_output}}
test_data: ${{parent.jobs.prep-job.outputs.transformed_data}}
outputs:
predictions: ${{parent.outputs.pipeline_job_predictions}}
score_report: ${{parent.outputs.pipeline_job_score_report}}
deploy_flag: ${{parent.outputs.pipeline_job_deploy_flag}}
register-job:
type: command
component: file:./components/register/register.yml
inputs:
model_name: "taxi-model"
model_path: ${{parent.jobs.train-job.outputs.model_output}}
deploy_flag: ${{parent.jobs.evaluate-job.outputs.deploy_flag}}
# </jobs>
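As with the NYC taxi sample earlier in this PR, the pipeline job can be submitted with the v2 CLI (assuming the YAML above is saved as pipeline.yml):
```
az ml job create --file pipeline.yml
```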


@ -0,0 +1,7 @@
# <data>
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: yellowdata
version: 3
description: sample yellow taxi dataset
path: ./data
# </data>


@ -0,0 +1,201 @@
import os
import sys
import json
import argparse
import joblib
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from fairlearn.metrics._group_metric_set import _create_group_metric_set
from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id
from interpret_community import TabularExplainer
from azureml.interpret import ExplanationClient
from azureml.core import Run, Model
run = Run.get_context()
ws = run.experiment.workspace
def parse_args():
parser = argparse.ArgumentParser(description="UCI Credit example")
parser.add_argument("--transformed_data_path", type=str, default='transformed_data/', help="Directory path to training data")
parser.add_argument('--model_name', type=str, help='Name under which model is registered')
parser.add_argument("--model_path", type=str, default='trained_model/', help="Model output directory")
parser.add_argument("--explainer_path", type=str, default='trained_model/', help="Model output directory")
parser.add_argument("--evaluation_path", type=str, default='evaluation_results/', help="Evaluation results output directory")
parser.add_argument('--deploy_flag', type=str, help='A deploy flag whether to deploy or no')
return parser.parse_args()
def main():
# Parse command-line arguments
args = parse_args()
transformed_data_path = os.path.join(args.transformed_data_path, run.parent.id)
model_path = os.path.join(args.model_path, run.parent.id)
explainer_path = os.path.join(args.explainer_path, run.parent.id)
evaluation_path = os.path.join(args.evaluation_path, run.parent.id)
# Make sure evaluation output path exists
if not os.path.exists(evaluation_path):
os.makedirs(evaluation_path)
# Make sure explainer output path exists
if not os.path.exists(explainer_path):
os.makedirs(explainer_path)
# Enable auto logging
mlflow.sklearn.autolog()
# Read training & testing data
print(os.path.join(transformed_data_path, 'train.csv'))
train = pd.read_csv(os.path.join(transformed_data_path, 'train.csv'))
train.drop("Sno", axis=1, inplace=True)
y_train = train['Risk']
X_train = train.drop('Risk', axis=1)
test = pd.read_csv(os.path.join(transformed_data_path, 'test.csv'))
test.drop("Sno", axis=1, inplace=True)
y_test = test['Risk']
X_test = test.drop('Risk', axis=1)
run.log('TEST SIZE', test.shape[0])
# Load model
model = joblib.load(os.path.join(model_path, 'model.pkl'))
# ---------------- Model Evaluation ---------------- #
# Evaluate model using testing set
# Capture Accuracy Score
test_acc = model.score(X_test, y_test)
# Capture ML Metrics
test_metrics = {
"Test Accuracy": metrics.accuracy_score(y_test, model.predict(X_test)),
"Test Recall": metrics.recall_score(y_test, model.predict(X_test), pos_label="good"),
"Test Precison": metrics.precision_score(y_test, model.predict(X_test), pos_label="good"),
"Test F1 Score": metrics.f1_score(y_test, model.predict(X_test), pos_label="good")
}
# Capture Confusion Matrix
test_cm = metrics.plot_confusion_matrix(model, X_test, y_test)
# Save and test eval metrics
print("Testing accuracy: %.3f" % test_acc)
run.log('Testing accuracy', test_acc)
run.parent.log('Testing accuracy', test_acc)
with open(os.path.join(evaluation_path, "metrics.json"), 'w+') as f:
json.dump(test_metrics, f)
test_cm.figure_.savefig(os.path.join(evaluation_path, "confusion_matrix.jpg"))
test_cm.figure_.savefig("confusion_matrix.jpg")
run.log_image(name='Confusion Matrix Test Dataset', path="confusion_matrix.jpg")
run.parent.log_image(name='Confusion Matrix Test Dataset', path="confusion_matrix.jpg")
# -------------------- Promotion ------------------- #
test_accuracies = {}
test_predictions = {}
labels_dict = {"good": int(1), "bad": int(0)}
y_test_labels = [labels_dict[x] for x in y_test]
for model_run in Model.list(ws):
if model_run.name == args.model_name:
mdl_path = Model.download(model_run, exist_ok=True)
mdl = joblib.load(os.path.join(mdl_path, 'model.pkl'))
test_accuracies[model_run.id] = mdl.score(X_test, y_test)
test_predictions[model_run.id] = [labels_dict[x] for x in mdl.predict(X_test)]
if test_accuracies:
if test_acc >= max(list(test_accuracies.values())):
deploy_flag = 1
else:
deploy_flag = 0
else:
deploy_flag = 1
with open(args.deploy_flag, 'w') as f:
f.write('%d' % int(deploy_flag))
run.log('deploy flag', bool(deploy_flag))
run.parent.log('deploy flag', bool(deploy_flag))
test_accuracies["current model"] = test_acc
model_runs_metrics_plot = pd.DataFrame(test_accuracies, index=["accuracy"]).plot(kind='bar', figsize=(15, 10))
model_runs_metrics_plot.figure.savefig(os.path.join(evaluation_path, "model_runs_metrics_plot.png"))
model_runs_metrics_plot.figure.savefig("model_runs_metrics_plot.png")
run.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")
run.parent.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")
# -------------------- FAIRNESS ------------------- #
# Calculate Fairness Metrics over Sensitive Features
# Create a dictionary of model(s) you want to assess for fairness
sensitive_features = ["Sex"]
sf = { col: X_test[[col]] for col in sensitive_features }
test_predictions["currrent model"] = [labels_dict[x] for x in model.predict(X_test)]
dash_dict_all = _create_group_metric_set(y_true=y_test_labels,
predictions=test_predictions,
sensitive_features=sf,
prediction_type='binary_classification',
)
# Upload the dashboard to Azure Machine Learning
dashboard_title = "Fairness insights Comparison of Models"
# Set validate_model_ids parameter of upload_dashboard_dictionary to False if you have not registered your model(s)
upload_id = upload_dashboard_dictionary(run,
dash_dict_all,
dashboard_name=dashboard_title,
validate_model_ids=False)
print("\nUploaded to id: {0}\n".format(upload_id))
upload_id_pipeline = upload_dashboard_dictionary(run.parent,
dash_dict_all,
dashboard_name=dashboard_title,
validate_model_ids=False)
print("\nUploaded to id: {0}\n".format(upload_id_pipeline))
# -------------------- Explainability ------------------- #
tabular_explainer = TabularExplainer(model.steps[-1][1],
initialization_examples=X_train,
features=X_train.columns,
classes=[0, 1],
transformations=model.steps[0][1])
joblib.dump(tabular_explainer, os.path.join(explainer_path, "explainer"))
# you can use the training data or the test data here, but test data would allow you to use Explanation Exploration
global_explanation = tabular_explainer.explain_global(X_test)
# if the PFIExplainer in the previous step, use the next line of code instead
# global_explanation = explainer.explain_global(x_train, true_labels=y_train)
# sorted feature importance values and feature names
sorted_global_importance_values = global_explanation.get_ranked_global_values()
sorted_global_importance_names = global_explanation.get_ranked_global_names()
print("Explainability feature importance:")
# alternatively, you can print out a dictionary that holds the top K feature names and values
global_explanation.get_feature_importance_dict()
client = ExplanationClient.from_run(run)
client.upload_model_explanation(global_explanation, comment='global explanation: all features')
# upload dashboard to parent run
client_parent = ExplanationClient.from_run(run.parent)
client_parent.upload_model_explanation(global_explanation, comment='global explanation: all features')
if __name__ == "__main__":
main()


@ -0,0 +1,59 @@
import os
import glob
import json
import argparse
import numpy as np
import pandas as pd
import joblib
from azureml.core.model import Model
model = None
explainer = None
def init():
global model, explainer
print("Started batch scoring by running init()")
parser = argparse.ArgumentParser('batch_scoring')
parser.add_argument('--model_name', type=str, help='Model to use for batch scoring')
args, _ = parser.parse_known_args()
model_path = Model.get_model_path(args.model_name)
print(f"Model path: {model_path}")
model = joblib.load(model_path)
# load the explainer
explainer_path = os.path.join(Model.get_model_path(args.model_name), "explainer")
#explainer = joblib.load(explainer_path)
def run(file_list):
print(f"Files to process: {file_list}")
results = pd.DataFrame(columns=["Sno", "ProbaGoodCredit", "ProbaBadCredit", "FeatureImportance"])
for filename in file_list:
df = pd.read_csv(filename)
sno = df["Sno"]
df = df.drop("Sno", axis=1)
proba = model.predict_proba(df)
proba = pd.DataFrame(data=proba, columns=["ProbaGoodCredit", "ProbaBadCredit"])
#explanation = explainer.explain_local(df)
# sorted feature importance values and feature names
#sorted_local_importance_names = explanation.get_ranked_local_names()
#sorted_local_importance_values = explanation.get_ranked_local_values()
# get explanations in dictionnary
#explanations = []
#for i, j in zip(sorted_local_importance_names[0], sorted_local_importance_values[0]):
# explanations.append(dict(zip(i, j)))
#explanation = pd.DataFrame(data=explanations, columns=["FeatureImportance"])
#result = pd.concat([sno, proba, explanation], axis=1)
result = pd.concat([sno, proba], axis=1)
results = results.append(result)
print(f"Batch scored: {filename}")
return results
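A minimal local harness for this batch scorer (a sketch: it bypasses init(), which resolves the model through the AzureML workspace, and assumes a local model copy at ./model.pkl plus one hypothetical input CSV containing the expected Sno column):
```
import joblib
import batch_score  # the script above, saved as batch_score.py

# Bypass init(): load the model directly instead of via Model.get_model_path()
batch_score.model = joblib.load("./model.pkl")

results = batch_score.run(["./data/scoring/credit-sample.csv"])
print(results.head())
```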


@ -0,0 +1,106 @@
import os
import sys
import argparse
import joblib
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from azureml.core import Run
run = Run.get_context()
ws = run.experiment.workspace
def parse_args():
parser = argparse.ArgumentParser(description="UCI Credit example")
parser.add_argument("--transformed_data_path", type=str, default='transformed_data/', help="Directory path to training data")
parser.add_argument("--model_path", type=str, default='trained_model/', help="Model output directory")
return parser.parse_args()
def main():
# Parse command-line arguments
args = parse_args()
transformed_data_path = os.path.join(args.transformed_data_path, run.parent.id)
model_path = os.path.join(args.model_path, run.parent.id)
# Make sure model output path exists
if not os.path.exists(model_path):
os.makedirs(model_path)
# Enable auto logging
mlflow.sklearn.autolog()
# Read training data
print(os.path.join(transformed_data_path, 'train.csv'))
train = pd.read_csv(os.path.join(transformed_data_path, 'train.csv'))
val = pd.read_csv(os.path.join(transformed_data_path, 'val.csv'))
run.log('TRAIN SIZE', train.shape[0])
run.log('VAL SIZE', val.shape[0])
# Train model
model = model_train(train, val)
#copying model to "outputs" directory, this will automatically upload it to Azure ML
joblib.dump(value=model, filename=os.path.join(model_path, 'model.pkl'))
def model_train(train, val):
train.drop("Sno", axis=1, inplace=True)
val.drop("Sno", axis=1, inplace=True)
y_train = train['Risk']
X_train = train.drop('Risk', axis=1)
y_val = val['Risk']
X_val = val.drop('Risk', axis=1)
categorical_features = X_train.select_dtypes(include=['object']).columns
numeric_features = X_train.select_dtypes(include=['int64', 'float']).columns
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='constant', fill_value="missing")),
('onehotencoder', OneHotEncoder(categories='auto', sparse=False))])
numeric_transformer = Pipeline(steps=[
('scaler', StandardScaler())])
feature_engineering_pipeline = ColumnTransformer(
transformers=[
('numeric', numeric_transformer, numeric_features),
('categorical', categorical_transformer, categorical_features)
], remainder="drop")
    # Encode labels (computed for reference; the pipeline below is fit on the raw string labels)
    le = LabelEncoder()
    encoded_y = le.fit_transform(y_train)
# Create sklearn pipeline
lr_clf = Pipeline(steps=[('preprocessor', feature_engineering_pipeline),
('classifier', LogisticRegression(solver="lbfgs"))])
# Train the model
lr_clf.fit(X_train, y_train)
# Capture metrics
train_acc = lr_clf.score(X_train, y_train)
val_acc = lr_clf.score(X_val, y_val)
print("Training accuracy: %.3f" % train_acc)
print("Validation accuracy: %.3f" % val_acc)
run.log('Training accuracy', train_acc)
run.log('Validation accuracy', val_acc)
return lr_clf
if __name__ == "__main__":
main()


@ -0,0 +1,66 @@
import os
import sys
import argparse
import joblib
import pandas as pd
import numpy as np
import mlflow
import mlflow.sklearn
from azureml.core import Run
import argparse
run = Run.get_context()
ws = run.experiment.workspace
def parse_args():
parser = argparse.ArgumentParser(description="UCI Credit example")
parser.add_argument("--data_path", type=str, default='data/', help="Directory path to training data")
parser.add_argument("--transformed_data_path", type=str, default='transformed_data/', help="transformed data directory")
return parser.parse_args()
def main():
# Parse command-line arguments
args = parse_args()
transformed_data_path = os.path.join(args.transformed_data_path, run.parent.id)
# Make sure data output path exists
if not os.path.exists(transformed_data_path):
os.makedirs(transformed_data_path)
# Enable auto logging
mlflow.sklearn.autolog()
# Read training data
df = pd.read_csv(os.path.join(args.data_path, 'credit.csv'))
random_data = np.random.rand(len(df))
msk_train = random_data < 0.7
msk_val = (random_data >= 0.7) & (random_data < 0.85)
msk_test = random_data >= 0.85
train = df[msk_train]
val = df[msk_val]
test = df[msk_test]
run.log('TRAIN SIZE', train.shape[0])
run.log('VAL SIZE', val.shape[0])
run.log('TEST SIZE', test.shape[0])
run.parent.log('TRAIN SIZE', train.shape[0])
run.parent.log('VAL SIZE', val.shape[0])
run.parent.log('TEST SIZE', test.shape[0])
TRAIN_PATH = os.path.join(transformed_data_path, "train.csv")
VAL_PATH = os.path.join(transformed_data_path, "val.csv")
TEST_PATH = os.path.join(transformed_data_path, "test.csv")
train.to_csv(TRAIN_PATH, index=False)
val.to_csv(VAL_PATH, index=False)
test.to_csv(TEST_PATH, index=False)
if __name__ == '__main__':
main()
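Note that the 70/15/15 split above relies on an unseeded np.random.rand, so every run produces a different partition. If reproducibility matters, seeding is a small change; a sketch (not part of the committed script):

# Reproducible 70/15/15 split (sketch; the seed value is arbitrary).
import numpy as np
import pandas as pd

def split_frames(df: pd.DataFrame, seed: int = 42):
    rng = np.random.default_rng(seed)
    r = rng.random(len(df))
    return df[r < 0.7], df[(r >= 0.7) & (r < 0.85)], df[r >= 0.85]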

Binary data: data/managed-endpoints/models/sklearn_regression_model.pkl (new file; binary content not shown)


@ -0,0 +1,4 @@
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}
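The payload carries two 10-feature rows under a data key. Assuming the committed sklearn_regression_model.pkl above is a fitted scikit-learn estimator over 10 numeric features, the same payload can be scored locally before wiring up the endpoint; a sketch:

# Score the sample payload against the committed pickle (sketch; assumes a fitted 10-feature estimator).
import json
import joblib
import numpy as np

model = joblib.load("data/managed-endpoints/models/sklearn_regression_model.pkl")
payload = json.loads('{"data": [[1,2,3,4,5,6,7,8,9,10], [10,9,8,7,6,5,4,3,2,1]]}')
print(model.predict(np.array(payload["data"])))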

data/training/credit.csv (new file, 1001 lines): diff not shown due to file size.

Binary data: images/iacpipelineresult.png (new image, 23 KiB; not shown)


@ -0,0 +1,93 @@
targetScope='subscription'
param location string = 'westus2'
param env string = 'dev'
param prefix string
param postfix string
param resourceGroupName string = 'rg-wus-test'
var baseName = '${prefix}${postfix}'
resource resgrp 'Microsoft.Resources/resourceGroups@2020-06-01' = {
name: resourceGroupName
location: location
}
// storage account
module stoacct './modules/stoacct.bicep' = {
name: 'stoacct'
scope: resourceGroup(resgrp.name)
params: {
env: env
baseName: baseName
location: location
}
}
// keyvault
module kv './modules/kv.bicep' = {
name: 'kv'
scope: resourceGroup(resgrp.name)
params:{
env: env
location: location
baseName: baseName
}
}
// appinsights
module appinsight './modules/appinsight.bicep' = {
name: 'appinsight'
scope: resourceGroup(resgrp.name)
params:{
baseName: baseName
env: env
location: location
}
}
// container registry
module cr './modules/cr.bicep' = {
name: 'cr'
scope: resourceGroup(resgrp.name)
params:{
baseName: baseName
env: env
location: location
}
}
// amls workspace
module amls './modules/amls.bicep' = {
name: 'amls'
scope: resourceGroup(resgrp.name)
params:{
baseName: baseName
env: env
location: location
stoacctid: stoacct.outputs.stoacctOut
kvid: kv.outputs.kvOut
appinsightid: appinsight.outputs.appinsightOut
crid: cr.outputs.crOut
}
}
// aml compute cluster (AmlCompute; module name kept as 'amlci')
module amlci './modules/amlcomputeinstance.bicep' = {
name: 'amlci'
scope: resourceGroup(resgrp.name)
params:{
baseName: baseName
env: env
location: location
workspaceName: amls.outputs.amlsName
}
}


@ -0,0 +1,516 @@
{
"$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "11208633954998583577"
}
},
"parameters": {
"location": {
"type": "string",
"defaultValue": "westus2"
},
"env": {
"type": "string",
"defaultValue": "dev"
},
"prefix": {
"type": "string"
},
"postfix": {
"type": "string"
},
"resourceGroupName": {
"type": "string",
"defaultValue": "rg-wus-test"
}
},
"variables": {
"baseName": "[format('{0}{1}', parameters('prefix'), parameters('postfix'))]"
},
"resources": [
{
"type": "Microsoft.Resources/resourceGroups",
"apiVersion": "2020-06-01",
"name": "[parameters('resourceGroupName')]",
"location": "[parameters('location')]"
},
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2020-10-01",
"name": "stoacct",
"resourceGroup": "[parameters('resourceGroupName')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "inner"
},
"mode": "Incremental",
"parameters": {
"env": {
"value": "[parameters('env')]"
},
"baseName": {
"value": "[variables('baseName')]"
},
"location": {
"value": "[parameters('location')]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "13854706444404712543"
}
},
"parameters": {
"env": {
"type": "string"
},
"baseName": {
"type": "string"
},
"location": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.Storage/storageAccounts",
"apiVersion": "2019-04-01",
"name": "[format('{0}{1}sa', parameters('env'), parameters('baseName'))]",
"location": "[parameters('location')]",
"sku": {
"name": "Standard_LRS"
},
"kind": "StorageV2",
"properties": {
"encryption": {
"services": {
"blob": {
"enabled": true
},
"file": {
"enabled": true
}
},
"keySource": "Microsoft.Storage"
},
"supportsHttpsTrafficOnly": true
}
}
],
"outputs": {
"stoacctOut": {
"type": "string",
"value": "[resourceId('Microsoft.Storage/storageAccounts', format('{0}{1}sa', parameters('env'), parameters('baseName')))]"
}
}
}
},
"dependsOn": [
"[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
]
},
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2020-10-01",
"name": "kv",
"resourceGroup": "[parameters('resourceGroupName')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "inner"
},
"mode": "Incremental",
"parameters": {
"env": {
"value": "[parameters('env')]"
},
"location": {
"value": "[parameters('location')]"
},
"baseName": {
"value": "[variables('baseName')]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "3960831692549416869"
}
},
"parameters": {
"baseName": {
"type": "string"
},
"env": {
"type": "string"
},
"location": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.KeyVault/vaults",
"apiVersion": "2019-09-01",
"name": "[format('{0}-{1}-kv', parameters('env'), parameters('baseName'))]",
"location": "[parameters('location')]",
"properties": {
"tenantId": "[subscription().tenantId]",
"sku": {
"name": "standard",
"family": "A"
},
"accessPolicies": []
}
}
],
"outputs": {
"kvOut": {
"type": "string",
"value": "[resourceId('Microsoft.KeyVault/vaults', format('{0}-{1}-kv', parameters('env'), parameters('baseName')))]"
}
}
}
},
"dependsOn": [
"[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
]
},
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2020-10-01",
"name": "appinsight",
"resourceGroup": "[parameters('resourceGroupName')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "inner"
},
"mode": "Incremental",
"parameters": {
"baseName": {
"value": "[variables('baseName')]"
},
"env": {
"value": "[parameters('env')]"
},
"location": {
"value": "[parameters('location')]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "2591061638125956638"
}
},
"parameters": {
"baseName": {
"type": "string"
},
"env": {
"type": "string"
},
"location": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.Insights/components",
"apiVersion": "2020-02-02-preview",
"name": "[format('{0}{1}-appin', parameters('env'), parameters('baseName'))]",
"location": "[parameters('location')]",
"kind": "web",
"properties": {
"Application_Type": "web"
}
}
],
"outputs": {
"appinsightOut": {
"type": "string",
"value": "[resourceId('Microsoft.Insights/components', format('{0}{1}-appin', parameters('env'), parameters('baseName')))]"
}
}
}
},
"dependsOn": [
"[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
]
},
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2020-10-01",
"name": "cr",
"resourceGroup": "[parameters('resourceGroupName')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "inner"
},
"mode": "Incremental",
"parameters": {
"baseName": {
"value": "[variables('baseName')]"
},
"env": {
"value": "[parameters('env')]"
},
"location": {
"value": "[parameters('location')]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "12155558635582316098"
}
},
"parameters": {
"env": {
"type": "string"
},
"baseName": {
"type": "string"
},
"location": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.ContainerRegistry/registries",
"apiVersion": "2020-11-01-preview",
"name": "[format('{0}{1}cr', parameters('env'), parameters('baseName'))]",
"location": "[parameters('location')]",
"sku": {
"name": "Standard"
},
"properties": {
"adminUserEnabled": true
}
}
],
"outputs": {
"crOut": {
"type": "string",
"value": "[resourceId('Microsoft.ContainerRegistry/registries', format('{0}{1}cr', parameters('env'), parameters('baseName')))]"
}
}
}
},
"dependsOn": [
"[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
]
},
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2020-10-01",
"name": "amls",
"resourceGroup": "[parameters('resourceGroupName')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "inner"
},
"mode": "Incremental",
"parameters": {
"baseName": {
"value": "[variables('baseName')]"
},
"env": {
"value": "[parameters('env')]"
},
"location": {
"value": "[parameters('location')]"
},
"stoacctid": {
"value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'stoacct')).outputs.stoacctOut.value]"
},
"kvid": {
"value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'kv')).outputs.kvOut.value]"
},
"appinsightid": {
"value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'appinsight')).outputs.appinsightOut.value]"
},
"crid": {
"value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'cr')).outputs.crOut.value]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "18023230433604735324"
}
},
"parameters": {
"location": {
"type": "string"
},
"baseName": {
"type": "string"
},
"env": {
"type": "string"
},
"stoacctid": {
"type": "string"
},
"kvid": {
"type": "string"
},
"appinsightid": {
"type": "string"
},
"crid": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.MachineLearningServices/workspaces",
"apiVersion": "2020-09-01-preview",
"name": "[format('{0}{1}-ws', parameters('env'), parameters('baseName'))]",
"location": "[parameters('location')]",
"identity": {
"type": "SystemAssigned"
},
"sku": {
"tier": "basic",
"name": "basic"
},
"properties": {
"friendlyName": "[format('{0}{1}-ws', parameters('env'), parameters('baseName'))]",
"storageAccount": "[parameters('stoacctid')]",
"keyVault": "[parameters('kvid')]",
"applicationInsights": "[parameters('appinsightid')]",
"containerRegistry": "[parameters('crid')]",
"encryption": {
"status": "Disabled",
"keyVaultProperties": {
"keyIdentifier": "",
"keyVaultArmId": ""
}
}
}
}
],
"outputs": {
"amlsName": {
"type": "string",
"value": "[format('{0}{1}-ws', parameters('env'), parameters('baseName'))]"
}
}
}
},
"dependsOn": [
"[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'appinsight')]",
"[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'cr')]",
"[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'kv')]",
"[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]",
"[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'stoacct')]"
]
},
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2020-10-01",
"name": "amlci",
"resourceGroup": "[parameters('resourceGroupName')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "inner"
},
"mode": "Incremental",
"parameters": {
"baseName": {
"value": "[variables('baseName')]"
},
"env": {
"value": "[parameters('env')]"
},
"location": {
"value": "[parameters('location')]"
},
"workspaceName": {
"value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'amls')).outputs.amlsName.value]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"metadata": {
"_generator": {
"name": "bicep",
"version": "0.5.6.12127",
"templateHash": "2016431671585526523"
}
},
"parameters": {
"location": {
"type": "string"
},
"baseName": {
"type": "string"
},
"env": {
"type": "string"
},
"computeInstanceName": {
"type": "string",
"defaultValue": "[format('{0}-{1}-ci', parameters('env'), parameters('baseName'))]"
},
"workspaceName": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.MachineLearningServices/workspaces/computes",
"apiVersion": "2020-09-01-preview",
"name": "[format('{0}/{1}', parameters('workspaceName'), parameters('computeInstanceName'))]",
"location": "[parameters('location')]",
"properties": {
"computeType": "AmlCompute",
"properties": {
"vmSize": "Standard_DS3_v2",
"subnet": "[json('null')]",
"osType": "Linux",
"scaleSettings": {
"maxNodeCount": 4,
"minNodeCount": 0
}
}
}
}
]
}
},
"dependsOn": [
"[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'amls')]",
"[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
]
}
]
}


@ -0,0 +1,25 @@
param location string
param baseName string
param env string
param computeInstanceName string = '${env}-${baseName}-ci'
param workspaceName string
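// NOTE: computeType 'AmlCompute' below creates a scalable compute cluster; a single
// dev box would use computeType 'ComputeInstance', despite this file's name.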
resource amlci 'Microsoft.MachineLearningServices/workspaces/computes@2020-09-01-preview' = {
name: '${workspaceName}/${computeInstanceName}'
location: location
properties:{
computeType: 'AmlCompute'
properties:{
vmSize: 'Standard_DS3_v2'
subnet: json('null')
osType:'Linux'
scaleSettings:{
maxNodeCount: 4
minNodeCount: 0
}
}
}
}


@ -0,0 +1,39 @@
param location string
param baseName string
param env string
param stoacctid string
param kvid string
param appinsightid string
param crid string
// azure machine learning service
resource amls 'Microsoft.MachineLearningServices/workspaces@2020-09-01-preview' = {
name: '${env}${baseName}-ws'
location: location
identity: {
type: 'SystemAssigned'
}
sku:{
tier: 'basic'
name: 'basic'
}
properties:{
friendlyName: '${env}${baseName}-ws'
storageAccount: stoacctid
keyVault: kvid
applicationInsights: appinsightid
containerRegistry: crid
encryption:{
status: 'Disabled'
keyVaultProperties:{
keyIdentifier: ''
keyVaultArmId: ''
}
}
}
}
output amlsName string = amls.name


@ -0,0 +1,16 @@
param baseName string
param env string
param location string
// app insights
resource appinsight 'Microsoft.Insights/components@2020-02-02-preview' = {
name: '${env}${baseName}-appin'
location: location
kind: 'web'
properties:{
Application_Type: 'web'
}
}
output appinsightOut string = appinsight.id


@ -0,0 +1,17 @@
param env string
param baseName string
param location string
resource cr 'Microsoft.ContainerRegistry/registries@2020-11-01-preview' = {
name: '${env}${baseName}cr'
location: location
sku: {
name: 'Standard'
}
properties:{
adminUserEnabled:true
}
}
output crOut string = cr.id


@ -0,0 +1,20 @@
param baseName string
param env string
param location string
// keyvault
resource kv 'Microsoft.KeyVault/vaults@2019-09-01' = {
name: '${env}-${baseName}-kv'
location: location
properties:{
tenantId: subscription().tenantId
sku: {
name: 'standard'
family: 'A'
}
accessPolicies: []
}
}
output kvOut string = kv.id


@ -0,0 +1,30 @@
param env string
param baseName string
param location string
// storage account
resource stoacct 'Microsoft.Storage/storageAccounts@2019-04-01' = {
name: '${env}${baseName}sa'
location: location
sku:{
name:'Standard_LRS'
}
kind: 'StorageV2'
properties:{
encryption:{
services:{
blob:{
enabled: true
}
file:{
enabled: true
}
}
keySource: 'Microsoft.Storage'
}
supportsHttpsTrafficOnly: true
}
}
output stoacctOut string = stoacct.id


@ -0,0 +1,34 @@
variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
# 'main' branch: PRD environment
- template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
# 'develop' or feature branches: DEV environment
- template: ../../../config-infra-dev.yml
trigger:
- none
pool:
vmImage: $(ap_vm_image)
stages:
- stage: CheckOutBicepAndDeploy
displayName: Deploy AML Workspace
jobs:
- job: DeployBicep
displayName: Create Bicep Deployment
steps:
- checkout: self
- task: AzureCLI@2
displayName: Running Deployment
inputs:
azureSubscription: $(ado_service_connection_rg)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
az --version
echo "deploying bicep..."
az deployment sub create --name $(Build.DefinitionName) --location $(location) --template-file ./infrastructure/bicep/main.bicep --parameters location=$(location) resourceGroupName=$(resource_group) prefix=$(namespace) postfix=$(postfix)


@ -0,0 +1,94 @@
# Resource group
module "resource_group" {
source = "./modules/resource-group"
location = var.location
prefix = var.prefix
postfix = var.postfix
tags = local.tags
}
# Azure Machine Learning workspace
module "aml_workspace" {
source = "./modules/aml-workspace"
rg_name = module.resource_group.name
location = module.resource_group.location
prefix = var.prefix
postfix = var.postfix
storage_account_id = module.storage_account_aml.id
key_vault_id = module.key_vault.id
application_insights_id = module.application_insights.id
container_registry_id = module.container_registry.id
enable_aml_computecluster = var.enable_aml_computecluster
storage_account_name = module.storage_account_aml.name
tags = local.tags
}
# Storage account
module "storage_account_aml" {
source = "./modules/storage-account"
rg_name = module.resource_group.name
location = module.resource_group.location
prefix = var.prefix
postfix = "${var.postfix}aml"
hns_enabled = false
firewall_bypass = ["AzureServices"]
firewall_virtual_network_subnet_ids = []
tags = local.tags
}
# Key vault
module "key_vault" {
source = "./modules/key-vault"
rg_name = module.resource_group.name
location = module.resource_group.location
prefix = var.prefix
postfix = var.postfix
tags = local.tags
}
# Application insights
module "application_insights" {
source = "./modules/application-insights"
rg_name = module.resource_group.name
location = module.resource_group.location
prefix = var.prefix
postfix = var.postfix
tags = local.tags
}
# Container registry
module "container_registry" {
source = "./modules/container-registry"
rg_name = module.resource_group.name
location = module.resource_group.location
prefix = var.prefix
postfix = var.postfix
tags = local.tags
}


@ -0,0 +1,9 @@
locals {
  tags = {
    Owner       = "mlops-tabular"
    Project     = "mlops-tabular"
    Environment = var.environment
    Toolkit     = "Terraform"
    Name        = var.prefix
  }
}


@ -0,0 +1,18 @@
terraform {
backend "azurerm" {}
required_providers {
azurerm = {
version = "= 2.99.0"
}
}
}
provider "azurerm" {
features {}
}
data "azurerm_client_config" "current" {}
data "http" "ip" {
url = "https://ifconfig.me"
}


@ -0,0 +1,104 @@
resource "azurerm_machine_learning_workspace" "adl_mlw" {
name = "mlw-${var.prefix}-${var.postfix}"
location = var.location
resource_group_name = var.rg_name
application_insights_id = var.application_insights_id
key_vault_id = var.key_vault_id
storage_account_id = var.storage_account_id
container_registry_id = var.container_registry_id
identity {
type = "SystemAssigned"
}
tags = var.tags
}
# Compute cluster
resource "azurerm_machine_learning_compute_cluster" "adl_aml_ws_compute_cluster" {
name = "mlwcc${var.prefix}${var.postfix}"
location = var.location
vm_priority = "LowPriority"
vm_size = "STANDARD_DS2_V2"
machine_learning_workspace_id = azurerm_machine_learning_workspace.adl_mlw.id
count = var.enable_aml_computecluster ? 1 : 0
scale_settings {
min_node_count = 0
max_node_count = 1
scale_down_nodes_after_idle_duration = "PT120S" # 120 seconds
}
identity {
type = "SystemAssigned"
}
}
# Datastore
resource "azurerm_resource_group_template_deployment" "arm_aml_create_datastore" {
name = "arm_aml_create_datastore"
resource_group_name = var.rg_name
deployment_mode = "Incremental"
parameters_content = jsonencode({
"WorkspaceName" = {
value = azurerm_machine_learning_workspace.adl_mlw.name
},
"StorageAccountName" = {
value = var.storage_account_name
}
})
depends_on = [time_sleep.wait_30_seconds]
template_content = <<TEMPLATE
{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"WorkspaceName": {
"type": "String"
},
"StorageAccountName": {
"type": "String"
}
},
"resources": [
{
"type": "Microsoft.MachineLearningServices/workspaces/datastores",
"apiVersion": "2021-03-01-preview",
"name": "[concat(parameters('WorkspaceName'), '/default')]",
"dependsOn": [],
"properties": {
"contents": {
"accountName": "[parameters('StorageAccountName')]",
"containerName": "default",
"contentsType": "AzureBlob",
"credentials": {
"credentialsType": "None"
},
"endpoint": "core.windows.net",
"protocol": "https"
},
"description": "Default datastore for mlops-tabular",
"isDefault": false,
"properties": {
"ServiceDataAccessAuthIdentity": "None"
},
"tags": {}
}
}
]
}
TEMPLATE
}
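# Give the new workspace ~30 seconds to finish provisioning before the datastore deployment references it.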
resource "time_sleep" "wait_30_seconds" {
depends_on = [
azurerm_machine_learning_workspace.adl_mlw
]
create_duration = "30s"
}


@ -0,0 +1,55 @@
variable "rg_name" {
type = string
description = "Resource group name"
}
variable "location" {
type = string
description = "Location of the resource group"
}
variable "tags" {
type = map(string)
default = {}
description = "A mapping of tags which should be assigned to the deployed resource"
}
variable "prefix" {
type = string
description = "Prefix for the module name"
}
variable "postfix" {
type = string
description = "Postfix for the module name"
}
variable "storage_account_id" {
type = string
description = "The ID of the Storage Account linked to AML workspace"
}
variable "key_vault_id" {
type = string
description = "The ID of the Key Vault linked to AML workspace"
}
variable "application_insights_id" {
type = string
description = "The ID of the Application Insights linked to AML workspace"
}
variable "container_registry_id" {
type = string
description = "The ID of the Container Registry linked to AML workspace"
}
variable "enable_aml_computecluster" {
description = "Variable to enable or disable AML compute cluster"
default = false
}
variable "storage_account_name" {
type = string
description = "The Name of the Storage Account linked to AML workspace"
}


@ -0,0 +1,8 @@
resource "azurerm_application_insights" "adl_appi" {
name = "appi-${var.prefix}-${var.postfix}"
location = var.location
resource_group_name = var.rg_name
application_type = "web"
tags = var.tags
}


@ -0,0 +1,3 @@
output "id" {
value = azurerm_application_insights.adl_appi.id
}


@ -0,0 +1,25 @@
variable "rg_name" {
type = string
description = "Resource group name"
}
variable "location" {
type = string
description = "Location of the resource group"
}
variable "tags" {
type = map(string)
default = {}
description = "A mapping of tags which should be assigned to the deployed resource"
}
variable "prefix" {
type = string
description = "Prefix for the module name"
}
variable "postfix" {
type = string
description = "Postfix for the module name"
}


@ -0,0 +1,20 @@
locals {
safe_prefix = replace(var.prefix, "-", "")
safe_postfix = replace(var.postfix, "-", "")
}
resource "azurerm_container_registry" "adl_cr" {
name = "cr${local.safe_prefix}${local.safe_postfix}"
resource_group_name = var.rg_name
location = var.location
sku = "Premium"
admin_enabled = false
network_rule_set {
default_action = "Deny"
ip_rule = []
virtual_network = []
}
tags = var.tags
}


@ -0,0 +1,3 @@
output "id" {
value = azurerm_container_registry.adl_cr.id
}


@ -0,0 +1,25 @@
variable "rg_name" {
type = string
description = "Resource group name"
}
variable "location" {
type = string
description = "Location of the resource group"
}
variable "tags" {
type = map(string)
default = {}
description = "A mapping of tags which should be assigned to the deployed resource"
}
variable "prefix" {
type = string
description = "Prefix for the module name"
}
variable "postfix" {
type = string
description = "Postfix for the module name"
}


@ -0,0 +1,18 @@
data "azurerm_client_config" "current" {}
resource "azurerm_key_vault" "adl_kv" {
name = "kv-${var.prefix}-${var.postfix}"
location = var.location
resource_group_name = var.rg_name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
network_acls {
default_action = "Deny"
ip_rules = []
virtual_network_subnet_ids = []
bypass = "None"
}
tags = var.tags
}


@ -0,0 +1,7 @@
output "id" {
value = azurerm_key_vault.adl_kv.id
}
output "name" {
value = azurerm_key_vault.adl_kv.name
}


@ -0,0 +1,25 @@
variable "rg_name" {
type = string
description = "Resource group name"
}
variable "location" {
type = string
description = "Location of the resource group"
}
variable "tags" {
type = map(string)
default = {}
description = "A mapping of tags which should be assigned to the deployed resource"
}
variable "prefix" {
type = string
description = "Prefix for the module name"
}
variable "postfix" {
type = string
description = "Postfix for the module name"
}


@ -0,0 +1,5 @@
resource "azurerm_resource_group" "adl_rg" {
name = "rg-${var.prefix}-${var.postfix}"
location = var.location
tags = var.tags
}


@ -0,0 +1,7 @@
output "name" {
value = azurerm_resource_group.adl_rg.name
}
output "location" {
value = azurerm_resource_group.adl_rg.location
}


@ -0,0 +1,21 @@
variable "location" {
type = string
default = "North Europe"
description = "Location of the Resource Group"
}
variable "tags" {
type = map(string)
default = {}
description = "A mapping of tags which should be assigned to the Resource Group"
}
variable "prefix" {
type = string
description = "Prefix for the module name"
}
variable "postfix" {
type = string
description = "Postfix for the module name"
}


@ -0,0 +1,34 @@
data "azurerm_client_config" "current" {}
data "http" "ip" {
url = "https://ifconfig.me"
}
locals {
safe_prefix = replace(var.prefix, "-", "")
safe_postfix = replace(var.postfix, "-", "")
}
resource "azurerm_storage_account" "adl_st" {
name = "st${local.safe_prefix}${local.safe_postfix}"
resource_group_name = var.rg_name
location = var.location
account_tier = "Standard"
account_replication_type = "LRS"
account_kind = "StorageV2"
is_hns_enabled = var.hns_enabled
tags = var.tags
}
# Virtual Network & Firewall configuration
resource "azurerm_storage_account_network_rules" "firewall_rules" {
resource_group_name = var.rg_name
storage_account_name = azurerm_storage_account.adl_st.name
default_action = "Allow"
ip_rules = [] # [data.http.ip.body]
virtual_network_subnet_ids = var.firewall_virtual_network_subnet_ids
bypass = var.firewall_bypass
}


@ -0,0 +1,7 @@
output "id" {
value = azurerm_storage_account.adl_st.id
}
output "name" {
value = azurerm_storage_account.adl_st.name
}


@ -0,0 +1,39 @@
variable "rg_name" {
type = string
description = "Resource group name"
}
variable "location" {
type = string
description = "Location of the resource group"
}
variable "tags" {
type = map(string)
default = {}
description = "A mapping of tags which should be assigned to the Resource Group"
}
variable "prefix" {
type = string
description = "Prefix for the module name"
}
variable "postfix" {
type = string
description = "Postfix for the module name"
}
variable "hns_enabled" {
type = bool
description = "Hierarchical namespaces enabled/disabled"
default = true
}
variable "firewall_virtual_network_subnet_ids" {
default = []
}
variable "firewall_bypass" {
default = ["None"]
}


@ -0,0 +1,55 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
# 'main' branch: PRD environment
- template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
# 'develop' or feature branches: DEV environment
- template: ../../../config-infra-dev.yml
trigger:
- none
pool:
vmImage: $(ap_vm_image)
resources:
repositories:
- repository: mlops-templates
name: Azure/mlops-templates
endpoint: mlops-v2-tabular
type: github
ref: main #branch name
stages:
- stage: CreateStorageAccountForTerraformState
displayName: Create Storage for Terraform
jobs:
- job: CreateStorageForTerraform
displayName: Create Storage for Terraform
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/infra/create-resource-group.yml@mlops-templates
- template: templates/infra/create-storage-account.yml@mlops-templates
- template: templates/infra/create-storage-container.yml@mlops-templates
- stage: DeployAzureMachineLearningRG
displayName: Deploy AML Resource Group
jobs:
- job: DeployAMLWorkspace
displayName: 'Deploy AML Workspace'
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/infra/install-terraform.yml@mlops-templates
- template: templates/infra/run-terraform-init.yml@mlops-templates
- template: templates/infra/run-terraform-validate.yml@mlops-templates
- template: templates/infra/run-terraform-plan.yml@mlops-templates
- template: templates/infra/run-terraform-apply.yml@mlops-templates


@ -0,0 +1,23 @@
variable "location" {
type = string
description = "Location of the resource group and modules"
}
variable "prefix" {
type = string
description = "Prefix for module names"
}
variable "environment" {
type = string
description = "Environment information"
}
variable "postfix" {
type = string
description = "Postfix for module names"
}
variable "enable_aml_computecluster" {
description = "Variable to enable or disable AML compute cluster"
}


@ -0,0 +1,78 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
variables:
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
# 'main' branch: PRD environment
- template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
# 'develop' or feature branches: DEV environment
- template: ../../../config-infra-dev.yml
- name: version
value: aml-cli-v2 #must be either 'python-sdk' or 'aml-cli-v2'
- name: endpoint_name
value: batchendpoint1
- name: endpoint_type
value: batch
trigger:
- none
pool:
vmImage: ubuntu-20.04
resources:
repositories:
- repository: mlops-templates # Template Repo
name: Azure/mlops-templates # need to change org name from "Azure" to your own org
endpoint: mlops-v2-service-connection # need to hardcode as repositories doesn't accept variables
type: github
stages:
- stage: DeployTrainingPipeline
displayName: Deploy Training Pipeline
jobs:
- job: DeployTrainingPipeline
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates
parameters:
pipeline_file: data-science-regression/pipeline.yml
condition: eq(variables['build.sourceBranchName'], 'main') # Selective skipping based on branch; remove this line before release!
- stage: CreateBatchEndpoint
displayName: Create/Update Batch Endpoint
jobs:
- job: DeployBatchEndpoint
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/create-compute.yml@mlops-templates
parameters:
cluster_name: batch-cluster #must match cluster name in deployment file below
min_instances: 0
max_instances: 5
- template: templates/${{ variables.version }}/create-endpoint.yml@mlops-templates
- template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
parameters:
deployment_name: mlflowdp
deployment_file: data-science-regression/components/deploy/batch-endpoint/mlflow-deployment.yml
# - template: templates/${{ variables.version }}/test-deployment.yml@mlops-templates
# parameters:
# deployment_name: blue
# sample_request: data-science-regression/components/deploy/blue/sample-request.json


@ -0,0 +1,101 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
variables:
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
# 'main' branch: PRD environment
- template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
# 'develop' or feature branches: DEV environment
- template: ../../../config-infra-dev.yml
- name: version
value: aml-cli-v2 #must be either 'python-sdk' or 'aml-cli-v2'
- name: endpoint_name
value: onlineendpoint1
- name: endpoint_type
value: online
trigger:
- none
pool:
vmImage: ubuntu-20.04
resources:
repositories:
- repository: mlops-templates # Template Repo
name: Azure/mlops-templates # need to change org name from "Azure" to your own org
endpoint: mlops-v2-service-connection # need to hardcode as repositories doesn't accept variables
type: github
stages:
- stage: DeployTrainingPipeline
displayName: Deploy Training Pipeline
jobs:
- job: DeployTrainingPipeline
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates
parameters:
pipeline_file: data-science-regression/pipeline.yml
condition: eq(variables['build.sourceBranchName'], 'main') # Selective skipping based on branch; remove this line before release!
- stage: CreateOnlineEndpoint
displayName: Create/Update Online Endpoint
jobs:
- job: DeployOnlineEndpoint
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/create-endpoint.yml@mlops-templates
- template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
parameters:
deployment_name: blue
deployment_file: data-science-regression/components/deploy/online-endpoint/blue/blue-deployment.yml
- template: templates/${{ variables.version }}/test-deployment.yml@mlops-templates
parameters:
deployment_name: blue
sample_request: data-science-regression/components/deploy/online-endpoint/blue/sample-request.json
- template: templates/${{ variables.version }}/allocate-traffic.yml@mlops-templates
parameters:
traffic_allocation: blue=100
# Example: Safe Rollout, can also be used for A/B testing
- stage: SafeRollout
displayName: Safe rollout of new deployment
jobs:
- job: SafeRolloutDeployment
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
parameters:
deployment_name: green
deployment_file: data-science-regression/components/deploy/online-endpoint/green/green-deployment.yml
- template: templates/${{ variables.version }}/test-deployment.yml@mlops-templates
parameters:
deployment_name: green
sample_request: data-science-regression/components/deploy/online-endpoint/green/sample-request.json
- template: templates/${{ variables.version }}/allocate-traffic.yml@mlops-templates
parameters:
traffic_allocation: blue=90 green=10


@ -0,0 +1,50 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
# 'main' branch: PRD environment
- template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
# 'develop' or feature branches: DEV environment
- template: ../../../config-infra-dev.yml
- name: version
value: python-sdk
trigger:
- none
pool:
vmImage: $(ap_vm_image)
resources:
repositories:
- repository: mlops-templates # Template Repo
name: Azure/mlops-templates
endpoint: mlops-v2-tabular # need to hardcode!
type: github
stages:
- stage: DeployBatchScoringPipeline
displayName: Deploy Batch Scoring Pipeline
jobs:
- job: DeployBatchScoringPipeline
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/create-environment.yml@mlops-templates
parameters:
environment_name: $(batch_env_name)
environment_conda_yaml: $(batch_env_conda_yaml)
- template: templates/${{ variables.version }}/register-dataset.yml@mlops-templates
parameters:
data_type: scoring
- template: templates/${{ variables.version }}/deploy-batch-scoring-pipeline.yml@mlops-templates
- template: templates/${{ variables.version }}/add-pipeline-to-endpoint.yml@mlops-templates
- template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates


@ -0,0 +1,53 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
# 'main' branch: PRD environment
- template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
# 'develop' or feature branches: DEV environment
- template: ../../../config-infra-dev.yml
- name: version
value: python-sdk
trigger:
- none
pool:
vmImage: $(ap_vm_image)
resources:
repositories:
- repository: mlops-templates # Template Repo
name: Azure/mlops-templates # need to change org name from Azure when pulling the template
endpoint: mlops-v2-tabular # need to hardcode!
type: github
stages:
- stage: DeployTrainingPipeline
displayName: Deploy Training Pipeline
jobs:
- job: DeployTrainingPipeline
steps:
- checkout: self
path: s/
- checkout: mlops-templates
path: s/templates/
- template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
- template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
- template: templates/${{ variables.version }}/create-environment.yml@mlops-templates
parameters:
environment_name: $(training_env_name)
environment_conda_yaml: $(training_env_conda_yaml)
- template: templates/${{ variables.version }}/register-dataset.yml@mlops-templates
parameters:
data_type: training
- template: templates/${{ variables.version }}/get-compute.yml@mlops-templates
parameters:
compute_type: training
- template: templates/${{ variables.version }}/deploy-training-pipeline.yml@mlops-templates
- template: templates/${{ variables.version }}/add-pipeline-to-endpoint.yml@mlops-templates
- template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates


@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
name: mnist-batch
channels:
- defaults
- anaconda
- conda-forge
dependencies:
- python=3.7.5
- pip
- pip:
- azureml-defaults==1.38.0
- azureml-mlflow==1.38.0
- azureml-sdk==1.38.0
- azureml-interpret==1.38.0
- scikit-learn==0.24.1
- pandas==1.2.1
- joblib==1.0.0
- matplotlib==3.3.3


@ -0,0 +1,20 @@
name: mnist-train
channels:
- defaults
- anaconda
- conda-forge
dependencies:
- python=3.7.5
- pip
- pip:
- azureml-mlflow==1.38.0
- azureml-sdk==1.38.0
- scikit-learn==0.24.1
- pandas==1.2.1
- joblib==1.0.0
- matplotlib==3.3.3
- fairlearn==0.7.0
- azureml-contrib-fairness==1.38.0
- interpret-community==0.24.1
- interpret-core==0.2.7
- azureml-interpret==1.38.0