Merging the complete build with online and batch endpoints. (#15)

* Removed dummy files and added actual files for training pipeline.
* Organizing artifactstore
* Set up CI with Azure Pipelines
* Updated the service connection name for the template to run.
* Update deploy-model-training-pipeline-v2.yml for Azure Pipelines

Co-authored-by: cindyweng <weng.cindy@gmail.com>
Co-authored-by: Cindy Weng <8880364+cindyweng@users.noreply.github.com>
Co-authored-by: murggu <amurguzur@gmail.com>
Co-authored-by: Maggie Mhanna <maggiemhanna@gmail.com>
Co-authored-by: Christoph Muller-Reyes <chrey@microsoft.com>
Co-authored-by: chrey-gh <58181624+chrey-gh@users.noreply.github.com>

Parent: 9ad529df73
Commit: ab1fe15054
@@ -130,3 +130,11 @@ dmypy.json

# Pyre type checker
.pyre/

# Terraform
.terraform.lock.hcl
terraform.tfstate
terraform.tfstate.backup
.terraform.tfstate.lock.info
.terraform
terraform.tfvars
@@ -1,9 +0,0 @@
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
@@ -0,0 +1,40 @@
# Quickstart

## Setting Variables
---

For a quickstart, the only variables that need to be set are in 'config-infra-dev.yml' (see the sketch after this list):

* If your location (Azure Region) differs from 'northeurope', adjust it to the desired one, e.g. 'westus', like this: 'location: westus'
* The purpose of 'namespace' is to make all the artifacts you are going to deploy unique. Since a Storage Account will be deployed, the namespace has to adhere to Storage Account naming limitations (3-24 characters, all lowercase letters or numbers).
* As of 20220405, 'ado_service_connection_rg' needs contributor permission subscription-wide, since two resource groups are created: one for the Terraform state and a second containing the artifacts for the Machine Learning workspace (Storage Account, Key Vault, Application Insights, Container Registry). You then have to create a service connection in your ADO project with the same name, or adjust the variable here accordingly.
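As a minimal sketch, the relevant block of 'config-infra-dev.yml' (the file is included later in this commit) looks like this; 'location' and 'namespace' are the values you would typically change:

```
variables:
  # Global
  namespace: mlopstab      # keep Storage Account naming rules: 3-24 chars, lowercase letters/numbers
  postfix: 441
  location: northeurope    # change to your region, e.g. westus
  environment: dev
  enable_aml_computecluster: true

  # Azure DevOps
  ado_service_connection_rg: Azure-ARM-Dev   # must match a service connection in your ADO project
```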

## Deploying Infrastructure via ADO (Azure DevOps)
---

To deploy the infrastructure in ADO (Azure DevOps), you will need an organization and a project, with the service connection mentioned above configured.
Then, under Pipelines, create a new pipeline and choose 'infrastructure/terraform/pipelines/tf-ado-deploy-infra.yml' as the source.
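If you prefer the CLI to the portal, here is a sketch using the Azure DevOps CLI extension; the pipeline name and repository placeholder are illustrative, and your organization/project defaults must already be configured:

```
az pipelines create \
  --name infra-deploy \
  --repository <your-repo> \
  --repository-type tfsgit \
  --branch main \
  --yml-path infrastructure/terraform/pipelines/tf-ado-deploy-infra.yml \
  --skip-first-run true
```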

You can then run the pipeline, which should create the following artifacts:
* Resource group for the Terraform state, including a Storage Account
* Resource group for your workspace, including a Storage Account, Container Registry, Application Insights, Key Vault, and the Azure Machine Learning workspace itself

> If you didn't change the variable 'enable_aml_computecluster' from 'true' to 'false', a compute cluster is created as defined in 'infrastructure/terraform/modules/aml-workspace/main.tf'.

As of now (20220410), the Terraform infrastructure pipeline creates a new pair of Terraform state and Machine Learning workspace resource groups every time it runs, each with a slightly different name (number 10x).

A successfully run pipeline should look like this:

![IaC image](./images/iacpipelineresult.png)

## Deploying Training Pipeline via ADO (Azure DevOps)
---
SECURITY.md
@@ -1,41 +0,0 @@
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.5 BLOCK -->

## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).

<!-- END MICROSOFT SECURITY.MD BLOCK -->

SUPPORT.md
@@ -1,25 +0,0 @@
# TODO: The maintainer of this repo has not yet edited this file

**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?

- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/spot](https://aka.ms/spot). CSS will work with/help you to determine next steps. More details also available at [aka.ms/onboardsupport](https://aka.ms/onboardsupport).
- **Not sure?** Fill out a SPOT intake as though the answer were "Yes". CSS will help you decide.

*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*

# Support

## How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.

## Microsoft Support Policy

Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
@@ -0,0 +1,75 @@
variables:

  ap_vm_image: ubuntu-20.04

  # Training pipeline settings

  # Training dataset settings
  training_dataset_name: uci-credit
  training_dataset_description: uci_credit
  training_dataset_local_path: data/training/
  training_dataset_path_on_datastore: data/training/
  training_dataset_type: local
  training_dataset_storage_url: 'https://azureaidemostorage.blob.core.windows.net/data/'

  # Training AzureML Environment name
  training_env_name: credit-training

  # Training AzureML Environment conda yaml
  training_env_conda_yaml: mlops/environments/train.yml

  # Name for the training pipeline
  training_pipeline_name: credit-training

  # Compute target for pipeline
  training_target: cpu-cluster
  training_target_sku: STANDARD_D2_V2
  training_target_min_nodes: 0
  training_target_max_nodes: 4

  # Training arguments specification; use azureml:dataset_name:version to reference an AML Dataset for --data_path
  training_arguments: --data_path azureml:uci-credit:1

  # Name under which the model will be registered
  model_name: credit-ci

  # Batch pipeline settings

  # Batch scoring dataset settings
  scoring_dataset_name: credit-batch-input
  scoring_dataset_description: credit-batch-input
  scoring_dataset_local_path: data/scoring/
  scoring_dataset_path_on_datastore: data/scoring/
  scoring_dataset_type: local
  scoring_dataset_storage_url: 'https://azureaidemostorage.blob.core.windows.net/data/'

  # Batch AzureML Environment name
  batch_env_name: credit-batch

  # Batch AzureML Environment conda yaml
  batch_env_conda_yaml: mlops/environments/batch.yml

  # Name for the batch scoring pipeline
  batch_pipeline_name: credit-batch-scoring

  # Compute target for pipeline
  batch_target: cpu-cluster
  # not needed because batch uses the same target as training
  # batch_target_sku: STANDARD_D2_V2
  # batch_target_min_nodes: 0
  # batch_target_max_nodes: 4

  # Input batch dataset
  batch_input_dataset_name: credit-batch-input

  # Output dataset with results
  batch_output_dataset_name: credit-batch-output
  batch_output_path_on_datastore: credit-batch-scoring-results/{run-id}
  batch_output_filename: results.csv

  # Parallelization settings
  batch_mini_batch_size: 8
  batch_error_threshold: 1
  batch_process_count_per_node: 1
  batch_node_count: 1
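A minimal sketch of how a variable template like the one above is typically consumed from an Azure Pipelines job; the template filename 'config-aml.yml' is hypothetical, so use whatever this file is actually named in the repo:

```
variables:
- template: config-aml.yml   # hypothetical filename for the variable template above

steps:
- script: |
    echo "Training pipeline: $(training_pipeline_name)"
    echo "Compute target: $(training_target) ($(training_target_sku), $(training_target_min_nodes)-$(training_target_max_nodes) nodes)"
  displayName: Show training settings
```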
@@ -0,0 +1,36 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

variables:

  # Global
  namespace: mlopstab
  postfix: 441
  location: northeurope
  environment: dev
  enable_aml_computecluster: true

  # Azure DevOps
  ado_service_connection_rg: Azure-ARM-Dev #-murggu
  ado_service_connection_aml_ws: Azure-ARM-Dev

  # Github
  gh_service_endpoint: mlops-v2-tabular # this isn't allowed to be a variable in the devops yaml, so needs to be hardcoded in devops pipelines
  gh_org_name:
  gh_org_url:

  # IaC
  resource_group: azureml-examples-rg #rg-$(namespace)-$(postfix)
  aml_workspace: main #mlw-$(namespace)-$(postfix)
  application_insights: mlw-$(namespace)-$(postfix)
  key_vault: kv-$(namespace)-$(postfix)
  container_registry: cr$(namespace)$(postfix)
  storage_account: st$(namespace)$(postfix)

  # Terraform
  terraform_version: 0.14.7
  terraform_workingdir: infrastructure/terraform
  terraform_st_resource_group: rg-$(namespace)-$(postfix)-tf-state
  terraform_st_storage_account: st$(namespace)$(postfix)tfstate
  terraform_st_container_name: default
  terraform_st_key: mlops-tab
@@ -0,0 +1,20 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

# Definition of production infra-related environment variables

variables:

  # Prod Environment
  environment: prod
  resource_group: azureml-examples-rg #rg-mlops-template-prod-001
  location: westeurope
  namespace: mlopsprodtmpl
  aml_workspace: main #aml$(namespace)
  storage_account: sa$(namespace)
  key_vault: kv$(namespace)
  application_insights: ai$(namespace)
  container_registry: cr$(namespace)
  service_connection_rg: conn-mlops-sub-infra
  service_connection_aml_ws: conn-mlops-aml-bsc-prod-tmpl
  gh_service_connection: mlops-v2-tabular
@@ -0,0 +1,193 @@
# NYC Taxi Data Regression

### This is an end-to-end machine learning pipeline which runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving different functions, which can be registered with the workspace, versioned, and reused with various inputs and outputs. You can learn more about creating reusable components for your pipeline [here](https://github.com/Azure/azureml_run_specification/blob/master/specs/pipeline-component.md). The components are wired together as sketched after this list.

* Merge Taxi Data
  * This component takes multiple taxi datasets (yellow and green) and merges/filters the data.
  * Input: Local data under samples/nyc_taxi_data_regression/data (multiple .csv files)
  * Output: Single filtered dataset (.csv)
* Taxi Feature Engineering
  * This component creates features out of the taxi data to be used in training.
  * Input: Filtered dataset from previous step (.csv)
  * Output: Dataset with 20+ features (.csv)
* Train Linear Regression Model
  * This component splits the dataset into train/test sets and trains an sklearn Linear Regressor with the training set.
  * Input: Data with feature set
  * Output: Trained model (pickle format) and data subset for test (.csv)
* Predict Taxi Fares
  * This component uses the trained model to predict taxi fares on the test set.
  * Input: Linear regression model and test data from previous step
  * Output: Test data with predictions added as a column (.csv)
* Score Model
  * This component scores the model based on how accurate the predictions are in the test set.
  * Input: Test data with predictions and model
  * Output: Report with model coefficients and evaluation scores (.txt)
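Concretely, the pipeline job spec wires component outputs to downstream inputs through `${{parent...}}` bindings. A trimmed sketch of the shape (the full pipeline.yml is included in this commit):

```
jobs:
  prep-job:
    type: command
    component: file:./components/prep/prep.yml
    inputs:
      raw_data: ${{parent.inputs.pipeline_job_input}}
    outputs:
      transformed_data: ${{parent.outputs.pipeline_job_transformed_data}}

  train-job:
    type: command
    component: file:./components/train/train.yml
    inputs:
      training_data: ${{parent.jobs.prep-job.outputs.transformed_data}}
```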

#### 1. Make sure you are in the `nyc_taxi_data_regression` directory for this sample.

#### 2. Submit the Pipeline Job.

Make sure the compute cluster referenced in pipeline.yml (cpu-cluster by default) actually exists in your workspace.
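If the cluster does not exist yet, here is a sketch for creating one with the v2 CLI; the cluster name and SKU simply mirror the defaults used elsewhere in this repo:

```
az ml compute create --name cpu-cluster --type amlcompute \
  --min-instances 0 --max-instances 4 --size STANDARD_D2_V2
```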

Submit the Pipeline Job:
```
az ml job create --file pipeline.yml
```

Once you submit the job, you will find the URL to the Studio UI (where you can view the job graph and logs) under `services` -> `Studio` -> `endpoint` in the output.

Sample output:
```
(cliv2-dev) PS D:\azureml-examples-lochen\cli\jobs\pipelines-with-components\nyc_taxi_data_regression> az ml job create -f pipeline.yml
Command group 'ml job' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Asset labels are still in preview and may resolve to an incorrect asset version.
{
  "creation_context": {
    "created_at": "2022-03-15T11:25:38.323397+00:00",
    "created_by": "Long Chen",
    "created_by_type": "User"
  },
  "experiment_name": "nyc_taxi_data_regression",
  "id": "azureml:/subscriptions/ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9/resourceGroups/pipeline-pm/providers/Microsoft.MachineLearningServices/workspaces/pm-dev/jobs/6cef8ff4-2bd3-4101-adf2-11e0b62e6f6d",
  "inputs": {
    "pipeline_job_input": {
      "mode": "ro_mount",
      "path": "azureml:azureml://datastores/workspaceblobstore/paths/LocalUpload/aa784b6f4b0d0d3090bcd00415290f39/data",
      "type": "uri_folder"
    }
  },
  "jobs": {
    "predict-job": {
      "$schema": "{}",
      "command": "",
      "component": "azureml:49fa5eab-ad35-e3eb-27bc-5568fd2dcd74:1",
      "environment_variables": {},
      "inputs": {
        "model_input": "${{parent.jobs.train-job.outputs.model_output}}",
        "test_data": "${{parent.jobs.train-job.outputs.test_data}}"
      },
      "outputs": {
        "predictions": "${{parent.outputs.pipeline_job_predictions}}"
      },
      "type": "command"
    },
    "prep-job": {
      "$schema": "{}",
      "command": "",
      "component": "azureml:526bfb0e-aba5-36f3-ab06-2b4df9ec1554:1",
      "environment_variables": {},
      "inputs": {
        "raw_data": "${{parent.inputs.pipeline_job_input}}"
      },
      "outputs": {
        "prep_data": "${{parent.outputs.pipeline_job_prepped_data}}"
      },
      "type": "command"
    },
    "score-job": {
      "$schema": "{}",
      "command": "",
      "component": "azureml:f0ae472c-7639-1b4a-47ff-3155384584cf:1",
      "environment_variables": {},
      "inputs": {
        "model": "${{parent.jobs.train-job.outputs.model_output}}",
        "predictions": "${{parent.jobs.predict-job.outputs.predictions}}"
      },
      "outputs": {
        "score_report": "${{parent.outputs.pipeline_job_score_report}}"
      },
      "type": "command"
    },
    "train-job": {
      "$schema": "{}",
      "command": "",
      "component": "azureml:df45efbf-8373-82fd-7d5e-56fa3cd31c05:1",
      "environment_variables": {},
      "inputs": {
        "training_data": "${{parent.jobs.transform-job.outputs.transformed_data}}"
      },
      "outputs": {
        "model_output": "${{parent.outputs.pipeline_job_trained_model}}",
        "test_data": "${{parent.outputs.pipeline_job_test_data}}"
      },
      "type": "command"
    },
    "transform-job": {
      "$schema": "{}",
      "command": "",
      "component": "azureml:107ae7d3-7813-1399-34b1-17335735496c:1",
      "environment_variables": {},
      "inputs": {
        "clean_data": "${{parent.jobs.prep-job.outputs.prep_data}}"
      },
      "outputs": {
        "transformed_data": "${{parent.outputs.pipeline_job_transformed_data}}"
      },
      "type": "command"
    }
  },
  "name": "6cef8ff4-2bd3-4101-adf2-11e0b62e6f6d",
  "outputs": {
    "pipeline_job_predictions": {
      "mode": "upload",
      "type": "uri_folder"
    },
    "pipeline_job_prepped_data": {
      "mode": "upload",
      "type": "uri_folder"
    },
    "pipeline_job_score_report": {
      "mode": "upload",
      "type": "uri_folder"
    },
    "pipeline_job_test_data": {
      "mode": "upload",
      "type": "uri_folder"
    },
    "pipeline_job_trained_model": {
      "mode": "upload",
      "type": "uri_folder"
    },
    "pipeline_job_transformed_data": {
      "mode": "upload",
      "type": "uri_folder"
    }
  },
  "properties": {
    "azureml.continue_on_step_failure": "False",
    "azureml.git.dirty": "True",
    "azureml.parameters": "{}",
    "azureml.pipelineComponent": "pipelinerun",
    "azureml.runsource": "azureml.PipelineRun",
    "mlflow.source.git.branch": "march-cli-preview",
    "mlflow.source.git.commit": "8e28ab743fd680a95d71a50e456c68757669ccc7",
    "mlflow.source.git.repoURL": "https://github.com/Azure/azureml-examples.git",
    "runSource": "MFE",
    "runType": "HTTP"
  },
  "resourceGroup": "pipeline-pm",
  "services": {
    "Studio": {
      "endpoint": "https://ml.azure.com/runs/6cef8ff4-2bd3-4101-adf2-11e0b62e6f6d?wsid=/subscriptions/ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9/resourcegroups/pipeline-pm/workspaces/pm-dev&tid=72f988bf-86f1-41af-91ab-2d7cd011db47",
      "job_service_type": "Studio"
    },
    "Tracking": {
      "endpoint": "azureml://eastus.api.azureml.ms/mlflow/v1.0/subscriptions/ee85ed72-2b26-48f6-a0e8-cb5bcf98fbd9/resourceGroups/pipeline-pm/providers/Microsoft.MachineLearningServices/workspaces/pm-dev?",
      "job_service_type": "Tracking"
    }
  },
  "settings": {
    "continue_on_step_failure": false,
    "default_compute": "cpu-cluster",
    "default_datastore": "workspaceblobstore"
  },
  "status": "Preparing",
  "tags": {
    "azureml.Designer": "true"
  },
  "type": "pipeline"
}
```
@@ -0,0 +1,23 @@
artifact_path: model
flavors:
  python_function:
    env: conda.yml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.7.7
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.22.2.post1
run_id: e07f67c7-f37b-4534-8ac9-9984deb45e2f
saved_input_example_info:
  artifact_path: input_example.json
  pandas_orient: split
  type: dataframe
signature:
  inputs: '[{"name": "fareAmount", "type": "float"}, {"name": "paymentType", "type":
    "integer"}, {"name": "passengerCount", "type": "integer"}, {"name": "tripDistance",
    "type": "float"}, {"name": "tripTimeSecs", "type": "integer"}, {"name": "pickupTimeBin",
    "type": "string"}]'
  outputs: '[{"type": "integer"}]'
utc_time_created: '2020-11-05 03:39:25.470901'
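A minimal sketch of loading a model saved with the MLmodel spec above through the pyfunc flavor; the local folder name 'autolog_nyc_taxi' matches where this file lives in the commit, and the sample values are illustrative:

```
import mlflow.pyfunc
import pandas as pd

# Load the model folder that contains the MLmodel file above
model = mlflow.pyfunc.load_model("autolog_nyc_taxi")

# Build one row matching the declared input signature
sample = pd.DataFrame([{
    "fareAmount": 12.5, "paymentType": 1, "passengerCount": 2,
    "tripDistance": 3.2, "tripTimeSecs": 600, "pickupTimeBin": "6AM-10AM",
}])
print(model.predict(sample))
```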
@@ -0,0 +1,11 @@
channels:
- defaults
- conda-forge
dependencies:
- python=3.7.7
- scikit-learn=0.22.2.post1
- pip
- pip:
  - mlflow
  - cloudpickle==1.6.0
name: mlflow-env

Binary data: data-science-regression/components/deploy/batch-endpoint/autolog_nyc_taxi/model.pkl (new file; binary file not shown)
@@ -0,0 +1,6 @@
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: mlflowdp
endpoint_name: mybatchedp
model:
  path: ./autolog_nyc_taxi
compute: azureml:batch-cluster
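A sketch of rolling this out with the v2 CLI, assuming the spec above is saved as 'batch-deployment.yml' (a hypothetical filename) and the 'batch-cluster' compute already exists:

```
az ml batch-endpoint create --name mybatchedp
az ml batch-deployment create --file batch-deployment.yml --set-default
```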
@@ -0,0 +1,13 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model:
  path: model/sklearn_regression_model.pkl
code_configuration:
  code: src/
  scoring_script: score.py
environment:
  conda_file: environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1
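A sketch of standing this up with the v2 CLI, assuming the spec above is saved as 'blue-deployment.yml' (a hypothetical filename) and the endpoint does not exist yet:

```
az ml online-endpoint create --name my-endpoint
az ml online-deployment create --file blue-deployment.yml --all-traffic
```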
@@ -0,0 +1,13 @@
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pip:
    - azureml-defaults==1.38.0
    - inference-schema[numpy-support]==1.3.0
    - joblib==1.0.1

Binary data: data-science-regression/components/deploy/online-endpoint/blue/model/sklearn_regression_model.pkl (new file; binary file not shown)
@@ -0,0 +1,4 @@
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}
@@ -0,0 +1,35 @@
import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("Request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
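Once a deployment using this scoring script is live, a sketch of invoking it with the sample-request.json from this commit:

```
az ml online-endpoint invoke --name my-endpoint --request-file sample-request.json
```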
@@ -0,0 +1,13 @@
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pip:
    - azureml-defaults==1.38.0
    - inference-schema[numpy-support]==1.3.0
    - joblib==1.0.1
@@ -0,0 +1,13 @@
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: green
endpoint_name: my-endpoint
model:
  path: model/sklearn_regression_model.pkl
code_configuration:
  code: src/
  scoring_script: score.py
environment:
  conda_file: environment/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1
instance_type: Standard_F2s_v2
instance_count: 1
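With both 'blue' and 'green' deployed to 'my-endpoint', a sketch of the usual blue/green rollout step, shifting a slice of traffic to the new deployment (percentages are illustrative):

```
az ml online-endpoint update --name my-endpoint --traffic "blue=90 green=10"
```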
Binary data: data-science-regression/components/deploy/online-endpoint/green/model/sklearn_regression_model.pkl (new file; binary file not shown)

@@ -0,0 +1,4 @@
{"data": [
[1,2,3,4,5,6,7,8,9,10],
[10,9,8,7,6,5,4,3,2,1]
]}
@@ -0,0 +1,36 @@
import os
import logging
import json
import numpy
import joblib


def init():
    """
    This function is called when the container is initialized/started, typically after create/update of the deployment.
    You can write the logic here to perform init operations like caching the model in memory
    """
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "sklearn_regression_model.pkl"
    )
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)
    logging.info("Init complete")


def run(raw_data):
    """
    This function is called for every invocation of the endpoint to perform the actual scoring/prediction.
    In the example we extract the data from the json input and call the scikit-learn model's predict()
    method and return the result back
    """
    logging.info("Request received")
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    logging.info("Request processed")
    return result.tolist()
@@ -0,0 +1,32 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: evaluate_model
version: 1
display_name: evaluate-model
type: command
inputs:
  model_name:
    type: string
    default: "taxi-model"
  model_input:
    type: uri_folder
  test_data:
    type: uri_folder
outputs:
  predictions:
    type: uri_folder
  score_report:
    type: uri_folder
  deploy_flag:
    type: uri_folder
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
code: ./src
command: >-
  python evaluate.py
  --model_name ${{inputs.model_name}}
  --model_input ${{inputs.model_input}}
  --test_data ${{inputs.test_data}}
  --predictions ${{outputs.predictions}}
  --score_report ${{outputs.score_report}}
  --deploy_flag ${{outputs.deploy_flag}}
# </component>
@@ -0,0 +1,142 @@
import argparse
import pandas as pd
import os
from pathlib import Path
from sklearn.linear_model import LinearRegression
import pickle
from sklearn.metrics import mean_squared_error, r2_score
from azureml.core import Run, Experiment, Model

# current run
run = Run.get_context()
ws = run.experiment.workspace

parser = argparse.ArgumentParser("predict")
parser.add_argument("--model_name", type=str, help="Name of registered model")
parser.add_argument("--model_input", type=str, help="Path of input model")
parser.add_argument("--test_data", type=str, help="Path to test data")
parser.add_argument("--predictions", type=str, help="Path of predictions")
parser.add_argument("--score_report", type=str, help="Path to score report")
parser.add_argument('--deploy_flag', type=str, help='A deploy flag whether to deploy or not')

# ---------------- Model Evaluation ---------------- #

args = parser.parse_args()

lines = [
    f"Model path: {args.model_input}",
    f"Test data path: {args.test_data}",
    f"Predictions path: {args.predictions}",
    f"Scoring output path: {args.score_report}",
]

for line in lines:
    print(line)

# Load the test data

print("mounted_path files: ")
arr = os.listdir(args.test_data)

print(arr)

test_data = pd.read_csv((Path(args.test_data) / "test.csv"))
print(test_data.columns)

testy = test_data["cost"]
# testX = test_data.drop(['cost'], axis=1)
testX = test_data[
    [
        "distance",
        "dropoff_latitude",
        "dropoff_longitude",
        "passengers",
        "pickup_latitude",
        "pickup_longitude",
        "store_forward",
        "vendor",
        "pickup_weekday",
        "pickup_month",
        "pickup_monthday",
        "pickup_hour",
        "pickup_minute",
        "pickup_second",
        "dropoff_weekday",
        "dropoff_month",
        "dropoff_monthday",
        "dropoff_hour",
        "dropoff_minute",
        "dropoff_second",
    ]
]
print(testX.shape)
print(testX.columns)

# Load the model from input port
model = pickle.load(open((Path(args.model_input) / "model.sav"), "rb"))
# model = (Path(args.model_input) / 'model.txt').read_text()
# print('Model: ', model)

# Compare predictions to actuals (testy)
output_data = testX.copy()
output_data["actual_cost"] = testy
output_data["predicted_cost"] = model.predict(testX)

# Save the output data with feature columns, predicted cost, and actual cost in csv file
output_data.to_csv((Path(args.predictions) / "predictions.csv"))

# Print the results of scoring the predictions against actual values in the test data
# The coefficients
print("Coefficients: \n", model.coef_)

actuals = output_data["actual_cost"]
predictions = output_data["predicted_cost"]

# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(actuals, predictions))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(actuals, predictions))
print("Model: ", model)

# Print score report to a text file
(Path(args.score_report) / "score.txt").write_text(
    "Scored with the following model:\n{}".format(model)
)
with open((Path(args.score_report) / "score.txt"), "a") as f:
    f.write("\n Coefficients: \n %s \n" % str(model.coef_))
    f.write("Mean squared error: %.2f \n" % mean_squared_error(actuals, predictions))
    f.write("Coefficient of determination: %.2f \n" % r2_score(actuals, predictions))

# -------------------- Promotion ------------------- #
test_scores = {}
test_predictions = {}
test_score = r2_score(actuals, predictions)  # current model
for model_run in Model.list(ws):
    if model_run.name == args.model_name:
        model_path = Model.download(model_run, exist_ok=True)
        mdl = pickle.load(open((Path(model_path)), "rb"))
        test_predictions[model_run.id] = mdl.predict(testX)
        test_scores[model_run.id] = r2_score(actuals, test_predictions[model_run.id])

print(test_scores)
if test_scores:
    if test_score >= max(list(test_scores.values())):
        deploy_flag = 1
    else:
        deploy_flag = 0
else:
    deploy_flag = 1

with open((Path(args.deploy_flag) / "deploy_flag"), 'w') as f:
    f.write('%d' % int(deploy_flag))

run.log('deploy flag', bool(deploy_flag))
run.parent.log('deploy flag', bool(deploy_flag))

test_scores["current model"] = test_score
model_runs_metrics_plot = pd.DataFrame(test_scores, index=["r2 score"]).plot(kind='bar', figsize=(15, 10))
model_runs_metrics_plot.figure.savefig("model_runs_metrics_plot.png")
model_runs_metrics_plot.figure.savefig(Path(args.score_report) / "model_runs_metrics_plot.png")
run.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")
run.parent.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")
@@ -0,0 +1,19 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: prep_data
display_name: prep-data
version: 1
type: command
inputs:
  raw_data:
    type: uri_folder
outputs:
  transformed_data:
    type: uri_folder
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
  python prep.py
  --raw_data ${{inputs.raw_data}}
  --transformed_data ${{outputs.transformed_data}}
# </component>
@@ -0,0 +1,255 @@
import argparse
from pathlib import Path
from uuid import uuid4
from datetime import datetime
import os
import numpy as np
import pandas as pd

from azureml.core import Run, Model
run = Run.get_context()
ws = run.experiment.workspace

parser = argparse.ArgumentParser("prep")
parser.add_argument("--raw_data", type=str, help="Path to raw data")
parser.add_argument("--transformed_data", type=str, help="Path of prepped data")

args = parser.parse_args()

print("hello training world...")

lines = [
    f"Raw data path: {args.raw_data}",
    f"Data output path: {args.transformed_data}",
]

for line in lines:
    print(line)

# ------------ Reading Data ------------ #
# -------------------------------------- #

print("mounted_path files: ")
arr = os.listdir(args.raw_data)
print(arr)

df_list = []
for filename in arr:
    print("reading file: %s ..." % filename)
    with open(os.path.join(args.raw_data, filename), "r") as handle:
        # print (handle.read())
        # ('input_df_%s' % filename) = pd.read_csv((Path(args.training_data) / filename))
        input_df = pd.read_csv((Path(args.raw_data) / filename))
        df_list.append(input_df)


# Prep the green and yellow taxi data
green_data = df_list[0]
yellow_data = df_list[1]

# ------------ Cleanse Data ------------ #
# -------------------------------------- #

# Define useful columns needed

useful_columns = str(
    [
        "cost",
        "distance",
        "dropoff_datetime",
        "dropoff_latitude",
        "dropoff_longitude",
        "passengers",
        "pickup_datetime",
        "pickup_latitude",
        "pickup_longitude",
        "store_forward",
        "vendor",
    ]
).replace(",", ";")
print(useful_columns)

# Rename green taxi columns
green_columns = str(
    {
        "vendorID": "vendor",
        "lpepPickupDatetime": "pickup_datetime",
        "lpepDropoffDatetime": "dropoff_datetime",
        "storeAndFwdFlag": "store_forward",
        "pickupLongitude": "pickup_longitude",
        "pickupLatitude": "pickup_latitude",
        "dropoffLongitude": "dropoff_longitude",
        "dropoffLatitude": "dropoff_latitude",
        "passengerCount": "passengers",
        "fareAmount": "cost",
        "tripDistance": "distance",
    }
).replace(",", ";")

# Rename yellow taxi columns
yellow_columns = str(
    {
        "vendorID": "vendor",
        "tpepPickupDateTime": "pickup_datetime",
        "tpepDropoffDateTime": "dropoff_datetime",
        "storeAndFwdFlag": "store_forward",
        "startLon": "pickup_longitude",
        "startLat": "pickup_latitude",
        "endLon": "dropoff_longitude",
        "endLat": "dropoff_latitude",
        "passengerCount": "passengers",
        "fareAmount": "cost",
        "tripDistance": "distance",
    }
).replace(",", ";")

print("green_columns: " + green_columns)
print("yellow_columns: " + yellow_columns)

# Remove null data

def get_dict(dict_str):
    pairs = dict_str.strip("{}").split(";")
    new_dict = {}
    for pair in pairs:
        print(pair)
        key, value = pair.strip().split(":")
        new_dict[key.strip().strip("'")] = value.strip().strip("'")
    return new_dict


def cleanseData(data, columns, useful_columns):
    useful_columns = [
        s.strip().strip("'") for s in useful_columns.strip("[]").split(";")
    ]
    new_columns = get_dict(columns)

    new_df = (data.dropna(how="all").rename(columns=new_columns))[useful_columns]

    new_df.reset_index(inplace=True, drop=True)
    return new_df


green_data_clean = cleanseData(green_data, green_columns, useful_columns)
yellow_data_clean = cleanseData(yellow_data, yellow_columns, useful_columns)

# Append yellow data to green data
combined_df = green_data_clean.append(yellow_data_clean, ignore_index=True)
combined_df.reset_index(inplace=True, drop=True)

output_green = green_data_clean.to_csv((Path(args.transformed_data) / "green_prep_data.csv"))
output_yellow = yellow_data_clean.to_csv((Path(args.transformed_data) / "yellow_prep_data.csv"))
merged_data = combined_df.to_csv((Path(args.transformed_data) / "merged_data.csv"))

# ------------ Filter Data ------------ #
# ------------------------------------- #

# Filter out coordinates for locations that are outside the city border.
combined_df = combined_df.astype(
    {
        "pickup_longitude": "float64",
        "pickup_latitude": "float64",
        "dropoff_longitude": "float64",
        "dropoff_latitude": "float64",
    }
)

latlong_filtered_df = combined_df[
    (combined_df.pickup_longitude <= -73.72)
    & (combined_df.pickup_longitude >= -74.09)
    & (combined_df.pickup_latitude <= 40.88)
    & (combined_df.pickup_latitude >= 40.53)
    & (combined_df.dropoff_longitude <= -73.72)
    & (combined_df.dropoff_longitude >= -74.72)
    & (combined_df.dropoff_latitude <= 40.88)
    & (combined_df.dropoff_latitude >= 40.53)
]

latlong_filtered_df.reset_index(inplace=True, drop=True)

# These functions replace undefined values and rename to use meaningful names.
replaced_stfor_vals_df = latlong_filtered_df.replace(
    {"store_forward": "0"}, {"store_forward": "N"}
).fillna({"store_forward": "N"})

replaced_distance_vals_df = replaced_stfor_vals_df.replace(
    {"distance": ".00"}, {"distance": 0}
).fillna({"distance": 0})

normalized_df = replaced_distance_vals_df.astype({"distance": "float64"})

# Split the pickup and dropoff date further into the day of the week, day of the month, and month values.

temp = pd.DatetimeIndex(normalized_df["pickup_datetime"], dtype="datetime64[ns]")
normalized_df["pickup_date"] = temp.date
normalized_df["pickup_weekday"] = temp.dayofweek
normalized_df["pickup_month"] = temp.month
normalized_df["pickup_monthday"] = temp.day
normalized_df["pickup_time"] = temp.time
normalized_df["pickup_hour"] = temp.hour
normalized_df["pickup_minute"] = temp.minute
normalized_df["pickup_second"] = temp.second

temp = pd.DatetimeIndex(normalized_df["dropoff_datetime"], dtype="datetime64[ns]")
normalized_df["dropoff_date"] = temp.date
normalized_df["dropoff_weekday"] = temp.dayofweek
normalized_df["dropoff_month"] = temp.month
normalized_df["dropoff_monthday"] = temp.day
normalized_df["dropoff_time"] = temp.time
normalized_df["dropoff_hour"] = temp.hour
normalized_df["dropoff_minute"] = temp.minute
normalized_df["dropoff_second"] = temp.second

del normalized_df["pickup_datetime"]
del normalized_df["dropoff_datetime"]

normalized_df.reset_index(inplace=True, drop=True)

print(normalized_df.head)
print(normalized_df.dtypes)

# Drop the pickup_date, dropoff_date, pickup_time, dropoff_time columns because they're
# no longer needed (granular time features like hour,
# minute and second are more useful for model training).
del normalized_df["pickup_date"]
del normalized_df["dropoff_date"]
del normalized_df["pickup_time"]
del normalized_df["dropoff_time"]

# Change the store_forward column to binary values
normalized_df["store_forward"] = np.where((normalized_df.store_forward == "N"), 0, 1)

# Before you package the dataset, run two final filters on the dataset.
# To eliminate incorrectly captured data points,
# filter the dataset on records where both the cost and distance variable values are greater than zero.
final_df = normalized_df[(normalized_df.distance > 0) & (normalized_df.cost > 0)]
final_df.reset_index(inplace=True, drop=True)
print(final_df.head)

# Output data
transformed_data = final_df.to_csv((Path(args.transformed_data) / "transformed_data.csv"))

# Split data into train, val and test datasets

random_data = np.random.rand(len(final_df))

msk_train = random_data < 0.7
msk_val = (random_data >= 0.7) & (random_data < 0.85)
msk_test = random_data >= 0.85

train = final_df[msk_train]
val = final_df[msk_val]
test = final_df[msk_test]

run.log('train size', train.shape[0])
run.log('val size', val.shape[0])
run.log('test size', test.shape[0])

run.parent.log('train size', train.shape[0])
run.parent.log('val size', val.shape[0])
run.parent.log('test size', test.shape[0])

train_data = train.to_csv((Path(args.transformed_data) / "train.csv"))
val_data = val.to_csv((Path(args.transformed_data) / "val.csv"))
test_data = test.to_csv((Path(args.transformed_data) / "test.csv"))
@@ -0,0 +1,22 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: register_model
version: 1
display_name: register-model
type: command
inputs:
  model_name:
    type: string
    default: "taxi-model"
  model_path:
    type: uri_folder
  deploy_flag:
    type: uri_folder
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
code: ./src
command: >-
  python register.py
  --model_name ${{inputs.model_name}}
  --model_path ${{inputs.model_path}}
  --deploy_flag ${{inputs.deploy_flag}}
# </component>
@@ -0,0 +1,33 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import os
import argparse
from pathlib import Path
from azureml.core import Run, Experiment, Model

parser = argparse.ArgumentParser()
parser.add_argument('--model_name', type=str, help='Name under which model will be registered')
parser.add_argument('--model_path', type=str, help='Model directory')
parser.add_argument('--deploy_flag', type=str, help='A deploy flag whether to deploy or not')
args, _ = parser.parse_known_args()

print(f'Arguments: {args}')
model_name = args.model_name
model_path = args.model_path

with open((Path(args.deploy_flag) / "deploy_flag"), 'r') as f:
    deploy_flag = int(f.read())

# current run is the registration step
run = Run.get_context()
ws = run.experiment.workspace

if deploy_flag == 1:
    print("Registering ", args.model_name)
    registered_model = Model.register(model_path=args.model_path,
                                      model_name=args.model_name,
                                      workspace=ws)
    print("Registered ", registered_model.id)
else:
    print("Model will not be registered!")
@@ -0,0 +1,75 @@
import argparse
from pathlib import Path
from uuid import uuid4
from datetime import datetime
import os
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pickle
import mlflow
import mlflow.sklearn

parser = argparse.ArgumentParser("train")
parser.add_argument("--training_data", type=str, help="Path to training data")
parser.add_argument("--model_output", type=str, help="Path of output model")

args = parser.parse_args()

# Enable auto logging
mlflow.sklearn.autolog()

lines = [
    f"Training data path: {args.training_data}",
    f"Model output path: {args.model_output}",
]

for line in lines:
    print(line)

print("mounted_path files: ")
arr = os.listdir(args.training_data)
print(arr)

train_data = pd.read_csv((Path(args.training_data) / "train.csv"))
print(train_data.columns)

# Split the data into input(X) and output(y)
trainy = train_data["cost"]
# X = train_data.drop(['cost'], axis=1)
trainX = train_data[
    [
        "distance",
        "dropoff_latitude",
        "dropoff_longitude",
        "passengers",
        "pickup_latitude",
        "pickup_longitude",
        "store_forward",
        "vendor",
        "pickup_weekday",
        "pickup_month",
        "pickup_monthday",
        "pickup_hour",
        "pickup_minute",
        "pickup_second",
        "dropoff_weekday",
        "dropoff_month",
        "dropoff_monthday",
        "dropoff_hour",
        "dropoff_minute",
        "dropoff_second",
    ]
]

print(trainX.shape)
print(trainX.columns)

# Train a Linear Regression Model with the train set
model = LinearRegression().fit(trainX, trainy)
perf = model.score(trainX, trainy)
print(perf)


# Output the model and test data
pickle.dump(model, open((Path(args.model_output) / "model.sav"), "wb"))
@@ -0,0 +1,19 @@
# <component>
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_model
display_name: train-model
version: 1
type: command
inputs:
  training_data:
    type: uri_folder
outputs:
  model_output:
    type: uri_folder
code: ./src
environment: azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest
command: >-
  python train.py
  --training_data ${{inputs.training_data}}
  --model_output ${{outputs.model_output}}
# </component>

Diff not shown for two files because of their large size.
@@ -0,0 +1,7 @@
# <data>
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: greendata
version: 3
description: sample green taxi dataset
path: ./data
# </data>
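A sketch of registering this data asset with the v2 CLI, assuming the spec above is saved as 'data_green.yml' (a hypothetical filename; note that recent CLI versions expect the newer data asset schema rather than the older dataset schema referenced above):

```
az ml data create --file data_green.yml
```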
@@ -0,0 +1,66 @@
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

# <inputs_and_outputs>
inputs:
  pipeline_job_input: # using local data; will create an anonymous data asset
    type: uri_folder
    path: ./data/

outputs:
  pipeline_job_transformed_data:
    mode: rw_mount
  pipeline_job_trained_model:
    mode: rw_mount
  pipeline_job_predictions:
    mode: rw_mount
  pipeline_job_score_report:
    mode: rw_mount
  pipeline_job_deploy_flag:
    type: uri_folder
# </inputs_and_outputs>

# <jobs>
settings:
  default_datastore: azureml:workspaceblobstore
  default_compute: azureml:cpu-cluster
  continue_on_step_failure: false

jobs:
  prep-job:
    type: command
    component: file:./components/prep/prep.yml
    inputs:
      raw_data: ${{parent.inputs.pipeline_job_input}}
    outputs:
      transformed_data: ${{parent.outputs.pipeline_job_transformed_data}}

  train-job:
    type: command
    component: file:./components/train/train.yml
    inputs:
      training_data: ${{parent.jobs.prep-job.outputs.transformed_data}}
    outputs:
      model_output: ${{parent.outputs.pipeline_job_trained_model}}

  evaluate-job:
    type: command
    component: file:./components/evaluate/evaluate.yml
    inputs:
      model_name: "taxi-model"
      model_input: ${{parent.jobs.train-job.outputs.model_output}}
      test_data: ${{parent.jobs.prep-job.outputs.transformed_data}}
    outputs:
      predictions: ${{parent.outputs.pipeline_job_predictions}}
      score_report: ${{parent.outputs.pipeline_job_score_report}}
      deploy_flag: ${{parent.outputs.pipeline_job_deploy_flag}}

  register-job:
    type: command
    component: file:./components/register/register.yml
    inputs:
      model_name: "taxi-model"
      model_path: ${{parent.jobs.train-job.outputs.model_output}}
      deploy_flag: ${{parent.jobs.evaluate-job.outputs.deploy_flag}}

# </jobs>
@@ -0,0 +1,7 @@
# <data>
$schema: https://azuremlschemas.azureedge.net/latest/dataset.schema.json
name: yellowdata
version: 3
description: sample yellow taxi dataset
path: ./data
# </data>
@@ -0,0 +1,201 @@
import os
import sys
import json  # needed for json.dump below; missing from the original import list
import argparse
import joblib
import pandas as pd

import mlflow
import mlflow.sklearn

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

from fairlearn.metrics._group_metric_set import _create_group_metric_set
from azureml.contrib.fairness import upload_dashboard_dictionary, download_dashboard_by_upload_id

from interpret_community import TabularExplainer
from azureml.interpret import ExplanationClient

from azureml.core import Run, Model
run = Run.get_context()
ws = run.experiment.workspace

def parse_args():
    parser = argparse.ArgumentParser(description="UCI Credit example")
    parser.add_argument("--transformed_data_path", type=str, default='transformed_data/', help="Directory path to training data")
    parser.add_argument('--model_name', type=str, help='Name under which model is registered')
    parser.add_argument("--model_path", type=str, default='trained_model/', help="Model output directory")
    parser.add_argument("--explainer_path", type=str, default='trained_model/', help="Explainer output directory")
    parser.add_argument("--evaluation_path", type=str, default='evaluation_results/', help="Evaluation results output directory")
    parser.add_argument('--deploy_flag', type=str, help='A deploy flag whether to deploy or not')
    return parser.parse_args()

def main():
    # Parse command-line arguments
    args = parse_args()
    transformed_data_path = os.path.join(args.transformed_data_path, run.parent.id)
    model_path = os.path.join(args.model_path, run.parent.id)
    explainer_path = os.path.join(args.explainer_path, run.parent.id)
    evaluation_path = os.path.join(args.evaluation_path, run.parent.id)

    # Make sure evaluation output path exists
    if not os.path.exists(evaluation_path):
        os.makedirs(evaluation_path)

    # Make sure explainer output path exists
    if not os.path.exists(explainer_path):
        os.makedirs(explainer_path)

    # Enable auto logging
    mlflow.sklearn.autolog()

    # Read training & testing data
    print(os.path.join(transformed_data_path, 'train.csv'))
    train = pd.read_csv(os.path.join(transformed_data_path, 'train.csv'))
    train.drop("Sno", axis=1, inplace=True)
    y_train = train['Risk']
    X_train = train.drop('Risk', axis=1)

    test = pd.read_csv(os.path.join(transformed_data_path, 'test.csv'))
    test.drop("Sno", axis=1, inplace=True)
    y_test = test['Risk']
    X_test = test.drop('Risk', axis=1)

    run.log('TEST SIZE', test.shape[0])

    # Load model
    model = joblib.load(os.path.join(model_path, 'model.pkl'))

    # ---------------- Model Evaluation ---------------- #
    # Evaluate model using testing set

    # Capture Accuracy Score
    test_acc = model.score(X_test, y_test)

    # Capture ML Metrics
    test_metrics = {
        "Test Accuracy": metrics.accuracy_score(y_test, model.predict(X_test)),
        "Test Recall": metrics.recall_score(y_test, model.predict(X_test), pos_label="good"),
        "Test Precision": metrics.precision_score(y_test, model.predict(X_test), pos_label="good"),
        "Test F1 Score": metrics.f1_score(y_test, model.predict(X_test), pos_label="good")
    }

    # Capture Confusion Matrix
    test_cm = metrics.plot_confusion_matrix(model, X_test, y_test)

    # Save and test eval metrics
    print("Testing accuracy: %.3f" % test_acc)
    run.log('Testing accuracy', test_acc)
    run.parent.log('Testing accuracy', test_acc)
    with open(os.path.join(evaluation_path, "metrics.json"), 'w+') as f:
        json.dump(test_metrics, f)
    test_cm.figure_.savefig(os.path.join(evaluation_path, "confusion_matrix.jpg"))
    test_cm.figure_.savefig("confusion_matrix.jpg")
    run.log_image(name='Confusion Matrix Test Dataset', path="confusion_matrix.jpg")
    run.parent.log_image(name='Confusion Matrix Test Dataset', path="confusion_matrix.jpg")


    # -------------------- Promotion ------------------- #
    test_accuracies = {}
    test_predictions = {}
    labels_dict = {"good": int(1), "bad": int(0)}
    y_test_labels = [labels_dict[x] for x in y_test]

    for model_run in Model.list(ws):
        if model_run.name == args.model_name:
            mdl_path = Model.download(model_run, exist_ok=True)
            mdl = joblib.load(os.path.join(mdl_path, 'model.pkl'))
            test_accuracies[model_run.id] = mdl.score(X_test, y_test)
            test_predictions[model_run.id] = [labels_dict[x] for x in mdl.predict(X_test)]

    if test_accuracies:
        if test_acc >= max(list(test_accuracies.values())):
            deploy_flag = 1
        else:
            deploy_flag = 0
    else:
        deploy_flag = 1

    with open(args.deploy_flag, 'w') as f:
        f.write('%d' % int(deploy_flag))

    run.log('deploy flag', bool(deploy_flag))
    run.parent.log('deploy flag', bool(deploy_flag))

    test_accuracies["current model"] = test_acc
    model_runs_metrics_plot = pd.DataFrame(test_accuracies, index=["accuracy"]).plot(kind='bar', figsize=(15, 10))
    model_runs_metrics_plot.figure.savefig(os.path.join(evaluation_path, "model_runs_metrics_plot.png"))
    model_runs_metrics_plot.figure.savefig("model_runs_metrics_plot.png")
    run.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")
    run.parent.log_image(name='MODEL RUNS METRICS COMPARISON', path="model_runs_metrics_plot.png")

    # -------------------- FAIRNESS ------------------- #
    # Calculate Fairness Metrics over Sensitive Features
    # Create a dictionary of model(s) you want to assess for fairness

    sensitive_features = ["Sex"]
    sf = { col: X_test[[col]] for col in sensitive_features }
    test_predictions["current model"] = [labels_dict[x] for x in model.predict(X_test)]

    dash_dict_all = _create_group_metric_set(y_true=y_test_labels,
                                             predictions=test_predictions,
                                             sensitive_features=sf,
                                             prediction_type='binary_classification',
                                             )

    # Upload the dashboard to Azure Machine Learning
    dashboard_title = "Fairness insights Comparison of Models"
    # Set validate_model_ids parameter of upload_dashboard_dictionary to False if you have not registered your model(s)
    upload_id = upload_dashboard_dictionary(run,
                                            dash_dict_all,
                                            dashboard_name=dashboard_title,
                                            validate_model_ids=False)
    print("\nUploaded to id: {0}\n".format(upload_id))

    upload_id_pipeline = upload_dashboard_dictionary(run.parent,
                                                     dash_dict_all,
                                                     dashboard_name=dashboard_title,
                                                     validate_model_ids=False)
    print("\nUploaded to id: {0}\n".format(upload_id_pipeline))


    # -------------------- Explainability ------------------- #
    tabular_explainer = TabularExplainer(model.steps[-1][1],
                                         initialization_examples=X_train,
                                         features=X_train.columns,
                                         classes=[0, 1],
                                         transformations=model.steps[0][1])

    joblib.dump(tabular_explainer, os.path.join(explainer_path, "explainer"))

    # you can use the training data or the test data here, but test data would allow you to use Explanation Exploration
    global_explanation = tabular_explainer.explain_global(X_test)

    # if the PFIExplainer in the previous step, use the next line of code instead
|
||||
# global_explanation = explainer.explain_global(x_train, true_labels=y_train)
|
||||
|
||||
# sorted feature importance values and feature names
|
||||
sorted_global_importance_values = global_explanation.get_ranked_global_values()
|
||||
sorted_global_importance_names = global_explanation.get_ranked_global_names()
|
||||
|
||||
print("Explainability feature importance:")
|
||||
# alternatively, you can print out a dictionary that holds the top K feature names and values
|
||||
global_explanation.get_feature_importance_dict()
|
||||
|
||||
client = ExplanationClient.from_run(run)
|
||||
client.upload_model_explanation(global_explanation, comment='global explanation: all features')
|
||||
|
||||
# upload dashboard to parent run
|
||||
client_parent = ExplanationClient.from_run(run.parent)
|
||||
client_parent.upload_model_explanation(global_explanation, comment='global explanation: all features')
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
|
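The promotion step above writes the deploy flag as a bare 0/1 into the file passed via --deploy_flag, so a downstream stage only needs to read that file back. A minimal sketch of such a consumer, assuming the flag file path is wired through the pipeline in the same way (the environment variable and default path below are illustrative, not part of this repo):

# Sketch of a downstream consumer of the deploy flag file (illustrative path).
import os

def should_deploy(deploy_flag_path: str) -> bool:
    # evaluate.py writes a single '0' or '1' into this file
    with open(deploy_flag_path) as f:
        return bool(int(f.read().strip()))

if should_deploy(os.environ.get("DEPLOY_FLAG_PATH", "deploy_flag")):
    print("Model promoted: run the deployment stage")
else:
    print("Model not promoted: skip deployment")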
@ -0,0 +1,59 @@
import os
import glob
import json
import argparse
import numpy as np
import pandas as pd
import joblib

from azureml.core.model import Model

model = None
explainer = None


def init():
    global model, explainer
    print("Started batch scoring by running init()")

    parser = argparse.ArgumentParser('batch_scoring')
    parser.add_argument('--model_name', type=str, help='Model to use for batch scoring')
    args, _ = parser.parse_known_args()

    model_path = Model.get_model_path(args.model_name)
    print(f"Model path: {model_path}")
    model = joblib.load(model_path)

    # Load the explainer (currently disabled)
    explainer_path = os.path.join(Model.get_model_path(args.model_name), "explainer")
    # explainer = joblib.load(explainer_path)


def run(file_list):
    print(f"Files to process: {file_list}")
    results = pd.DataFrame(columns=["Sno", "ProbaGoodCredit", "ProbaBadCredit", "FeatureImportance"])

    for filename in file_list:
        df = pd.read_csv(filename)
        sno = df["Sno"]
        df = df.drop("Sno", axis=1)

        proba = model.predict_proba(df)
        proba = pd.DataFrame(data=proba, columns=["ProbaGoodCredit", "ProbaBadCredit"])

        # explanation = explainer.explain_local(df)
        # Sorted feature importance values and feature names
        # sorted_local_importance_names = explanation.get_ranked_local_names()
        # sorted_local_importance_values = explanation.get_ranked_local_values()
        # Collect the explanations in a dictionary
        # explanations = []
        # for i, j in zip(sorted_local_importance_names[0], sorted_local_importance_values[0]):
        #     explanations.append(dict(zip(i, j)))
        # explanation = pd.DataFrame(data=explanations, columns=["FeatureImportance"])

        # result = pd.concat([sno, proba, explanation], axis=1)
        result = pd.concat([sno, proba], axis=1)
        results = results.append(result)
        print(f"Batch scored: {filename}")
    return results
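One caveat worth checking before trusting the column names above: predict_proba orders its columns by model.classes_, which for string labels sorts alphabetically ('bad' before 'good'), so the ProbaGoodCredit/ProbaBadCredit mapping should be verified against the trained model. A minimal local smoke test for run(), with a stubbed model; 'score' is a hypothetical module name standing in for this script:

# Local smoke-test sketch; 'score' is a hypothetical module name for this file.
import pandas as pd

class StubModel:
    classes_ = ["bad", "good"]  # sklearn sorts string labels alphabetically
    def predict_proba(self, df):
        return [[0.3, 0.7]] * len(df)  # fixed, sklearn-shaped probabilities

pd.DataFrame({"Sno": [1, 2], "Age": [25, 40]}).to_csv("sample.csv", index=False)
import score
score.model = StubModel()
print(score.run(["sample.csv"]))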
@ -0,0 +1,106 @@
import os
import sys
import argparse
import joblib
import pandas as pd

import mlflow
import mlflow.sklearn

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

from azureml.core import Run

run = Run.get_context()
ws = run.experiment.workspace


def parse_args():
    parser = argparse.ArgumentParser(description="UCI Credit example")
    parser.add_argument("--transformed_data_path", type=str, default='transformed_data/', help="Directory path to training data")
    parser.add_argument("--model_path", type=str, default='trained_model/', help="Model output directory")
    return parser.parse_args()


def main():
    # Parse command-line arguments
    args = parse_args()

    transformed_data_path = os.path.join(args.transformed_data_path, run.parent.id)
    model_path = os.path.join(args.model_path, run.parent.id)

    # Make sure model output path exists
    if not os.path.exists(model_path):
        os.makedirs(model_path)

    # Enable auto logging
    mlflow.sklearn.autolog()

    # Read training data
    print(os.path.join(transformed_data_path, 'train.csv'))
    train = pd.read_csv(os.path.join(transformed_data_path, 'train.csv'))
    val = pd.read_csv(os.path.join(transformed_data_path, 'val.csv'))

    run.log('TRAIN SIZE', train.shape[0])
    run.log('VAL SIZE', val.shape[0])

    # Train model
    model = model_train(train, val)

    # Save the model; the pipeline step uploads this directory to Azure ML
    joblib.dump(value=model, filename=os.path.join(model_path, 'model.pkl'))


def model_train(train, val):
    train.drop("Sno", axis=1, inplace=True)
    val.drop("Sno", axis=1, inplace=True)

    y_train = train['Risk']
    X_train = train.drop('Risk', axis=1)

    y_val = val['Risk']
    X_val = val.drop('Risk', axis=1)

    categorical_features = X_train.select_dtypes(include=['object']).columns
    numeric_features = X_train.select_dtypes(include=['int64', 'float']).columns

    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value="missing")),
        ('onehotencoder', OneHotEncoder(categories='auto', sparse=False))])

    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())])

    feature_engineering_pipeline = ColumnTransformer(
        transformers=[
            ('numeric', numeric_transformer, numeric_features),
            ('categorical', categorical_transformer, categorical_features)
        ], remainder="drop")

    # Encode labels (note: the classifier below is fit on the raw string labels,
    # so encoded_y is currently unused)
    le = LabelEncoder()
    encoded_y = le.fit_transform(y_train)

    # Create sklearn pipeline
    lr_clf = Pipeline(steps=[('preprocessor', feature_engineering_pipeline),
                             ('classifier', LogisticRegression(solver="lbfgs"))])
    # Train the model
    lr_clf.fit(X_train, y_train)

    # Capture metrics
    train_acc = lr_clf.score(X_train, y_train)
    val_acc = lr_clf.score(X_val, y_val)
    print("Training accuracy: %.3f" % train_acc)
    print("Validation accuracy: %.3f" % val_acc)

    run.log('Training accuracy', train_acc)
    run.log('Validation accuracy', val_acc)

    return lr_clf


if __name__ == "__main__":
    main()
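Because the preprocessing and the classifier are pickled together as a single sklearn Pipeline, consumers only need one load-and-predict call. A minimal sketch, assuming a model.pkl produced by the script above; the path is illustrative, and the new frame must carry every feature column seen at fit time (the columns shown are placeholders, not the dataset schema):

# Sketch: load the persisted pipeline and score new rows (illustrative path/columns).
import joblib
import pandas as pd

pipeline = joblib.load("trained_model/model.pkl")
new_rows = pd.DataFrame([{"Age": 30, "Sex": "male", "Credit amount": 2500, "Duration": 24}])
print(pipeline.predict(new_rows))        # e.g. ['good']
print(pipeline.predict_proba(new_rows))  # column order follows pipeline.classes_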
@ -0,0 +1,66 @@
import os
import sys
import argparse
import joblib
import pandas as pd
import numpy as np

import mlflow
import mlflow.sklearn

from azureml.core import Run

run = Run.get_context()
ws = run.experiment.workspace


def parse_args():
    parser = argparse.ArgumentParser(description="UCI Credit example")
    parser.add_argument("--data_path", type=str, default='data/', help="Directory path to training data")
    parser.add_argument("--transformed_data_path", type=str, default='transformed_data/', help="Transformed data directory")
    return parser.parse_args()


def main():
    # Parse command-line arguments
    args = parse_args()
    transformed_data_path = os.path.join(args.transformed_data_path, run.parent.id)

    # Make sure data output path exists
    if not os.path.exists(transformed_data_path):
        os.makedirs(transformed_data_path)

    # Enable auto logging
    mlflow.sklearn.autolog()

    # Read training data
    df = pd.read_csv(os.path.join(args.data_path, 'credit.csv'))

    # Random 70/15/15 train/validation/test split
    random_data = np.random.rand(len(df))

    msk_train = random_data < 0.7
    msk_val = (random_data >= 0.7) & (random_data < 0.85)
    msk_test = random_data >= 0.85

    train = df[msk_train]
    val = df[msk_val]
    test = df[msk_test]

    run.log('TRAIN SIZE', train.shape[0])
    run.log('VAL SIZE', val.shape[0])
    run.log('TEST SIZE', test.shape[0])

    run.parent.log('TRAIN SIZE', train.shape[0])
    run.parent.log('VAL SIZE', val.shape[0])
    run.parent.log('TEST SIZE', test.shape[0])

    TRAIN_PATH = os.path.join(transformed_data_path, "train.csv")
    VAL_PATH = os.path.join(transformed_data_path, "val.csv")
    TEST_PATH = os.path.join(transformed_data_path, "test.csv")

    train.to_csv(TRAIN_PATH, index=False)
    val.to_csv(VAL_PATH, index=False)
    test.to_csv(TEST_PATH, index=False)


if __name__ == '__main__':
    main()
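Note that the random mask above is unseeded, so each pipeline run produces a different 70/15/15 partition, which makes metric comparisons between runs noisy. A sketch of a seeded variant (the seed value is arbitrary):

# Sketch: reproducible 70/15/15 split with a fixed, arbitrary seed.
import numpy as np
import pandas as pd

def split_frame(df: pd.DataFrame, seed: int = 42):
    r = np.random.default_rng(seed).random(len(df))
    return df[r < 0.7], df[(r >= 0.7) & (r < 0.85)], df[r >= 0.85]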
Binary file not shown.
@ -0,0 +1,4 @@
{"data": [
    [1,2,3,4,5,6,7,8,9,10],
    [10,9,8,7,6,5,4,3,2,1]
]}
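This sample payload carries two rows of ten features, matching what the deployed endpoint's scoring script expects. A minimal sketch of posting it to an online endpoint; the scoring URI and key are placeholders you would pull from your own workspace:

# Sketch: POST the sample request to an online endpoint (placeholder URI/key).
import json
import urllib.request

SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"

payload = {"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}
req = urllib.request.Request(
    SCORING_URI,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"})
print(urllib.request.urlopen(req).read().decode())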
Diff not shown because the file is too large.
Diff not shown because the file is too large.
Binary file not shown.
After: Width: | Height: | Size: 23 KiB
@ -0,0 +1,93 @@
targetScope = 'subscription'

param location string = 'westus2'
param env string = 'dev'
param prefix string
param postfix string
param resourceGroupName string = 'rg-wus-test'

var baseName = '${prefix}${postfix}'

resource resgrp 'Microsoft.Resources/resourceGroups@2020-06-01' = {
  name: resourceGroupName
  location: location
}

// storage account
module stoacct './modules/stoacct.bicep' = {
  name: 'stoacct'
  scope: resourceGroup(resgrp.name)
  params: {
    env: env
    baseName: baseName
    location: location
  }
}

// key vault
module kv './modules/kv.bicep' = {
  name: 'kv'
  scope: resourceGroup(resgrp.name)
  params: {
    env: env
    location: location
    baseName: baseName
  }
}

// app insights
module appinsight './modules/appinsight.bicep' = {
  name: 'appinsight'
  scope: resourceGroup(resgrp.name)
  params: {
    baseName: baseName
    env: env
    location: location
  }
}

// container registry
module cr './modules/cr.bicep' = {
  name: 'cr'
  scope: resourceGroup(resgrp.name)
  params: {
    baseName: baseName
    env: env
    location: location
  }
}

// AML workspace
module amls './modules/amls.bicep' = {
  name: 'amls'
  scope: resourceGroup(resgrp.name)
  params: {
    baseName: baseName
    env: env
    location: location
    stoacctid: stoacct.outputs.stoacctOut
    kvid: kv.outputs.kvOut
    appinsightid: appinsight.outputs.appinsightOut
    crid: cr.outputs.crOut
  }
}

// AML compute instance
module amlci './modules/amlcomputeinstance.bicep' = {
  name: 'amlci'
  scope: resourceGroup(resgrp.name)
  params: {
    baseName: baseName
    env: env
    location: location
    workspaceName: amls.outputs.amlsName
  }
}
@ -0,0 +1,516 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "metadata": {
    "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "11208633954998583577" }
  },
  "parameters": {
    "location": { "type": "string", "defaultValue": "westus2" },
    "env": { "type": "string", "defaultValue": "dev" },
    "prefix": { "type": "string" },
    "postfix": { "type": "string" },
    "resourceGroupName": { "type": "string", "defaultValue": "rg-wus-test" }
  },
  "variables": {
    "baseName": "[format('{0}{1}', parameters('prefix'), parameters('postfix'))]"
  },
  "resources": [
    {
      "type": "Microsoft.Resources/resourceGroups",
      "apiVersion": "2020-06-01",
      "name": "[parameters('resourceGroupName')]",
      "location": "[parameters('location')]"
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "stoacct",
      "resourceGroup": "[parameters('resourceGroupName')]",
      "properties": {
        "expressionEvaluationOptions": { "scope": "inner" },
        "mode": "Incremental",
        "parameters": {
          "env": { "value": "[parameters('env')]" },
          "baseName": { "value": "[variables('baseName')]" },
          "location": { "value": "[parameters('location')]" }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "13854706444404712543" }
          },
          "parameters": {
            "env": { "type": "string" },
            "baseName": { "type": "string" },
            "location": { "type": "string" }
          },
          "resources": [
            {
              "type": "Microsoft.Storage/storageAccounts",
              "apiVersion": "2019-04-01",
              "name": "[format('{0}{1}sa', parameters('env'), parameters('baseName'))]",
              "location": "[parameters('location')]",
              "sku": { "name": "Standard_LRS" },
              "kind": "StorageV2",
              "properties": {
                "encryption": {
                  "services": { "blob": { "enabled": true }, "file": { "enabled": true } },
                  "keySource": "Microsoft.Storage"
                },
                "supportsHttpsTrafficOnly": true
              }
            }
          ],
          "outputs": {
            "stoacctOut": {
              "type": "string",
              "value": "[resourceId('Microsoft.Storage/storageAccounts', format('{0}{1}sa', parameters('env'), parameters('baseName')))]"
            }
          }
        }
      },
      "dependsOn": [
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
      ]
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "kv",
      "resourceGroup": "[parameters('resourceGroupName')]",
      "properties": {
        "expressionEvaluationOptions": { "scope": "inner" },
        "mode": "Incremental",
        "parameters": {
          "env": { "value": "[parameters('env')]" },
          "location": { "value": "[parameters('location')]" },
          "baseName": { "value": "[variables('baseName')]" }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "3960831692549416869" }
          },
          "parameters": {
            "baseName": { "type": "string" },
            "env": { "type": "string" },
            "location": { "type": "string" }
          },
          "resources": [
            {
              "type": "Microsoft.KeyVault/vaults",
              "apiVersion": "2019-09-01",
              "name": "[format('{0}-{1}-kv', parameters('env'), parameters('baseName'))]",
              "location": "[parameters('location')]",
              "properties": {
                "tenantId": "[subscription().tenantId]",
                "sku": { "name": "standard", "family": "A" },
                "accessPolicies": []
              }
            }
          ],
          "outputs": {
            "kvOut": {
              "type": "string",
              "value": "[resourceId('Microsoft.KeyVault/vaults', format('{0}-{1}-kv', parameters('env'), parameters('baseName')))]"
            }
          }
        }
      },
      "dependsOn": [
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
      ]
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "appinsight",
      "resourceGroup": "[parameters('resourceGroupName')]",
      "properties": {
        "expressionEvaluationOptions": { "scope": "inner" },
        "mode": "Incremental",
        "parameters": {
          "baseName": { "value": "[variables('baseName')]" },
          "env": { "value": "[parameters('env')]" },
          "location": { "value": "[parameters('location')]" }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "2591061638125956638" }
          },
          "parameters": {
            "baseName": { "type": "string" },
            "env": { "type": "string" },
            "location": { "type": "string" }
          },
          "resources": [
            {
              "type": "Microsoft.Insights/components",
              "apiVersion": "2020-02-02-preview",
              "name": "[format('{0}{1}-appin', parameters('env'), parameters('baseName'))]",
              "location": "[parameters('location')]",
              "kind": "web",
              "properties": { "Application_Type": "web" }
            }
          ],
          "outputs": {
            "appinsightOut": {
              "type": "string",
              "value": "[resourceId('Microsoft.Insights/components', format('{0}{1}-appin', parameters('env'), parameters('baseName')))]"
            }
          }
        }
      },
      "dependsOn": [
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
      ]
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "cr",
      "resourceGroup": "[parameters('resourceGroupName')]",
      "properties": {
        "expressionEvaluationOptions": { "scope": "inner" },
        "mode": "Incremental",
        "parameters": {
          "baseName": { "value": "[variables('baseName')]" },
          "env": { "value": "[parameters('env')]" },
          "location": { "value": "[parameters('location')]" }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "12155558635582316098" }
          },
          "parameters": {
            "env": { "type": "string" },
            "baseName": { "type": "string" },
            "location": { "type": "string" }
          },
          "resources": [
            {
              "type": "Microsoft.ContainerRegistry/registries",
              "apiVersion": "2020-11-01-preview",
              "name": "[format('{0}{1}cr', parameters('env'), parameters('baseName'))]",
              "location": "[parameters('location')]",
              "sku": { "name": "Standard" },
              "properties": { "adminUserEnabled": true }
            }
          ],
          "outputs": {
            "crOut": {
              "type": "string",
              "value": "[resourceId('Microsoft.ContainerRegistry/registries', format('{0}{1}cr', parameters('env'), parameters('baseName')))]"
            }
          }
        }
      },
      "dependsOn": [
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
      ]
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "amls",
      "resourceGroup": "[parameters('resourceGroupName')]",
      "properties": {
        "expressionEvaluationOptions": { "scope": "inner" },
        "mode": "Incremental",
        "parameters": {
          "baseName": { "value": "[variables('baseName')]" },
          "env": { "value": "[parameters('env')]" },
          "location": { "value": "[parameters('location')]" },
          "stoacctid": { "value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'stoacct')).outputs.stoacctOut.value]" },
          "kvid": { "value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'kv')).outputs.kvOut.value]" },
          "appinsightid": { "value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'appinsight')).outputs.appinsightOut.value]" },
          "crid": { "value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'cr')).outputs.crOut.value]" }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "18023230433604735324" }
          },
          "parameters": {
            "location": { "type": "string" },
            "baseName": { "type": "string" },
            "env": { "type": "string" },
            "stoacctid": { "type": "string" },
            "kvid": { "type": "string" },
            "appinsightid": { "type": "string" },
            "crid": { "type": "string" }
          },
          "resources": [
            {
              "type": "Microsoft.MachineLearningServices/workspaces",
              "apiVersion": "2020-09-01-preview",
              "name": "[format('{0}{1}-ws', parameters('env'), parameters('baseName'))]",
              "location": "[parameters('location')]",
              "identity": { "type": "SystemAssigned" },
              "sku": { "tier": "basic", "name": "basic" },
              "properties": {
                "friendlyName": "[format('{0}{1}-ws', parameters('env'), parameters('baseName'))]",
                "storageAccount": "[parameters('stoacctid')]",
                "keyVault": "[parameters('kvid')]",
                "applicationInsights": "[parameters('appinsightid')]",
                "containerRegistry": "[parameters('crid')]",
                "encryption": {
                  "status": "Disabled",
                  "keyVaultProperties": { "keyIdentifier": "", "keyVaultArmId": "" }
                }
              }
            }
          ],
          "outputs": {
            "amlsName": {
              "type": "string",
              "value": "[format('{0}{1}-ws', parameters('env'), parameters('baseName'))]"
            }
          }
        }
      },
      "dependsOn": [
        "[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'appinsight')]",
        "[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'cr')]",
        "[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'kv')]",
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]",
        "[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'stoacct')]"
      ]
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-10-01",
      "name": "amlci",
      "resourceGroup": "[parameters('resourceGroupName')]",
      "properties": {
        "expressionEvaluationOptions": { "scope": "inner" },
        "mode": "Incremental",
        "parameters": {
          "baseName": { "value": "[variables('baseName')]" },
          "env": { "value": "[parameters('env')]" },
          "location": { "value": "[parameters('location')]" },
          "workspaceName": { "value": "[reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'amls')).outputs.amlsName.value]" }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "metadata": {
            "_generator": { "name": "bicep", "version": "0.5.6.12127", "templateHash": "2016431671585526523" }
          },
          "parameters": {
            "location": { "type": "string" },
            "baseName": { "type": "string" },
            "env": { "type": "string" },
            "computeInstanceName": {
              "type": "string",
              "defaultValue": "[format('{0}-{1}-ci', parameters('env'), parameters('baseName'))]"
            },
            "workspaceName": { "type": "string" }
          },
          "resources": [
            {
              "type": "Microsoft.MachineLearningServices/workspaces/computes",
              "apiVersion": "2020-09-01-preview",
              "name": "[format('{0}/{1}', parameters('workspaceName'), parameters('computeInstanceName'))]",
              "location": "[parameters('location')]",
              "properties": {
                "computeType": "AmlCompute",
                "properties": {
                  "vmSize": "Standard_DS3_v2",
                  "subnet": "[json('null')]",
                  "osType": "Linux",
                  "scaleSettings": { "maxNodeCount": 4, "minNodeCount": 0 }
                }
              }
            }
          ]
        }
      },
      "dependsOn": [
        "[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', subscription().subscriptionId, parameters('resourceGroupName')), 'Microsoft.Resources/deployments', 'amls')]",
        "[subscriptionResourceId('Microsoft.Resources/resourceGroups', parameters('resourceGroupName'))]"
      ]
    }
  ]
}
@ -0,0 +1,25 @@
param location string
param baseName string
param env string
param computeInstanceName string = '${env}-${baseName}-ci'
param workspaceName string

resource amlci 'Microsoft.MachineLearningServices/workspaces/computes@2020-09-01-preview' = {
  name: '${workspaceName}/${computeInstanceName}'
  location: location
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: 'Standard_DS3_v2'
      subnet: json('null')
      osType: 'Linux'
      scaleSettings: {
        maxNodeCount: 4
        minNodeCount: 0
      }
    }
  }
}
@ -0,0 +1,39 @@
param location string
param baseName string
param env string
param stoacctid string
param kvid string
param appinsightid string
param crid string

// azure machine learning service
resource amls 'Microsoft.MachineLearningServices/workspaces@2020-09-01-preview' = {
  name: '${env}${baseName}-ws'
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  sku: {
    tier: 'basic'
    name: 'basic'
  }
  properties: {
    friendlyName: '${env}${baseName}-ws'
    storageAccount: stoacctid
    keyVault: kvid
    applicationInsights: appinsightid
    containerRegistry: crid
    encryption: {
      status: 'Disabled'
      keyVaultProperties: {
        keyIdentifier: ''
        keyVaultArmId: ''
      }
    }
  }
}

output amlsName string = amls.name
@ -0,0 +1,16 @@
param baseName string
param env string
param location string

// app insights
resource appinsight 'Microsoft.Insights/components@2020-02-02-preview' = {
  name: '${env}${baseName}-appin'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
  }
}

output appinsightOut string = appinsight.id
@ -0,0 +1,17 @@
param env string
param baseName string
param location string

resource cr 'Microsoft.ContainerRegistry/registries@2020-11-01-preview' = {
  name: '${env}${baseName}cr'
  location: location
  sku: {
    name: 'Standard'
  }
  properties: {
    adminUserEnabled: true
  }
}

output crOut string = cr.id
@ -0,0 +1,20 @@
param baseName string
param env string
param location string

// key vault
resource kv 'Microsoft.KeyVault/vaults@2019-09-01' = {
  name: '${env}-${baseName}-kv'
  location: location
  properties: {
    tenantId: subscription().tenantId
    sku: {
      name: 'standard'
      family: 'A'
    }
    accessPolicies: []
  }
}

output kvOut string = kv.id
@ -0,0 +1,30 @@
param env string
param baseName string
param location string

// storage account
resource stoacct 'Microsoft.Storage/storageAccounts@2019-04-01' = {
  name: '${env}${baseName}sa'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    encryption: {
      services: {
        blob: {
          enabled: true
        }
        file: {
          enabled: true
        }
      }
      keySource: 'Microsoft.Storage'
    }
    supportsHttpsTrafficOnly: true
  }
}

output stoacctOut string = stoacct.id
@ -0,0 +1,34 @@
variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
  # 'main' branch: PRD environment
  - template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
  # 'develop' or feature branches: DEV environment
  - template: ../../../config-infra-dev.yml

trigger:
- none

pool:
  vmImage: $(ap_vm_image)

stages:
- stage: CheckOutBicepAndDeploy
  displayName: Deploy AML Workspace
  jobs:
  - job: DeployBicep
    displayName: Create Bicep Deployment
    steps:
    - checkout: self
    - task: AzureCLI@2
      displayName: Running Deployment
      inputs:
        azureSubscription: $(ado_service_connection_rg)
        scriptType: bash
        scriptLocation: inlineScript
        inlineScript: |
          az --version
          echo "deploying bicep..."
          az deployment sub create --name $(Build.DefinitionName) --location $(location) --template-file ./infrastructure/bicep/main.bicep --parameters location=$(location) resourceGroupName=$(resource_group) prefix=$(namespace) postfix=$(postfix)
@ -0,0 +1,94 @@
# Resource group

module "resource_group" {
  source = "./modules/resource-group"

  location = var.location

  prefix  = var.prefix
  postfix = var.postfix

  tags = local.tags
}

# Azure Machine Learning workspace

module "aml_workspace" {
  source = "./modules/aml-workspace"

  rg_name  = module.resource_group.name
  location = module.resource_group.location

  prefix  = var.prefix
  postfix = var.postfix

  storage_account_id      = module.storage_account_aml.id
  key_vault_id            = module.key_vault.id
  application_insights_id = module.application_insights.id
  container_registry_id   = module.container_registry.id

  enable_aml_computecluster = var.enable_aml_computecluster
  storage_account_name      = module.storage_account_aml.name

  tags = local.tags
}

# Storage account

module "storage_account_aml" {
  source = "./modules/storage-account"

  rg_name  = module.resource_group.name
  location = module.resource_group.location

  prefix  = var.prefix
  postfix = "${var.postfix}aml"

  hns_enabled                         = false
  firewall_bypass                     = ["AzureServices"]
  firewall_virtual_network_subnet_ids = []

  tags = local.tags
}

# Key vault

module "key_vault" {
  source = "./modules/key-vault"

  rg_name  = module.resource_group.name
  location = module.resource_group.location

  prefix  = var.prefix
  postfix = var.postfix

  tags = local.tags
}

# Application insights

module "application_insights" {
  source = "./modules/application-insights"

  rg_name  = module.resource_group.name
  location = module.resource_group.location

  prefix  = var.prefix
  postfix = var.postfix

  tags = local.tags
}

# Container registry

module "container_registry" {
  source = "./modules/container-registry"

  rg_name  = module.resource_group.name
  location = module.resource_group.location

  prefix  = var.prefix
  postfix = var.postfix

  tags = local.tags
}
@ -0,0 +1,9 @@
locals {
  tags = {
    Owner       = "mlops-tabular"
    Project     = "mlops-tabular"
    Environment = var.environment
    Toolkit     = "Terraform"
    Name        = var.prefix
  }
}
@ -0,0 +1,18 @@
terraform {
  backend "azurerm" {}
  required_providers {
    azurerm = {
      version = "= 2.99.0"
    }
  }
}

provider "azurerm" {
  features {}
}

data "azurerm_client_config" "current" {}

data "http" "ip" {
  url = "https://ifconfig.me"
}
@ -0,0 +1,104 @@
resource "azurerm_machine_learning_workspace" "adl_mlw" {
  name                    = "mlw-${var.prefix}-${var.postfix}"
  location                = var.location
  resource_group_name     = var.rg_name
  application_insights_id = var.application_insights_id
  key_vault_id            = var.key_vault_id
  storage_account_id      = var.storage_account_id
  container_registry_id   = var.container_registry_id

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Compute cluster

resource "azurerm_machine_learning_compute_cluster" "adl_aml_ws_compute_cluster" {
  name                          = "mlwcc${var.prefix}${var.postfix}"
  location                      = var.location
  vm_priority                   = "LowPriority"
  vm_size                       = "STANDARD_DS2_V2"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.adl_mlw.id
  count                         = var.enable_aml_computecluster ? 1 : 0

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 1
    scale_down_nodes_after_idle_duration = "PT120S" # 120 seconds
  }

  identity {
    type = "SystemAssigned"
  }
}

# Datastore

resource "azurerm_resource_group_template_deployment" "arm_aml_create_datastore" {
  name                = "arm_aml_create_datastore"
  resource_group_name = var.rg_name
  deployment_mode     = "Incremental"
  parameters_content = jsonencode({
    "WorkspaceName" = {
      value = azurerm_machine_learning_workspace.adl_mlw.name
    },
    "StorageAccountName" = {
      value = var.storage_account_name
    }
  })

  depends_on = [time_sleep.wait_30_seconds]

  template_content = <<TEMPLATE
{
  "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "WorkspaceName": {
      "type": "String"
    },
    "StorageAccountName": {
      "type": "String"
    }
  },
  "resources": [
    {
      "type": "Microsoft.MachineLearningServices/workspaces/datastores",
      "apiVersion": "2021-03-01-preview",
      "name": "[concat(parameters('WorkspaceName'), '/default')]",
      "dependsOn": [],
      "properties": {
        "contents": {
          "accountName": "[parameters('StorageAccountName')]",
          "containerName": "default",
          "contentsType": "AzureBlob",
          "credentials": {
            "credentialsType": "None"
          },
          "endpoint": "core.windows.net",
          "protocol": "https"
        },
        "description": "Default datastore for mlops-tabular",
        "isDefault": false,
        "properties": {
          "ServiceDataAccessAuthIdentity": "None"
        },
        "tags": {}
      }
    }
  ]
}
TEMPLATE
}

resource "time_sleep" "wait_30_seconds" {
  depends_on = [
    azurerm_machine_learning_workspace.adl_mlw
  ]

  create_duration = "30s"
}
@ -0,0 +1,55 @@
variable "rg_name" {
  type        = string
  description = "Resource group name"
}

variable "location" {
  type        = string
  description = "Location of the resource group"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "A mapping of tags which should be assigned to the deployed resource"
}

variable "prefix" {
  type        = string
  description = "Prefix for the module name"
}

variable "postfix" {
  type        = string
  description = "Postfix for the module name"
}

variable "storage_account_id" {
  type        = string
  description = "The ID of the Storage Account linked to AML workspace"
}

variable "key_vault_id" {
  type        = string
  description = "The ID of the Key Vault linked to AML workspace"
}

variable "application_insights_id" {
  type        = string
  description = "The ID of the Application Insights linked to AML workspace"
}

variable "container_registry_id" {
  type        = string
  description = "The ID of the Container Registry linked to AML workspace"
}

variable "enable_aml_computecluster" {
  description = "Variable to enable or disable AML compute cluster"
  default     = false
}

variable "storage_account_name" {
  type        = string
  description = "The name of the Storage Account linked to AML workspace"
}
@ -0,0 +1,8 @@
resource "azurerm_application_insights" "adl_appi" {
  name                = "appi-${var.prefix}-${var.postfix}"
  location            = var.location
  resource_group_name = var.rg_name
  application_type    = "web"

  tags = var.tags
}
@ -0,0 +1,3 @@
output "id" {
  value = azurerm_application_insights.adl_appi.id
}
@ -0,0 +1,25 @@
variable "rg_name" {
  type        = string
  description = "Resource group name"
}

variable "location" {
  type        = string
  description = "Location of the resource group"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "A mapping of tags which should be assigned to the deployed resource"
}

variable "prefix" {
  type        = string
  description = "Prefix for the module name"
}

variable "postfix" {
  type        = string
  description = "Postfix for the module name"
}
@ -0,0 +1,20 @@
locals {
  safe_prefix  = replace(var.prefix, "-", "")
  safe_postfix = replace(var.postfix, "-", "")
}

resource "azurerm_container_registry" "adl_cr" {
  name                = "cr${local.safe_prefix}${local.safe_postfix}"
  resource_group_name = var.rg_name
  location            = var.location
  sku                 = "Premium"
  admin_enabled       = false

  network_rule_set {
    default_action  = "Deny"
    ip_rule         = []
    virtual_network = []
  }

  tags = var.tags
}
@ -0,0 +1,3 @@
output "id" {
  value = azurerm_container_registry.adl_cr.id
}
@ -0,0 +1,25 @@
variable "rg_name" {
  type        = string
  description = "Resource group name"
}

variable "location" {
  type        = string
  description = "Location of the resource group"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "A mapping of tags which should be assigned to the deployed resource"
}

variable "prefix" {
  type        = string
  description = "Prefix for the module name"
}

variable "postfix" {
  type        = string
  description = "Postfix for the module name"
}
@ -0,0 +1,18 @@
data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "adl_kv" {
  name                = "kv-${var.prefix}-${var.postfix}"
  location            = var.location
  resource_group_name = var.rg_name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  network_acls {
    default_action             = "Deny"
    ip_rules                   = []
    virtual_network_subnet_ids = []
    bypass                     = "None"
  }

  tags = var.tags
}
@ -0,0 +1,7 @@
output "id" {
  value = azurerm_key_vault.adl_kv.id
}

output "name" {
  value = azurerm_key_vault.adl_kv.name
}
@ -0,0 +1,25 @@
variable "rg_name" {
  type        = string
  description = "Resource group name"
}

variable "location" {
  type        = string
  description = "Location of the resource group"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "A mapping of tags which should be assigned to the deployed resource"
}

variable "prefix" {
  type        = string
  description = "Prefix for the module name"
}

variable "postfix" {
  type        = string
  description = "Postfix for the module name"
}
@ -0,0 +1,5 @@
resource "azurerm_resource_group" "adl_rg" {
  name     = "rg-${var.prefix}-${var.postfix}"
  location = var.location
  tags     = var.tags
}
@ -0,0 +1,7 @@
output "name" {
  value = azurerm_resource_group.adl_rg.name
}

output "location" {
  value = azurerm_resource_group.adl_rg.location
}
@ -0,0 +1,21 @@
variable "location" {
  type        = string
  default     = "North Europe"
  description = "Location of the Resource Group"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "A mapping of tags which should be assigned to the Resource Group"
}

variable "prefix" {
  type        = string
  description = "Prefix for the module name"
}

variable "postfix" {
  type        = string
  description = "Postfix for the module name"
}
@ -0,0 +1,34 @@
data "azurerm_client_config" "current" {}

data "http" "ip" {
  url = "https://ifconfig.me"
}

locals {
  safe_prefix  = replace(var.prefix, "-", "")
  safe_postfix = replace(var.postfix, "-", "")
}

resource "azurerm_storage_account" "adl_st" {
  name                     = "st${local.safe_prefix}${local.safe_postfix}"
  resource_group_name      = var.rg_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = var.hns_enabled

  tags = var.tags
}

# Virtual network & firewall configuration

resource "azurerm_storage_account_network_rules" "firewall_rules" {
  resource_group_name  = var.rg_name
  storage_account_name = azurerm_storage_account.adl_st.name

  default_action             = "Allow"
  ip_rules                   = [] # [data.http.ip.body]
  virtual_network_subnet_ids = var.firewall_virtual_network_subnet_ids
  bypass                     = var.firewall_bypass
}
@ -0,0 +1,7 @@
output "id" {
  value = azurerm_storage_account.adl_st.id
}

output "name" {
  value = azurerm_storage_account.adl_st.name
}
@ -0,0 +1,39 @@
variable "rg_name" {
  type        = string
  description = "Resource group name"
}

variable "location" {
  type        = string
  description = "Location of the resource group"
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "A mapping of tags which should be assigned to the Resource Group"
}

variable "prefix" {
  type        = string
  description = "Prefix for the module name"
}

variable "postfix" {
  type        = string
  description = "Postfix for the module name"
}

variable "hns_enabled" {
  type        = bool
  description = "Hierarchical namespaces enabled/disabled"
  default     = true
}

variable "firewall_virtual_network_subnet_ids" {
  default = []
}

variable "firewall_bypass" {
  default = ["None"]
}
@ -0,0 +1,55 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
  # 'main' branch: PRD environment
  - template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
  # 'develop' or feature branches: DEV environment
  - template: ../../../config-infra-dev.yml

trigger:
- none

pool:
  vmImage: $(ap_vm_image)

resources:
  repositories:
  - repository: mlops-templates
    name: Azure/mlops-templates
    endpoint: mlops-v2-tabular
    type: github
    ref: main # branch name

stages:
- stage: CreateStorageAccountForTerraformState
  displayName: Create Storage for Terraform
  jobs:
  - job: CreateStorageForTerraform
    displayName: Create Storage for Terraform
    steps:
    - checkout: self
      path: s/
    - checkout: mlops-templates
      path: s/templates/
    - template: templates/infra/create-resource-group.yml@mlops-templates
    - template: templates/infra/create-storage-account.yml@mlops-templates
    - template: templates/infra/create-storage-container.yml@mlops-templates
- stage: DeployAzureMachineLearningRG
  displayName: Deploy AML Resource Group
  jobs:
  - job: DeployAMLWorkspace
    displayName: 'Deploy AML Workspace'
    steps:
    - checkout: self
      path: s/
    - checkout: mlops-templates
      path: s/templates/
    - template: templates/infra/install-terraform.yml@mlops-templates
    - template: templates/infra/run-terraform-init.yml@mlops-templates
    - template: templates/infra/run-terraform-validate.yml@mlops-templates
    - template: templates/infra/run-terraform-plan.yml@mlops-templates
    - template: templates/infra/run-terraform-apply.yml@mlops-templates
@ -0,0 +1,23 @@
variable "location" {
  type        = string
  description = "Location of the resource group and modules"
}

variable "prefix" {
  type        = string
  description = "Prefix for module names"
}

variable "environment" {
  type        = string
  description = "Environment information"
}

variable "postfix" {
  type        = string
  description = "Postfix for module names"
}

variable "enable_aml_computecluster" {
  description = "Variable to enable or disable AML compute cluster"
}
@ -0,0 +1,78 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

variables:
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
  # 'main' branch: PRD environment
  - template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
  # 'develop' or feature branches: DEV environment
  - template: ../../../config-infra-dev.yml
- name: version
  value: aml-cli-v2 # must be either 'python-sdk' or 'aml-cli-v2'
- name: endpoint_name
  value: batchendpoint1
- name: endpoint_type
  value: batch

trigger:
- none

pool:
  vmImage: ubuntu-20.04

resources:
  repositories:
  - repository: mlops-templates # Template repo
    name: Azure/mlops-templates # need to change the org name from "Azure" to your own org
    endpoint: mlops-v2-service-connection # needs to be hardcoded, as 'repositories' doesn't accept variables
    type: github

stages:
- stage: DeployTrainingPipeline
  displayName: Deploy Training Pipeline
  jobs:
  - job: DeployTrainingPipeline
    steps:
    - checkout: self
      path: s/
    - checkout: mlops-templates
      path: s/templates/
    - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
    - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
    - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
    - template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates
      parameters:
        pipeline_file: data-science-regression/pipeline.yml
        condition: eq(variables['build.sourceBranchName'], 'main') # Selective skipping based on branch; remove this line before release!

- stage: CreateBatchEndpoint
  displayName: Create/Update Batch Endpoint
  jobs:
  - job: DeployBatchEndpoint
    steps:
    - checkout: self
      path: s/
    - checkout: mlops-templates
      path: s/templates/
    - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
    - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
    - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
    - template: templates/${{ variables.version }}/create-compute.yml@mlops-templates
      parameters:
        cluster_name: batch-cluster # must match the cluster name in the deployment file below
        min_instances: 0
        max_instances: 5
    - template: templates/${{ variables.version }}/create-endpoint.yml@mlops-templates
    - template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
      parameters:
        deployment_name: mlflowdp
        deployment_file: data-science-regression/components/deploy/batch-endpoint/mlflow-deployment.yml
    # - template: templates/${{ variables.version }}/test-deployment.yml@mlops-templates
    #   parameters:
    #     deployment_name: blue
    #     sample_request: data-science-regression/components/deploy/blue/sample-request.json
@ -0,0 +1,101 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

variables:
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
  # 'main' branch: PRD environment
  - template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
  # 'develop' or feature branches: DEV environment
  - template: ../../../config-infra-dev.yml
- name: version
  value: aml-cli-v2 # must be either 'python-sdk' or 'aml-cli-v2'
- name: endpoint_name
  value: onlineendpoint1
- name: endpoint_type
  value: online

trigger:
- none

pool:
  vmImage: ubuntu-20.04

resources:
  repositories:
    - repository: mlops-templates # template repo
      name: Azure/mlops-templates # change the org from "Azure" to your own org if you forked the templates
      endpoint: mlops-v2-service-connection # must be hardcoded; repository resources don't accept variables
      type: github

stages:
- stage: DeployTrainingPipeline
  displayName: Deploy Training Pipeline
  jobs:
    - job: DeployTrainingPipeline
      steps:
        - checkout: self
          path: s/
        - checkout: mlops-templates
          path: s/templates/
        - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
        - template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates
          parameters:
            pipeline_file: data-science-regression/pipeline.yml
            condition: eq(variables['Build.SourceBranchName'], 'main') # selective skipping based on branch; remove this line before release!

- stage: CreateOnlineEndpoint
  displayName: Create/Update Online Endpoint
  jobs:
    - job: DeployOnlineEndpoint
      steps:
        - checkout: self
          path: s/
        - checkout: mlops-templates
          path: s/templates/
        - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
        - template: templates/${{ variables.version }}/create-endpoint.yml@mlops-templates
        - template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
          parameters:
            deployment_name: blue
            deployment_file: data-science-regression/components/deploy/online-endpoint/blue/blue-deployment.yml
        - template: templates/${{ variables.version }}/test-deployment.yml@mlops-templates
          parameters:
            deployment_name: blue
            sample_request: data-science-regression/components/deploy/online-endpoint/blue/sample-request.json
        - template: templates/${{ variables.version }}/allocate-traffic.yml@mlops-templates
          parameters:
            traffic_allocation: blue=100
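`test-deployment` smoke-tests the new deployment before `allocate-traffic` routes all live traffic to it. A sketch of the two steps, assuming the CLI flavor (both commands exist in `az ml` v2; the structure of the real templates may differ):

```yaml
# Illustrative smoke-test and traffic-allocation steps; not the actual templates.
steps:
  - task: AzureCLI@2
    displayName: Smoke-test blue, then route traffic
    inputs:
      azureSubscription: $(ado_service_connection_aml_ws)  # assumed name
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        # invoke the named deployment directly, bypassing the traffic split
        az ml online-endpoint invoke --name $(endpoint_name) --deployment-name blue \
          --request-file data-science-regression/components/deploy/online-endpoint/blue/sample-request.json
        # only after the smoke test succeeds, send all live traffic to blue
        az ml online-endpoint update --name $(endpoint_name) --traffic "blue=100"
```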

# Example: safe rollout; can also be used for A/B testing
- stage: SafeRollout
  displayName: Safe rollout of new deployment
  jobs:
    - job: SafeRolloutDeployment
      steps:
        - checkout: self
          path: s/
        - checkout: mlops-templates
          path: s/templates/
        - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
        - template: templates/${{ variables.version }}/create-deployment.yml@mlops-templates
          parameters:
            deployment_name: green
            deployment_file: data-science-regression/components/deploy/online-endpoint/green/green-deployment.yml
        - template: templates/${{ variables.version }}/test-deployment.yml@mlops-templates
          parameters:
            deployment_name: green
            sample_request: data-science-regression/components/deploy/online-endpoint/green/sample-request.json
        - template: templates/${{ variables.version }}/allocate-traffic.yml@mlops-templates
          parameters:
            traffic_allocation: blue=90 green=10
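The safe-rollout stage keeps `blue` serving 90% of traffic while `green` takes a 10% canary share. The pipeline stops there; completing the rollout once `green` is observed healthy would typically look like the following (a hypothetical follow-up, not part of this pipeline):

```yaml
# Hypothetical completion of the rollout: promote green, retire blue.
steps:
  - task: AzureCLI@2
    displayName: Promote green, retire blue
    inputs:
      azureSubscription: $(ado_service_connection_aml_ws)  # assumed name
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az ml online-endpoint update --name $(endpoint_name) --traffic "blue=0 green=100"
        az ml online-deployment delete --name blue --endpoint-name $(endpoint_name) --yes
```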

@ -0,0 +1,50 @@

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
  # 'main' branch: PRD environment
  - template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
  # 'develop' or feature branches: DEV environment
  - template: ../../../config-infra-dev.yml
- name: version
  value: python-sdk

trigger:
- none

pool:
  vmImage: $(ap_vm_image)

resources:
  repositories:
    - repository: mlops-templates # template repo
      name: Azure/mlops-templates
      endpoint: mlops-v2-tabular # must be hardcoded; repository resources don't accept variables
      type: github

stages:
- stage: DeployBatchScoringPipeline
  displayName: Deploy Batch Scoring Pipeline
  jobs:
    - job: DeployBatchScoringPipeline
      steps:
        - checkout: self
          path: s/
        - checkout: mlops-templates
          path: s/templates/
        - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
        - template: templates/${{ variables.version }}/create-environment.yml@mlops-templates
          parameters:
            environment_name: $(batch_env_name)
            environment_conda_yaml: $(batch_env_conda_yaml)
        - template: templates/${{ variables.version }}/register-dataset.yml@mlops-templates
          parameters:
            data_type: scoring
        - template: templates/${{ variables.version }}/deploy-batch-scoring-pipeline.yml@mlops-templates
        - template: templates/${{ variables.version }}/add-pipeline-to-endpoint.yml@mlops-templates
        - template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates
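In the `python-sdk` flavor the templates drive azureml-core (SDK v1) instead of the CLI. `create-environment.yml` presumably registers an AML environment built from the conda file that `$(batch_env_conda_yaml)` points to; a minimal sketch, assuming inline Python in a pipeline step and simplified authentication (CI would normally use service-principal auth rather than `config.json`):

```yaml
# Illustrative sketch of environment registration in the python-sdk flavor.
steps:
  - task: PythonScript@0
    displayName: Register AML environment from conda spec
    inputs:
      scriptSource: inline
      script: |
        from azureml.core import Environment, Workspace
        ws = Workspace.from_config()  # simplified; CI would authenticate explicitly
        env = Environment.from_conda_specification(
            name="$(batch_env_name)",
            file_path="$(batch_env_conda_yaml)")
        env.register(workspace=ws)  # makes the environment reusable by name
```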

@ -0,0 +1,53 @@

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

variables:
- template: ../../../config-aml.yml
- ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
  # 'main' branch: PRD environment
  - template: ../../../config-infra-prod.yml
- ${{ if ne(variables['Build.SourceBranchName'], 'main') }}:
  # 'develop' or feature branches: DEV environment
  - template: ../../../config-infra-dev.yml
- name: version
  value: python-sdk

trigger:
- none

pool:
  vmImage: $(ap_vm_image)

resources:
  repositories:
    - repository: mlops-templates # template repo
      name: Azure/mlops-templates # change the org from "Azure" to your own org if you forked the templates
      endpoint: mlops-v2-tabular # must be hardcoded; repository resources don't accept variables
      type: github

stages:
- stage: DeployTrainingPipeline
  displayName: Deploy Training Pipeline
  jobs:
    - job: DeployTrainingPipeline
      steps:
        - checkout: self
          path: s/
        - checkout: mlops-templates
          path: s/templates/
        - template: templates/${{ variables.version }}/install-az-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/install-aml-cli.yml@mlops-templates
        - template: templates/${{ variables.version }}/connect-to-workspace.yml@mlops-templates
        - template: templates/${{ variables.version }}/create-environment.yml@mlops-templates
          parameters:
            environment_name: $(training_env_name)
            environment_conda_yaml: $(training_env_conda_yaml)
        - template: templates/${{ variables.version }}/register-dataset.yml@mlops-templates
          parameters:
            data_type: training
        - template: templates/${{ variables.version }}/get-compute.yml@mlops-templates
          parameters:
            compute_type: training
        - template: templates/${{ variables.version }}/deploy-training-pipeline.yml@mlops-templates
        - template: templates/${{ variables.version }}/add-pipeline-to-endpoint.yml@mlops-templates
        - template: templates/${{ variables.version }}/run-pipeline.yml@mlops-templates
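Unlike the CLI-flavor batch pipeline above, this one calls `get-compute` rather than `create-compute`: it resolves an existing training cluster instead of provisioning one. A sketch of that lookup in SDK v1 terms (the cluster-name variable is an assumption):

```yaml
# Illustrative get-compute lookup; fails fast if the cluster does not exist.
steps:
  - task: PythonScript@0
    displayName: Resolve training compute
    inputs:
      scriptSource: inline
      script: |
        from azureml.core import Workspace
        from azureml.core.compute import ComputeTarget
        ws = Workspace.from_config()  # simplified; CI would authenticate explicitly
        # raises ComputeTargetException if no such cluster is attached to the workspace
        compute = ComputeTarget(workspace=ws, name="$(training_target)")  # assumed variable
        print(compute.name, compute.provisioning_state)
```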

@ -0,0 +1,20 @@

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

name: mnist-batch
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.7.5
  - pip
  - pip:
      - azureml-defaults==1.38.0
      - azureml-mlflow==1.38.0
      - azureml-sdk==1.38.0
      - azureml-interpret==1.38.0
      - scikit-learn==0.24.1
      - pandas==1.2.1
      - joblib==1.0.0
      - matplotlib==3.3.3
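All `azureml-*` packages are pinned to the same 1.38.0 release, which keeps the SDK components mutually compatible, and the remaining pins freeze the scoring stack. Recreating the environment locally is a cheap way to surface resolver conflicts before CI does, for example (the file path is illustrative):

```yaml
# Optional smoke test of the conda spec; path and env name are illustrative.
steps:
  - script: |
      conda env create --file mnist-batch.yml
      conda run -n mnist-batch python -c "import azureml.core, sklearn, pandas; print('env ok')"
    displayName: Validate batch conda environment
```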

@ -0,0 +1,20 @@

name: mnist-train
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.7.5
  - pip
  - pip:
      - azureml-mlflow==1.38.0
      - azureml-sdk==1.38.0
      - scikit-learn==0.24.1
      - pandas==1.2.1
      - joblib==1.0.0
      - matplotlib==3.3.3
      - fairlearn==0.7.0
      - azureml-contrib-fairness==1.38.0
      - interpret-community==0.24.1
      - interpret-core==0.2.7
      - azureml-interpret==1.38.0