diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
deleted file mode 100644
index 266c934..0000000
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ /dev/null
@@ -1,10 +0,0 @@
----
-name: Bug report
-about: Create a report to help us improve
-title: "[BUG]"
-labels: ''
-assignees: ''
-
----
-
-## Please file all bugs in the respective tutorials github pages
diff --git a/.github/ISSUE_TEMPLATE/scenario_request.md b/.github/ISSUE_TEMPLATE/scenario_request.md
index cdd1e95..29f0a98 100644
--- a/.github/ISSUE_TEMPLATE/scenario_request.md
+++ b/.github/ISSUE_TEMPLATE/scenario_request.md
@@ -1,6 +1,6 @@
 ---
 name: Scenario request
-about: Suggest an Machine Learning scenario not covered by the tutorials
+about: Suggest a Machine Learning scenario not covered by the tutorials
 title: "[SCENARIO]"
 labels: ''
 assignees: ''
diff --git a/.gitmodules b/.gitmodules
index dc99b8a..f956cd8 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -13,3 +13,15 @@
 [submodule "DeployMLModelPipelines"]
 	path = DeployMLModelPipelines
 	url = https://github.com/Microsoft/AMLBatchScoringPipeline
+[submodule "TrainDistributedDeepModel"]
+	path = TrainDistributedDeepModel
+	url = https://github.com/Azure/DistributedDeepLearning/
+[submodule "DeployRMLModelBatch"]
+	path = DeployRMLModelBatch
+	url = https://github.com/Azure/RBatchScoring
+[submodule "DeployRMLModelKubernetes"]
+	path = DeployRMLModelKubernetes
+	url = https://github.com/Azure/RealtimeRDeployment
+[submodule "DeploySparkMLModelDatabricks"]
+	path = DeploySparkMLModelDatabricks
+	url = https://github.com/Azure/BatchSparkScoringPredictiveMaintenance
diff --git a/DeployRMLModelBatch b/DeployRMLModelBatch
new file mode 160000
index 0000000..3cd9cbd
--- /dev/null
+++ b/DeployRMLModelBatch
@@ -0,0 +1 @@
+Subproject commit 3cd9cbdfc84b155fa4369f3b3843cb002240cdb3
diff --git a/DeployRMLModelKubernetes b/DeployRMLModelKubernetes
new file mode 160000
index 0000000..90078f3
--- /dev/null
+++ b/DeployRMLModelKubernetes
@@ -0,0 +1 @@
+Subproject commit 90078f3eab8007ee8788a1ef5e5a49617d9c16c6
diff --git a/DeploySparkMLModelDatabricks b/DeploySparkMLModelDatabricks
new file mode 160000
index 0000000..516f07f
--- /dev/null
+++ b/DeploySparkMLModelDatabricks
@@ -0,0 +1 @@
+Subproject commit 516f07f0148deca40548589d5dada726ac20c46b
diff --git a/README.md b/README.md
index 35546fb..46f4e12 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
 # AI Reference Architectures
 
-This repository contains the recommended ways to train and deploy models on Azure. It ranges from running massively parallel hyperparameter tuning using Hyperdrive to deploying deep learning models on Kubernetes. Each tutorial takes you step by step through the process to train or deploy your model. The tutorials are set up as Jupyter notebooks for the Python ones and RMarkdown for the R ones so you can simply download them and start running them. For further documentation on the reference architectures please look [here](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/).
+This repository contains the recommended ways to train and deploy machine learning models on Azure. It ranges from running massively parallel [hyperparameter tuning using Hyperdrive](https://github.com/Microsoft/MLHyperparameterTuning) to deploying deep learning models on [Kubernetes](https://github.com/Microsoft/AKSDeploymentTutorialAML). Each [tutorial](#tutorials) takes you step by step through the process to train or deploy your model. If you are unsure which service to use and when, see the [FAQ](#faq) below.
+For further documentation on the reference architectures, please look [here](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/).
 
 # Getting Started
 This repository is arranged as submodules and therefore you can either pull all the tutorials or simply the ones you want.
@@ -21,22 +22,56 @@ if you have git older than 2.13 run: git clone --recursive https://github.com/Microsoft/AIReferenceArchitectures.git ``` -# Tutorials +# Tutorials | Tutorial | Environment | Description | Status | |----------------------------------------------|-------------|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [Deploy Deep Learning Model on Kuberenetes](https://github.com/Microsoft/AKSDeploymentTutorialAML) | Python GPU | Deploy image classification model on Kubernetes for _real-time_ scoring | [![Build Status](https://dev.azure.com/customai/AKSDeploymentTutorialAML/_apis/build/status/Microsoft.AKSDeploymentTutorialAML?branchName=master)](https://dev.azure.com/customai/AKSDeploymentTutorialAML/_build/latest?definitionId=11&branchName=master) | -| [Deploy Classic ML Model on Kubernetes](https://github.com/Microsoft/MLAKSDeployAML) | Python CPU | Deploy LightGBM model on Kubernetes for _real-time_ scoring | ![](https://dev.azure.com/customai/MLAKSDeployAMLPipeline/_apis/build/status/Microsoft.MLAKSDeployAML?branchName=master) | -| [Hyperparameter Tuning of Classical ML Models](https://github.com/Microsoft/MLHyperparameterTuning) | Python CPU | Run Hyperparameter tuning on LightGBM using Hyperdrive | ![](https://dev.azure.com/customai/MLHyperparameterTuningPipeline/_apis/build/status/Microsoft.MLHyperparameterTuning?branchName=master) | -| [Deploy Deep Learning Model on Pipelines](https://github.com/Azure/Batch-Scoring-Deep-Learning-Models-With-AML) | Python GPU | Deploy style transfer model for _batch_ scoring using Azure ML Pipelines | [![Build 
Status](https://dev.azure.com/customai/BatchScoringDeepLearningModelsWithAMLPipeline/_apis/build/status/Azure.Batch-Scoring-Deep-Learning-Models-With-AML?branchName=master)](https://dev.azure.com/customai/BatchScoringDeepLearningModelsWithAMLPipeline/_build/latest?definitionId=9&branchName=master) | -| [Deploy Classic ML Model on Pipelines](https://github.com/Microsoft/AMLBatchScoringPipeline) | Python CPU | Deploy one-class SVM for _batch_ scoring anomaly detection using Azure ML Pipelines | ![](https://dev.azure.com/customai/AMLBatchScoringPipeline/_apis/build/status/Microsoft.AMLBatchScoringPipeline?branchName=master) | +| [Deploy Deep Learning Model on Kubernetes](https://github.com/Microsoft/AKSDeploymentTutorialAML) | Python GPU | Deploy image classification model on Kubernetes for _real-time_ scoring | [![Build Status](https://dev.azure.com/customai/AKSDeploymentTutorialAML/_apis/build/status/Microsoft.AKSDeploymentTutorialAML?branchName=master)](https://dev.azure.com/customai/AKSDeploymentTutorialAML/_build/latest?definitionId=11&branchName=master) | +| [Deploy Classic ML Model on Kubernetes](https://github.com/Microsoft/MLAKSDeployAML) | Python CPU | Train LightGBM model locally using Azure Machine Learning, deploy on Kubernetes for _real-time_ scoring | ![](https://dev.azure.com/customai/MLAKSDeployAMLPipeline/_apis/build/status/Microsoft.MLAKSDeployAML?branchName=master) | +| [Hyperparameter Tuning of Classical ML Models](https://github.com/Microsoft/MLHyperparameterTuning) | Python CPU | Train LightGBM model locally and run Hyperparameter tuning using Hyperdrive | ![](https://dev.azure.com/customai/MLHyperparameterTuningPipeline/_apis/build/status/Microsoft.MLHyperparameterTuning?branchName=master) | +| [Deploy Deep Learning Model on Pipelines](https://github.com/Azure/Batch-Scoring-Deep-Learning-Models-With-AML) | Python GPU | Deploy PyTorch style transfer model for _batch_ scoring using Azure ML Pipelines | [![Build 
Status](https://dev.azure.com/customai/BatchScoringDeepLearningModelsWithAMLPipeline/_apis/build/status/Azure.Batch-Scoring-Deep-Learning-Models-With-AML?branchName=master)](https://dev.azure.com/customai/BatchScoringDeepLearningModelsWithAMLPipeline/_build/latest?definitionId=9&branchName=master) | +| [Deploy Classic ML Model on Pipelines](https://github.com/Microsoft/AMLBatchScoringPipeline) | Python CPU | Deploy one-class SVM for _batch_ scoring anomaly detection using Azure ML Pipelines | ![](https://dev.azure.com/customai/AMLBatchScoringPipeline/_apis/build/status/Microsoft.AMLBatchScoringPipeline?branchName=master) | +| [Deploy R ML Model on Kubernetes](https://github.com/Azure/RealtimeRDeployment) | R CPU | Deploy ML model for _real-time_ scoring on Kubernetes | | +| [Deploy R ML Model on Batch](https://github.com/Azure/RBatchScoring) | R CPU | Deploy forecasting model for _batch_ scoring using Azure Batch and doAzureParallel | | +| [Deploy Spark ML Model on Databricks](https://github.com/Azure/BatchSparkScoringPredictiveMaintenance) | Spark CPU | Deploy a classification model for _batch_ scoring using Databricks | | +| [Train Distributed Deep Leaning Model](https://github.com/Azure/DistributedDeepLearning/) | Python GPU | Distributed training of ResNet50 model using Batch AI | | # Requirements -The tutorials have been mainly tested on Linux VMs in Azure. They haven't been tested on Windows yet. Each tutorial may have slightly different requirements such as GPU for some of the deep learning ones. For more details please consult the readme in each tutorial. +The tutorials have been mainly tested on Linux VMs in Azure. Each tutorial may have slightly different requirements such as GPU for some of the deep learning ones. For more details please consult the readme in each tutorial. -# Reporting Issues -Please report issues with each tutorial in the tutorials own github page. 
+## Reporting Issues
+Please report issues with each tutorial on that tutorial's own GitHub page.
+
+
+# FAQ
+
+What service should I use for deploying models in Python?
+

+
+

+
+When deploying ML models in Python, there are two core questions: does scoring need to happen in real time, and is the model a deep learning model? For deep learning models that require real-time scoring, we recommend Azure Kubernetes Service (AKS) with GPUs; for a tutorial, see [AKS w/GPU](https://github.com/Microsoft/AKSDeploymentTutorialAML). For deep learning models scored in batch, we recommend AzureML Pipelines with GPUs; for a tutorial, see [AzureML Pipelines w/GPU](https://github.com/Azure/Batch-Scoring-Deep-Learning-Models-With-AML). For classical (non-deep-learning) models, we recommend the same services without GPUs: see [AKS](https://github.com/Microsoft/MLAKSDeployAML) for real-time scoring and [AzureML Pipelines](https://github.com/Microsoft/AMLBatchScoringPipeline) for batch scoring.
+
+
+
+
+What service should I use to train a model in Python?
+

+
+

+
+There are many options for training ML models in Python on Azure. The most straightforward is to train your model on a [DSVM](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/), either in local mode directly on the VM or by attaching the VM to AzureML as a compute target. If you want AzureML to manage the compute for you and scale it up and down depending on whether jobs are waiting in the queue, use AzureML Compute.
+
+If you need to run multiple jobs, for hyperparameter tuning or other purposes, we recommend [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), [Azure automated ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml), or AzureML Compute, depending on your requirements.
+For a tutorial on how to use Hyperdrive, go [here](https://github.com/Microsoft/MLHyperparameterTuning).
+
+
+
+
+## Recommend a Scenario
+If there is a particular scenario you would like to see a tutorial for, please fill in a [scenario suggestion](https://github.com/Microsoft/AIReferenceArchitectures/issues/new?assignees=&labels=&template=scenario_request.md&title=%5BSCENARIO%5D).
+
+## Ongoing Work
+We are constantly developing new AI reference architectures using the Microsoft AI Platform. Ongoing projects include IoT Edge scenarios and model scoring on mobile devices. To follow their progress and find new reference architectures, see the AI section of the [Azure reference architectures](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/) page.
-
 # Contributing
 
 This project welcomes contributions and suggestions.  Most contributions require you to agree to a
diff --git a/TrainDistributedDeepModel b/TrainDistributedDeepModel
new file mode 160000
index 0000000..d037c56
--- /dev/null
+++ b/TrainDistributedDeepModel
@@ -0,0 +1 @@
+Subproject commit d037c568bbd4394fbf2f668937d32122ae5a1a37
diff --git a/docs/python_scoring.md b/docs/python_scoring.md
new file mode 100644
index 0000000..6c29cde
--- /dev/null
+++ b/docs/python_scoring.md
@@ -0,0 +1,7 @@
+# Azure services for deploying Python ML
+

+
+

+
+When deploying ML models in Python, there are two core questions: does scoring need to happen in real time, and is the model a deep learning model? For deep learning models that require real-time scoring, we recommend Azure Kubernetes Service (AKS) with GPUs; for a tutorial, see [AKS w/GPU](https://github.com/Microsoft/AKSDeploymentTutorialAML). For deep learning models scored in batch, we recommend AzureML Pipelines with GPUs; for a tutorial, see [AzureML Pipelines w/GPU](https://github.com/Azure/Batch-Scoring-Deep-Learning-Models-With-AML). For classical (non-deep-learning) models, we recommend the same services without GPUs: see [AKS](https://github.com/Microsoft/MLAKSDeployAML) for real-time scoring and [AzureML Pipelines](https://github.com/Microsoft/AMLBatchScoringPipeline) for batch scoring.
\ No newline at end of file
diff --git a/docs/python_training.md b/docs/python_training.md
new file mode 100644
index 0000000..cc934f7
--- /dev/null
+++ b/docs/python_training.md
@@ -0,0 +1,10 @@
+# Azure services for training Python ML models
+

+
+

+
+There are many options for training ML models in Python on Azure. The most straightforward is to train your model on a [DSVM](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/), either in local mode directly on the VM or by attaching the VM to AzureML as a compute target. If you want AzureML to manage the compute for you and scale it up and down depending on whether jobs are waiting in the queue, use AzureML Compute.
+
+If you need to run multiple jobs, for hyperparameter tuning or other purposes, we recommend [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), [Azure automated ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml), or AzureML Compute, depending on your requirements.
+For a tutorial on how to use Hyperdrive, go [here](https://github.com/Microsoft/MLHyperparameterTuning).
\ No newline at end of file
diff --git a/images/decision_python_scoring.png b/images/decision_python_scoring.png
new file mode 100644
index 0000000..91534d4
Binary files /dev/null and b/images/decision_python_scoring.png differ
diff --git a/images/python_training_diag.png b/images/python_training_diag.png
new file mode 100644
index 0000000..cbb6188
Binary files /dev/null and b/images/python_training_diag.png differ