* README updates (#358)

* Updating environment.yml file in master (#323)

* readme updates

* mv media to scenarios folder

* fixes

* Update README.md

* simplification of language, removing redundancy

* added target audience section

* Update SETUP.md

* Update README.md

* Update environment.yml

* Update SETUP.md

* env-update (#359)

* Hyperdrive notebook updates (#356)

All tests are passing (except for unrelated AML deployment notebooks)
This commit is contained in:
JS 2019-10-09 13:17:20 -04:00 committed by GitHub
Parent 90bf76e751
Commit dbad9772ba
No key matching this signature was found
GPG key ID: 4AEE18F83AFDEB23
12 changed files with 402 additions and 212 deletions

README.md

@ -1,20 +1,26 @@
# Computer Vision
In recent years, we have seen extraordinary growth in Computer Vision, with applications in face recognition, image understanding, search, drones, mapping, semi-autonomous and autonomous vehicles. Key to many of these applications are visual recognition tasks such as image classification, object detection and image similarity. Researchers have been applying newer deep learning methods to achieve state-of-the-art (SOTA) results on these challenging visual recognition tasks.
In recent years, we've seen extraordinary growth in Computer Vision, with applications in face recognition, image understanding, search, drones, mapping, semi-autonomous and autonomous vehicles. Key to many of these applications are visual recognition tasks such as image classification, object detection and image similarity.
This repository provides examples and best practice guidelines for building computer vision systems. The focus of the repository is on state-of-the-art methods that are popular among researchers and practitioners working on problems involving image recognition, object detection and image similarity.
These examples are provided as Jupyter notebooks and common utility functions. All examples use PyTorch as the deep learning library.
## Overview
The goal of this repository is to accelerate the development of computer vision applications. Rather than creating implementations from scratch, the focus is on providing examples and links to existing state-of-the-art libraries. In addition, having worked in this space for many years, we aim to answer common questions, point out frequently observed pitfalls, and show how to use the cloud for training and deployment.
This repository provides examples and best practice guidelines for building computer vision systems. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in Computer Vision algorithms and neural architectures, and that show how to operationalize such systems. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utility around loading image data, optimizing and evaluating models, and scaling up to the cloud. In addition, having worked in this space for many years, we aim to answer common questions, point out frequently observed pitfalls, and show how to use the cloud for training and deployment.
We hope that these examples and utilities can significantly reduce the “time to market” by simplifying the experience from defining the business problem to developing a solution. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools in a wide variety of languages.
These examples are provided as [Jupyter notebooks](scenarios) and common [utility functions](utils_cv). All examples use PyTorch as the underlying deep learning library.
## Target Audience
Our target audience for this repository includes data scientists and machine learning engineers with varying levels of Computer Vision knowledge, as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world vision problems.
## Get Started
To get started, navigate to the [Setup Guide](SETUP.md), which lists
instructions on how to set up the compute environment and dependencies needed to run the
notebooks in this repo. Once your environment is set up, navigate to the
[Scenarios](scenarios) folder and start exploring the notebooks.
## Scenarios
The following is a summary of commonly used Computer Vision scenarios that are covered in this repository. For each of these scenarios, we give you the tools to effectively build your own model. This ranges from tasks such as fine-tuning your own model on your own data to more complex tasks such as hard-negative mining and even model deployment. See all supported scenarios [here](scenarios).
The following is a summary of commonly used Computer Vision scenarios that are covered in this repository. For each of these scenarios, we give you the tools to effectively build your own model. This ranges from simple tasks, such as fine-tuning your own model on your own data, to more complex tasks such as hard-negative mining and even model deployment. See all supported scenarios [here](scenarios).
| Scenario | Description |
| -------- | ----------- |
@ -22,50 +28,7 @@ The following is a summary of commonly used Computer Vision scenarios that are c
| [Similarity](scenarios/similarity) | Image Similarity is a way to compute a similarity score given a pair of images. Given an image, it allows you to identify the most similar image in a given dataset. |
| [Detection](scenarios/detection) | Object Detection is a supervised machine learning technique that allows you to detect the bounding box of an object within an image. |
## Getting Started
To get started:
1. (Optional) Create an Azure Data Science Virtual Machine with e.g. a V100 GPU ([instructions](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/provision-deep-learning-dsvm), [price table](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/)).
1. Install Anaconda or [Miniconda](https://conda.io/miniconda.html) with Python >= 3.6. This step can be skipped if working on a Data Science Virtual Machine.
1. Clone the repository
```
git clone https://github.com/Microsoft/ComputerVision
```
1. Install the conda environment; you'll find the `environment.yml` file in the root directory. To build the conda environment:
> If you are using Windows, remove `- pycocotools>=2.0` from the `environment.yml`
```
conda env create -f environment.yml
```
1. Activate the conda environment and register it with Jupyter:
```
conda activate cv
python -m ipykernel install --user --name cv --display-name "Python (cv)"
```
If you would like to use [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/), install the `jupyter-webrtc` widget:
```
jupyter labextension install jupyter-webrtc
```
> If you are using Windows run at this point:
> - `pip install Cython`
> - `pip install git+https://github.com/philferriere/cocoapi.git#egg=pycocotools^&subdirectory=PythonAPI`
1. Start the Jupyter notebook server
```
jupyter notebook
```
1. At this point, you should be able to run the [notebooks](#scenarios) in this repo.
As an alternative to the steps above, if you only want to install
the `utils_cv` library (without creating a new conda environment),
you can do so by running
```bash
pip install git+https://github.com/microsoft/ComputerVision.git@master#egg=utils_cv
```
or by downloading the repo and then running `pip install .` in the
root directory.
## Introduction
## Computer Vision on Azure
Note that for certain computer vision problems, you may not need to build your own models. Instead, pre-built or easily customizable solutions exist which do not require any custom coding or machine learning expertise. We strongly recommend evaluating if these can sufficiently solve your problem. If these solutions are not applicable, or the accuracy of these solutions is not sufficient, then resorting to more complex and time-consuming custom approaches may be necessary.
@ -77,8 +40,6 @@ are a set of pre-trained REST APIs which can be called for image tagging, face r
- [Custom Vision](https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/)
is a SaaS service to train and deploy a model as a REST API given a user-provided training set. All steps including image upload, annotation, and model deployment can be performed using either the UI or a Python SDK. Training image classification or object detection models can be achieved with minimal machine learning expertise. Custom Vision offers more flexibility than using the pre-trained Cognitive Services APIs, but requires the user to bring and annotate their own data.
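For orientation, below is a minimal, hypothetical sketch of calling one of the pre-built Cognitive Services vision endpoints for image tagging using `requests`. The resource name, key, image URL, and API version are placeholders, and the exact request and response shape depends on the API version of your Azure resource; consult the service documentation before relying on it.
```python
# Hypothetical sketch: image tagging with a pre-built Computer Vision REST endpoint.
# The endpoint, subscription key, image URL, and API version below are placeholders.
import requests

endpoint = "https://<your-resource-name>.cognitiveservices.azure.com"  # placeholder
key = "<your-subscription-key>"                                        # placeholder

response = requests.post(
    f"{endpoint}/vision/v3.2/analyze",  # the API version may differ for your resource
    params={"visualFeatures": "Tags"},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/some_image.jpg"},  # placeholder image URL
)
response.raise_for_status()
for tag in response.json().get("tags", []):
    print(f"{tag['name']}: {tag['confidence']:.2f}")
```
If a few lines like these give sufficient accuracy for your problem, the custom training approaches in this repository may not be needed.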
## Build Your Own Computer Vision Model
If you need to train your own model, the following services and links provide additional information that is likely useful.
- [Azure Machine Learning service (AzureML)](https://azure.microsoft.com/en-us/services/machine-learning-service/)
@ -87,27 +48,6 @@ is a service that helps users accelerate the training and deploying of machine l
- [Azure AI Reference architectures](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/ai/training-python-models)
provide a set of examples (backed by code) of how to build common AI-oriented workloads that leverage multiple cloud components. While not computer vision specific, these reference architectures cover several machine learning workloads such as model deployment or batch scoring.
## Computer Vision Domains
Most applications in computer vision (CV) fall into one of these 4 categories:
- **Image classification**: Given an input image, predict what object is present in the image. This is typically the easiest CV problem to solve; however, classification requires objects to be reasonably large in the image.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_ic_vis.jpg" height="150" alt="Image classification visualization"/>
- **Object Detection**: Given an input image, identify and locate which objects are present (using rectangular coordinates). Object detection can find small objects in an image. Compared to image classification, both model training and manually annotating images are more time-consuming for object detection, since both the label and location are required.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_od_vis.jpg" height="150" alt="Object detect visualization"/>
- **Image Similarity**: Given an input image, find all similar objects in images from a reference dataset. Here, rather than predicting a label and/or rectangle, the task is to sort through a reference dataset to find objects similar to those found in the query image.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_is_vis.jpg" height="150" alt="Image similarity visualization"/>
- **Image Segmentation**: Given an input image, assign a label to every pixel (e.g., background, bottle, hand, sky, etc.). In practice, this problem is less common in industry, in large part due to the time required to label the ground-truth segmentations needed to train a solution.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_iseg_vis.jpg" height="150" alt="Image segmentation visualization"/>
## Build Status
### VM Testing
@ -133,11 +73,5 @@ Most applications in computer vision (CV) fall into one of these 4 categories:
## Contributing
This project welcomes contributions and suggestions. Please see our [contribution guidelines](CONTRIBUTING.md).
## Data/Telemetry
The Azure Machine Learning image classification notebooks ([20_azure_workspace_setup](classification/notebooks/20_azure_workspace_setup.ipynb), [21_deployment_on_azure_container_instances](classification/notebooks/21_deployment_on_azure_container_instances.ipynb), [22_deployment_on_azure_kubernetes_service](classification/notebooks/22_deployment_on_azure_kubernetes_service.ipynb), [23_aci_aks_web_service_testing](classification/notebooks/23_aci_aks_web_service_testing.ipynb), and [24_exploring_hyperparameters_on_azureml](classification/notebooks/24_exploring_hyperparameters_on_azureml.ipynb)) collect browser usage data and send it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement).
To opt out of tracking, please go to the raw `.ipynb` files and remove the following line of code (the URL will be slightly different depending on the file):
```sh
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/ComputerVision/classification/notebooks/21_deployment_on_azure_container_instances.png)"
```

SETUP.md (new file)

@ -0,0 +1,121 @@
# Setup Guide
This document describes how to setup all the dependencies to run the notebooks
in this repository.
Many computer vision scenarios are extremely computationally heavy. Training a
model often requires a machine that has a GPU, and would otherwise be too slow.
We recommend using the GPU-enabled [Azure Data Science Virtual Machine (DSVM)](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/) since it comes pre-installed with many of the prerequisites needed to do computer vision efficiently.
To scale up or to operationalize your models, we recommend setting up [Azure
ML](https://docs.microsoft.com/en-us/azure/machine-learning/). Our notebooks
provide instructions on how to use it.
## Table of Contents
1. [Compute Environment](#compute-environments)
1. [System Requirements](#system-requirements)
1. [Installation](#installation)
1. [Tunneling](#tunneling)
## Compute Environments
Most computer vision scenarios require a GPU, especially if you're training a
custom model. We recommend using a virtual machine to run the notebooks on.
Specifically, we'll want one with a powerful GPU. NVIDIA's Tesla V100 is a
good choice that can be found in most Azure regions.
The easiest way to get started is to use the [Azure Data Science Virtual Machine (DSVM)](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/). This VM will come installed with all the system requirements that are needed to run the notebooks in this repository. If you choose this option, you can skip the [System Requirements](#system-requirements) step in this guide as those requirements come pre-installed on the DSVM.
Here are some ways you can create the DSVM:
__Provision a Data Science VM with the Azure Portal or CLI__
You can spin up a Data Science VM directly using the Azure portal. To do so, follow
[this](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro)
link that shows you how to provision your Data Science VM through the portal.
You can alternatively use the Azure command line interface (CLI). Follow
[this](https://docs.microsoft.com/en-us/cli/azure/azure-cli-vm-tutorial?view=azure-cli-latest)
link to learn more about the Azure CLI and how it can be used to provision
resources.
__Virtual Machine Builder__
One easy way to create your DSVM is to use the [VM Builder](../contrib/vm_builder) tool located inside of the 'contrib' folder in the root directory of the repo. Simply run `python contrib/vm_builder/vm_builder.py` at the root level of the repo and this tool will preconfigure your virtual machine with the appropriate settings for working with this repository.
## System Requirements
__Requirements__
* A machine running Ubuntu Linux >= 16.04 LTS or Windows
* Miniconda or Anaconda with Python version >= 3.6.
* This is pre-installed on the Azure DSVM, so you can run the following steps directly. To set up on your local machine, [Miniconda](https://docs.conda.io/en/latest/miniconda.html) is a quick way to get started.
* It is recommended to update conda to the latest version: `conda update -n base -c defaults conda`
> NOTE: For Image Classification, Windows is up to 10x slower in training than Linux. You can set `num_workers=0` (as sketched below), but even then training will be up to 2x slower.
> NOTE: For Object Detection, Windows is about 20% slower in training but about same speed for inference.
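As a rough illustration of the `num_workers=0` workaround mentioned above, the sketch below uses a plain PyTorch `DataLoader`; the dataset path and transforms are placeholders, and the dataset wrappers in this repository may expose this setting differently.
```python
# Sketch (placeholder dataset path): force single-process data loading on Windows.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

tfms = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_ds = datasets.ImageFolder("path/to/images", transform=tfms)  # folder with one sub-folder per class
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True, num_workers=0)  # 0 = load in the main process

images, labels = next(iter(train_dl))
print(images.shape, labels.shape)
```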
__Dependencies__
Make sure you have CUDA Toolkit version 9.0 or above installed on your machine. You can run the command below in your terminal to check.
```
nvcc --version
```
If you don't have CUDA Toolkit or don't have the right version, please download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit)
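Once the conda environment described in the Installation section below is set up, you can optionally verify from Python that PyTorch sees the GPU:
```python
# Optional check: confirm that PyTorch can see the GPU once CUDA and the environment are installed.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```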
## Installation
To install the repo and its dependencies, perform the following steps:
1. Install Anaconda or [Miniconda](https://conda.io/miniconda.html) with Python >= 3.6. This step can be skipped if working on a Data Science Virtual Machine.
1. Clone the repository
```
git clone https://github.com/Microsoft/ComputerVision
```
1. Install the conda environment; you'll find the `environment.yml` file in the root directory. To build the conda environment:
```
conda env create -f environment.yml
```
1. Activate the conda environment and register it with Jupyter:
```
conda activate cv
python -m ipykernel install --user --name cv --display-name "Python (cv)"
```
1. Start the Jupyter notebook server
```
jupyter notebook
```
1. At this point, you should be able to run the [notebooks](#scenarios) in this repo.
__pip install__
As an alternative to the steps above, if you only want to install
the `utils_cv` library (without creating a new conda environment),
you can do so using pip:
```bash
pip install git+https://github.com/microsoft/ComputerVision.git@master#egg=utils_cv
```
> NOTE: if you install this repo using this method, you will not have the notebooks loaded by default.
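As a quick sanity check of a pip-based install, the sketch below downloads a small sample dataset using utilities that also appear in the notebooks (the fridge-objects URL constant is the one used in the detection scenario):
```python
# Quick check that utils_cv imports correctly and can fetch a sample dataset.
from utils_cv.common.data import unzip_url
from utils_cv.detection.data import Urls

data_path = unzip_url(Urls.fridge_objects_path, exist_ok=True)
print("Sample data extracted to:", data_path)
```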
## Tunneling
If your compute environment is on a VM in the cloud, you can open an SSH tunnel from your local machine to the VM using the following command:
```
$ ssh -L local_port:remote_address:remote_port <username>@<server-ip>
```
For example, if I want to run `jupyter notebook --port 8888` on my VM and I
wish to view the Jupyter notebooks in my local browser at `localhost:9999`, I
would ssh into my VM using the following command:
```
$ ssh -L 9999:localhost:8888 <username>@<server-ip>
```
This command will allow your local machine's port 9999 to access your remote
machine's port 8888.


@ -0,0 +1,45 @@
# Overview
| Scenario | Description |
| -------- | ----------- |
| [Classification](classification) | Image Classification is a supervised machine learning technique that allows you to learn and predict the category of a given image. |
| [Similarity](similarity) | Image Similarity is a way to compute a similarity score given a pair of images. Given an image, it allows you to identify the most similar image in a given dataset. |
| [Detection](detection) | Object Detection is a supervised machine learning technique that allows you to detect the bounding box of an object within an image. |
# Scenarios
While the field of Computer Vision is growing rapidly, the majority of vision applications fall into one of these 4 categories:
- **Image classification**: Given an input image, predict what object is present in the image. This is typically the easiest CV problem to solve; however, classification requires objects to be reasonably large in the image.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_ic_vis.jpg" height="150" alt="Image classification visualization"/>
- **Object Detection**: Given an input image, identify and locate which objects are present (using rectangular coordinates). Object detection can find small objects in an image. Compared to image classification, both model training and manually annotating images are more time-consuming for object detection, since both the label and location are required.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_od_vis.jpg" height="150" alt="Object detect visualization"/>
- **Image Similarity**: Given an input image, find all similar objects in images from a reference dataset. Here, rather than predicting a label and/or rectangle, the task is to sort through a reference dataset to find objects similar to those found in the query image.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_is_vis.jpg" height="150" alt="Image similarity visualization"/>
- **Image Segmentation**: Given an input image, assign a label to every pixel (e.g., background, bottle, hand, sky, etc.). In practice, this problem is less common in industry, in large part due to the time required to label the ground-truth segmentations needed to train a solution.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img align="center" src="./media/intro_iseg_vis.jpg" height="150" alt="Image segmentation visualization"/>
# Data/Telemetry
The following notebooks collect browser usage data and send it to Microsoft to help improve our products and services:
- [classification/20_azure_workspace_setup](classification/20_azure_workspace_setup.ipynb)
- [classification/21_deployment_on_azure_container_instances](classification/21_deployment_on_azure_container_instances.ipynb)
- [classification/22_deployment_on_azure_kubernetes_service](classification/22_deployment_on_azure_kubernetes_service.ipynb)
- [classification/23_aci_aks_web_service_testing](classification/23_aci_aks_web_service_testing.ipynb)
- [classification/24_exploring_hyperparameters_on_azureml](classification/24_exploring_hyperparameters_on_azureml.ipynb)
- [detection/11_exploring_hyperparameters_on_azureml](detection/11_exploring_hyperparameters_on_azureml.ipynb)
Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement).
To opt out of tracking, please go to the raw `.ipynb` files and remove the following line of code (the URL will be slightly different depending on the file):
```sh
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/ComputerVision/classification/notebooks/21_deployment_on_azure_container_instances.png)"
```


@ -33,34 +33,6 @@ We have also found that some browsers do not render Jupyter widgets correctly. I
| [23_aci_aks_web_service_testing.ipynb](23_aci_aks_web_service_testing.ipynb)| Tests the deployed models on either ACI or AKS. |
| [24_exploring_hyperparameters_on_azureml.ipynb](24_exploring_hyperparameters_on_azureml.ipynb)| Performs highly parallel parameter sweeping using AzureML's HyperDrive. |
## Using a Virtual Machine
You may want to use a virtual machine to run the notebooks. Doing so will give you a lot more flexibility -- whether it is using a GPU-enabled machine or simply working in Linux.
__Data Science Virtual Machine Builder__
One easy way to create your VM is to use the 'create_dsvm.py' tool located inside of the 'tools' folder in the root directory of the repo. Simply run `python tools/create_dsvm.py` at the root level of the repo. This tool preconfigures your virtual machine with the appropriate settings for working with this repository.
__Using the Azure Portal or CLI__
You can also spin up a VM directly using the Azure portal. For this repository,
you will want to create a Data Science Virtual Machine (DSVM). To do so, follow
[this](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro)
link that shows you how to provision your VM through the portal.
You can alternatively use the Azure command line interface (CLI). Follow
[this](https://docs.microsoft.com/en-us/cli/azure/azure-cli-vm-tutorial?view=azure-cli-latest)
link to learn more about the Azure CLI and how it can be used to provision
resources.
Once your virtual machine has been created, ssh and tunnel into the machine, then run the "Getting started" steps inside of it. The 'create_dsvm' tool will show you how to properly perform the tunneling too. If you created your virtual machine using the portal or the CLI, you can tunnel your jupyter notebook ports using the following command:
```
$ ssh -L local_port:remote_address:remote_port username@server.com
```
## Azure-enhanced notebooks
Azure products and services are used in certain notebooks to enhance the efficiency of developing classification systems at scale.


@ -41,7 +41,9 @@
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import os\n",
@ -49,6 +51,7 @@
"from distutils.dir_util import copy_tree\n",
"import numpy as np\n",
"import scrapbook as sb\n",
"import uuid\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
@ -62,6 +65,7 @@
"import azureml.widgets as widgets\n",
"\n",
"sys.path.append(\"../../\")\n",
"from utils_cv.common.azureml import get_or_create_workspace\n",
"from utils_cv.common.data import unzip_url\n",
"from utils_cv.detection.data import Urls"
]
@ -76,7 +80,9 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%reload_ext autoreload\n",
@ -93,8 +99,9 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 3,
"metadata": {
"collapsed": true,
"tags": [
"parameters"
]
@ -108,15 +115,18 @@
"workspace_region = \"YOUR_WORKSPACE_REGION\" #Possible values eastus, eastus2, etc.\n",
"\n",
"# Choose a size for our cluster and the maximum number of nodes\n",
"VM_SIZE = \"STANDARD_NC6\" #\"STANDARD_NC6\", STANDARD_NC6S_V3\"\n",
"MAX_NODES = 10\n",
"VM_SIZE = \"STANDARD_NC6\" #STANDARD_NC6S_V3\"\n",
"MAX_NODES = 8\n",
"\n",
"# Hyperparameter grid search space\n",
"IM_MAX_SIZES = [100,200] #Default is 1333 pixels, defining small values here to speed up training\n",
"LEARNING_RATES = np.linspace(1e-2, 1e-5, 4).tolist()\n",
"IM_MAX_SIZES = [600] #Default is 1333 pixels, defining small values here to speed up training\n",
"LEARNING_RATES = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]\n",
"\n",
"# Image data\n",
"DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)"
"DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)\n",
"\n",
"# Path to utils_cv library\n",
"UTILS_DIR = os.path.join('..', '..', 'utils_cv')"
]
},
{
@ -132,22 +142,23 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from utils_cv.common.azureml import get_or_create_workspace\n",
"\n",
"ws = get_or_create_workspace(\n",
" subscription_id,\n",
" resource_group,\n",
" workspace_name,\n",
" workspace_region)\n",
" subscription_id, resource_group, workspace_name, workspace_region\n",
")\n",
"\n",
"# Print the workspace attributes\n",
"print('Workspace name: ' + ws.name, \n",
" 'Workspace region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
"print(\n",
" \"Workspace name: \" + ws.name,\n",
" \"Workspace region: \" + ws.location,\n",
" \"Subscription id: \" + ws.subscription_id,\n",
" \"Resource group: \" + ws.resource_group,\n",
" sep=\"\\n\",\n",
")"
]
},
{
@ -169,8 +180,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Found existing compute target.\n",
"{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-08-30T16:15:49.268000+00:00', 'errors': None, 'creationTime': '2019-08-30T14:31:48.860219+00:00', 'modifiedTime': '2019-08-30T14:32:04.865042+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 10, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}\n"
"Creating a new compute target...\n",
"Creating\n",
"Succeeded\n",
"AmlCompute wait for completion finished\n",
"Minimum number of nodes requested have been provisioned\n",
"{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-09-30T18:20:25.067000+00:00', 'errors': None, 'creationTime': '2019-09-30T18:18:06.217384+00:00', 'modifiedTime': '2019-09-30T18:20:38.458332+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 8, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}\n"
]
}
],
@ -180,23 +195,31 @@
"try:\n",
" # Retrieve if a compute target with the same cluster name already exists\n",
" compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)\n",
" print('Found existing compute target.')\n",
" \n",
" print(\"Found existing compute target.\")\n",
"\n",
"except ComputeTargetException:\n",
" # If it doesn't already exist, we create a new one with the name provided\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=VM_SIZE,\n",
" min_nodes=0,\n",
" max_nodes=MAX_NODES)\n",
" print(\"Creating a new compute target...\")\n",
" compute_config = AmlCompute.provisioning_configuration(\n",
" vm_size=VM_SIZE, min_nodes=0, max_nodes=MAX_NODES\n",
" )\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# we can use get_status() to get a detailed status for the current cluster. \n",
"# we can use get_status() to get a detailed status for the current cluster.\n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The compute cluster and its status can be seen in the portal. For example in the screenshot below, its automatically resizing (eventually to 0 nodes) to adjust to the number of open runs:\n",
"<img src=\"media/hyperdrive_cluster.jpg\" width=\"800\" alt=\"Compute cluster status\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -209,6 +232,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
@ -216,12 +240,10 @@
"# Retrieving default datastore that got automatically created when we setup a workspace\n",
"ds = ws.get_default_datastore()\n",
"\n",
"# We now upload the data to the 'data' folder on the Azure portal\n",
"# We now upload the data to a unique sub-folder to avoid accidentially training/evaluating also including older images.\n",
"data_subfolder = str(uuid.uuid4())\n",
"ds.upload(\n",
" src_dir=DATA_PATH,\n",
" target_path='data',\n",
" overwrite=True, # overwrite data if it already exists on the Azure blob storage\n",
" show_progress=True\n",
" src_dir=DATA_PATH, target_path=data_subfolder, overwrite=False, show_progress=True\n",
")"
]
},
@ -240,26 +262,28 @@
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Create a folder for the training script and copy the utils_cv library into that folder\n",
"script_folder = os.path.join(os.getcwd(), \"hyperdrive\")\n",
"os.makedirs(script_folder, exist_ok=True)\n",
"_ = copy_tree(os.path.join('..', '..', 'utils_cv'), os.path.join(script_folder, 'utils_cv'))"
"_ = copy_tree(UTILS_DIR, os.path.join(script_folder, 'utils_cv'))"
]
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting C:\\Users\\pabuehle\\Desktop\\ComputerVision\\detection\\notebooks\\hyperdrive/train.py\n"
"Overwriting C:\\Users\\pabuehle\\Desktop\\ComputerVision\\scenarios\\detection\\hyperdrive/train.py\n"
]
}
],
@ -282,12 +306,12 @@
"from utils_cv.common.gpu import which_processor\n",
"which_processor()\n",
"\n",
"\n",
"# Parse arguments passed by Hyperdrive\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--data-folder', type=str, dest='data_dir')\n",
"parser.add_argument('--epochs', type=int, dest='epochs', default=10)\n",
"parser.add_argument('--batch_size', type=int, dest='batch_size', default=1)\n",
"parser.add_argument('--data-subfolder', type=str, dest='data_subfolder')\n",
"parser.add_argument('--epochs', type=int, dest='epochs', default=20) \n",
"parser.add_argument('--batch_size', type=int, dest='batch_size', default=2)\n",
"parser.add_argument('--learning_rate', type=float, dest='learning_rate', default=1e-4)\n",
"parser.add_argument('--min_size', type=int, dest='min_size', default=800)\n",
"parser.add_argument('--max_size', type=int, dest='max_size', default=1333)\n",
@ -303,9 +327,10 @@
"params = vars(args)\n",
"print(f\"params = {params}\")\n",
"\n",
"# Getting training and validation data\n",
"path = os.path.join(params['data_dir'], \"data\")\n",
"data = DetectionDataset(path, train_pct=0.5, batch_size = params[\"batch_size\"])\n",
"# Get training and validation data\n",
"data_path = os.path.join(params['data_dir'], params[\"data_subfolder\"])\n",
"print(f\"data_path={data_path}\")\n",
"data = DetectionDataset(data_path, train_pct=0.5, batch_size = params[\"batch_size\"])\n",
"print(\n",
" f\"Training dataset: {len(data.train_ds)} | Training DataLoader: {data.train_dl} \\n \\\n",
" Testing dataset: {len(data.test_ds)} | Testing DataLoader: {data.test_dl}\"\n",
@ -313,7 +338,7 @@
"\n",
"# Get model\n",
"model = get_pretrained_fasterrcnn(\n",
" num_classes = len(data.labels),\n",
" num_classes = len(data.labels)+1,\n",
" min_size = params[\"min_size\"],\n",
" max_size = params[\"max_size\"],\n",
" rpn_pre_nms_top_n_train = params[\"rpn_pre_nms_top_n_train\"],\n",
@ -331,9 +356,12 @@
"detector.fit(params[\"epochs\"], lr=params[\"learning_rate\"], print_freq=30)\n",
"print(f\"Average precision after each epoch: {detector.ap}\")\n",
"\n",
"# Get accuracy on test set at IOU=0.5:0.95\n",
"acc = float(detector.ap[-1])\n",
"\n",
"# Add log entries\n",
"run = Run.get_context()\n",
"run.log(\"accuracy\", float(detector.ap[-1])) # Logging our primary metric 'accuracy'\n",
"run.log(\"accuracy\", float(acc)) # Logging our primary metric 'accuracy'\n",
"run.log(\"data_dir\", params[\"data_dir\"])\n",
"run.log(\"epochs\", params[\"epochs\"])\n",
"run.log(\"batch_size\", params[\"batch_size\"])\n",
@ -362,11 +390,13 @@
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name='hyperparameter-tuning')"
"exp = Experiment(workspace=ws, name=\"hyperparameter-tuning\")"
]
},
{
@ -382,15 +412,15 @@
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Grid-search\n",
"param_sampling = GridParameterSampling( {\n",
" '--learning_rate': choice(LEARNING_RATES),\n",
" '--max_size': choice(IM_MAX_SIZES)\n",
" }\n",
"param_sampling = GridParameterSampling(\n",
" {\"--learning_rate\": choice(LEARNING_RATES), \"--max_size\": choice(IM_MAX_SIZES)}\n",
")"
]
},
@ -404,21 +434,28 @@
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"script_params = {\n",
" '--data-folder': ds.as_mount()\n",
"}\n",
"script_params = {\"--data-folder\": ds.as_mount(), \"--data-subfolder\": data_subfolder}\n",
"\n",
"est = Estimator(source_directory=script_folder,\n",
" script_params=script_params,\n",
" compute_target=compute_target,\n",
" entry_script='train.py',\n",
" use_gpu=True,\n",
" pip_packages=['nvidia-ml-py3','fastai'],\n",
" conda_packages=['scikit-learn', 'pycocotools>=2.0','torchvision==0.3','cudatoolkit==9.0'])"
"est = Estimator(\n",
" source_directory=script_folder,\n",
" script_params=script_params,\n",
" compute_target=compute_target,\n",
" entry_script=\"train.py\",\n",
" use_gpu=True,\n",
" pip_packages=[\"nvidia-ml-py3\", \"fastai\"],\n",
" conda_packages=[\n",
" \"scikit-learn\",\n",
" \"pycocotools>=2.0\",\n",
" \"torchvision==0.3\",\n",
" \"cudatoolkit==9.0\",\n",
" ],\n",
")"
]
},
{
@ -430,18 +467,20 @@
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"hyperdrive_run_config = HyperDriveConfig(\n",
" estimator=est,\n",
" hyperparameter_sampling=param_sampling,\n",
" policy=None, # Do not use any early termination \n",
" primary_metric_name='accuracy',\n",
" policy=None, # Do not use any early termination\n",
" primary_metric_name=\"accuracy\",\n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
" max_total_runs=None, # Set to none to run all possible grid parameter combinations,\n",
" max_concurrent_runs=MAX_NODES\n",
" max_total_runs=None, # Set to none to run all possible grid parameter combinations,\n",
" max_concurrent_runs=MAX_NODES,\n",
")"
]
},
@ -460,14 +499,14 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Url to hyperdrive run on the Azure portal: https://mlworkspace.azure.ai/portal/subscriptions/2ad17db4-e26d-4c9e-999e-adae9182530c/resourceGroups/pabuehle_delme2_hyperdrive/providers/Microsoft.MachineLearningServices/workspaces/pabuehle_ws/experiments/hyperparameter-tuning/runs/hyperparameter-tuning_1567193416225\n"
"Url to hyperdrive run on the Azure portal: https://mlworkspace.azure.ai/portal/subscriptions/989b90f7-da4f-41f9-84c9-44848802052d/resourceGroups/pabuehle_delme2_hyperdrive/providers/Microsoft.MachineLearningServices/workspaces/pabuehle_ws/experiments/hyperparameter-tuning/runs/hyperparameter-tuning_1569867670036119\n"
]
}
],
@ -478,13 +517,13 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c80070535f744b8aab68560b31aa38fe",
"model_id": "0f08d8354768463788969180b5d031ab",
"version_major": 2,
"version_minor": 0
},
@ -502,27 +541,27 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'runId': 'hyperparameter-tuning_1567190769563',\n",
"{'runId': 'hyperparameter-tuning_1569867670036119',\n",
" 'target': 'gpu-cluster',\n",
" 'status': 'Canceled',\n",
" 'startTimeUtc': '2019-08-30T18:46:09.79512Z',\n",
" 'endTimeUtc': '2019-08-30T19:21:47.165873Z',\n",
" 'status': 'Completed',\n",
" 'startTimeUtc': '2019-09-30T18:21:10.209419Z',\n",
" 'endTimeUtc': '2019-09-30T18:55:14.128089Z',\n",
" 'properties': {'primary_metric_config': '{\"name\": \"accuracy\", \"goal\": \"maximize\"}',\n",
" 'runTemplate': 'HyperDrive',\n",
" 'azureml.runsource': 'hyperdrive',\n",
" 'platform': 'AML',\n",
" 'baggage': 'eyJvaWQiOiAiNWFlYTJmMzAtZjQxZC00ZDA0LWJiOGUtOWU0NGUyZWQzZGQ2IiwgInRpZCI6ICI3MmY5ODhiZi04NmYxLTQxYWYtOTFhYi0yZDdjZDAxMWRiNDciLCAidW5hbWUiOiAiMDRiMDc3OTUtOGRkYi00NjFhLWJiZWUtMDJmOWUxYmY3YjQ2In0',\n",
" 'ContentSnapshotId': '348bdd53-a99f-4ddd-8ab3-a727cd12bdba'},\n",
" 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://pabuehlestorage779f8bc80.blob.core.windows.net/azureml/ExperimentRun/dcid.hyperparameter-tuning_1567190769563/azureml-logs/hyperdrive.txt?sv=2018-11-09&sr=b&sig=xLa2nd2%2BFQxDmg7tQGBScePCocDYJEayFyf9MIIPO8Y%3D&st=2019-08-30T19%3A11%3A48Z&se=2019-08-31T03%3A21%3A48Z&sp=r'}}"
" 'ContentSnapshotId': '0218d18a-3557-4fdf-8c29-8d43297621ed'},\n",
" 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://pabuehlestorage579709b90.blob.core.windows.net/azureml/ExperimentRun/dcid.hyperparameter-tuning_1569867670036119/azureml-logs/hyperdrive.txt?sv=2018-11-09&sr=b&sig=PCMArksPFcTc1rk1DMhFP6wvoZbhrpmnZbDCV8uInWw%3D&st=2019-09-30T18%3A45%3A14Z&se=2019-10-01T02%3A55%3A14Z&sp=r'}}"
]
},
"execution_count": 33,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@ -541,7 +580,7 @@
"```\n",
"We also can cancel the Run with \n",
"```python \n",
"hyperdrive_run_config.cancel().\n",
"hyperdrive_run.cancel().\n",
"```\n",
"\n",
"Once all the child-runs are finished, we can get the best run and the metrics."
@ -549,22 +588,22 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Best Run Id:hyperparameter-tuning_1567193416225_4\n",
"* Best Run Id:hyperparameter-tuning_1569867670036119_4\n",
"Run(Experiment: hyperparameter-tuning,\n",
"Id: hyperparameter-tuning_1567193416225_4,\n",
"Id: hyperparameter-tuning_1569867670036119_4,\n",
"Type: azureml.scriptrun,\n",
"Status: Completed)\n",
"\n",
"* Best hyperparameters:\n",
"{'--data-folder': '$AZUREML_DATAREFERENCE_workspaceblobstore', '--learning_rate': '0.01', '--max_size': '200'}\n",
"Accuracy = 0.8988979153074632\n",
"{'--data-folder': '$AZUREML_DATAREFERENCE_workspaceblobstore', '--data-subfolder': '01679d79-1c47-49b8-88c3-d657f36b0c0f', '--learning_rate': '0.01', '--max_size': '600'}\n",
"Accuracy = 0.8918015856432082\n",
"Learning Rate = 0.01\n"
]
}
@ -573,7 +612,7 @@
"# Get best run and print out metrics\n",
"best_run = hyperdrive_run.get_best_run_by_primary_metric()\n",
"best_run_metrics = best_run.get_metrics()\n",
"parameter_values = best_run.get_details()['runDefinition']['arguments']\n",
"parameter_values = best_run.get_details()[\"runDefinition\"][\"arguments\"]\n",
"best_parameters = dict(zip(parameter_values[::2], parameter_values[1::2]))\n",
"\n",
"print(f\"* Best Run Id:{best_run.id}\")\n",
@ -581,7 +620,50 @@
"print(\"\\n* Best hyperparameters:\")\n",
"print(best_parameters)\n",
"print(f\"Accuracy = {best_run_metrics['accuracy']}\")\n",
"print(\"Learning Rate =\", best_run_metrics['learning_rate'])"
"print(\"Learning Rate =\", best_run_metrics[\"learning_rate\"])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'run_id': 'hyperparameter-tuning_1569867670036119_4',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.01, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.8918015856432082,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_3',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.003, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.8760658534573615,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_2',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.001, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.8282478586888209,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_1',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.0003, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.7405032357605712,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_0',\n",
" 'hyperparameters': '{\"--learning_rate\": 0.0001, \"--max_size\": 600}',\n",
" 'best_primary_metric': 0.47537724312149304,\n",
" 'status': 'Completed'},\n",
" {'run_id': 'hyperparameter-tuning_1569867670036119_preparation',\n",
" 'hyperparameters': None,\n",
" 'best_primary_metric': None,\n",
" 'status': 'Completed'}]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hyperdrive_run.get_children_sorted_by_primary_metric()"
]
},
{
@ -596,11 +678,13 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Log some outputs using scrapbook which are used during testing to verify correct notebook execution\n",
"sb.glue(\"best_accuracy\", best_run_metrics['accuracy'])"
"sb.glue(\"best_accuracy\", best_run_metrics[\"accuracy\"])"
]
}
],

Binary data
scenarios/detection/media/hyperdrive_cluster.jpg (new file, 64 KiB; binary file not shown)

Four binary image files (before/after sizes 43 KiB, 77 KiB, 42 KiB, and 66 KiB; binary files not shown).


@ -3,6 +3,7 @@
import papermill as pm
import pytest
import scrapbook as sb
# Unless manually modified, cv should be
# the name of the current jupyter kernel
@ -11,8 +12,11 @@ KERNEL_NAME = "cv"
OUTPUT_NOTEBOOK = "output.ipynb"
# ----- Image classification ----------------------------------------------------------
@pytest.mark.azuremlnotebooks
def test_20_notebook_run(
def test_ic_20_notebook_run(
classification_notebooks,
subscription_id,
resource_group,
@ -35,7 +39,7 @@ def test_20_notebook_run(
@pytest.mark.azuremlnotebooks
def test_21_notebook_run(
def test_ic_21_notebook_run(
classification_notebooks,
subscription_id,
resource_group,
@ -60,7 +64,7 @@ def test_21_notebook_run(
@pytest.mark.azuremlnotebooks
def test_22_notebook_run(
def test_ic_22_notebook_run(
classification_notebooks,
subscription_id,
resource_group,
@ -85,7 +89,7 @@ def test_22_notebook_run(
@pytest.mark.azuremlnotebooks
def test_23_notebook_run(
def test_ic_23_notebook_run(
classification_notebooks,
subscription_id,
resource_group,
@ -108,7 +112,7 @@ def test_23_notebook_run(
@pytest.mark.azuremlnotebooks
def test_24_notebook_run(
def test_ic_24_notebook_run(
classification_notebooks,
subscription_id,
resource_group,
@ -135,4 +139,34 @@ def test_24_notebook_run(
)
# TODO add test for hyperparam object detection notebook
# # ----- Object detection ----------------------------------------------------------
@pytest.mark.azuremlnotebooks
def test_od_11_notebook_run(
detection_notebooks,
subscription_id,
resource_group,
workspace_name,
workspace_region,
):
notebook_path = detection_notebooks["11"]
pm.execute_notebook(
notebook_path,
OUTPUT_NOTEBOOK,
parameters=dict(
PM_VERSION=pm.__version__,
subscription_id=subscription_id,
resource_group=resource_group,
workspace_name=workspace_name,
workspace_region=workspace_region,
MAX_NODES=3,
IM_MAX_SIZES=[200],
LEARNING_RATES=[1e-5, 3e-3],
UTILS_DIR="utils_cv",
),
kernel_name=KERNEL_NAME,
)
nb_output = sb.read_notebook(OUTPUT_NOTEBOOK)
assert nb_output.scraps["best_accuracy"].data > 0.70