additional notebooks (#159)
* Adjusted notebooks 21 and 22 to reflect new separate web service testing and workspace creation notebooks
* Added pre-reqs section back + changed variable names to lower case + removed amount of money spent + adjusted table of content and sections numbering
* Replaced from_config() by setup()
* multilabel notebook (#133)
* multilabel notebook
* multilabell notebook
* add python file
* testS
* flake8 & black updates
* update multilabel notebook and utils
* 01/03 notebooks
* update 01 and 02 to fast inference params
* added multilabel support to 03_notebook
* fix tests
* update to multilabel
* Hard negative sampling notebook (#132)
* Initial refactor - All python scripts under utils_cv are clean - All files under classification (notebooks and tools-scripts) and tests need fix
* refactor gpu util
* datapath
* test refactor
* widgets
* fix tests. fix 00 10 21 notebooks
* Update 22 notebook. Add nbconverted files
* python version
* Root readme.md
* faq
* change test filenames otherwise they fail
* conftest update
* Fix result widget import
* Fix 01 notebook widget import
* Refactor widget.py and misc.py
* negative mining notebook
* Data loading
* data loading
* Initial training result
* Add results
* nb convertion
* add example images
* image rotation fix
* Update notebook with new data
* Fix typo
* Update images
* Revise reviews
* update images
* update notebook with new figures
* Notebook update, Add tests
* Add test docstring
* Changed text + removed testing section + reduced table of content + put intro, pre-reqs and library import together + removed reference to user interface in note # 23
* update to readme, fixing link and typo (#158)
* Update README.md (#157) forcing merge as it is just minor updates on the readme.
This commit is contained in:
Parent
59d684e561
Commit
ee7a7a5ec6
Binary file not shown.
|
@ -20,14 +20,14 @@ Currently the main investment/priority is around image classification and to a l
|
|||
|
||||
## Getting Started
|
||||
|
||||
Instructions on how to get started, as well as our example notebooks and discussions are provided in the [image classification](image_classification/README.md) subfolder.
|
||||
Instructions on how to get started, as well as our example notebooks and discussions are provided in the [classification](classification/README.md) subfolder.
|
||||
|
||||
Note that for certain Computer Vision problems, ready-made or easily customizable solutions exist which do not require any custom coding or machine learning expertise. We strongly recommend evaluating if these can sufficiently solve your problem. If these solutions are not applicable, or the accuracy of these solutions is not sufficient, then resorting to more complex and time-consuming custom approaches may be necessary.
|
||||
|
||||
The following Microsoft services offer simple solutions to address common Computer Vision tasks:
|
||||
|
||||
- [Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/directory/vision/)
|
||||
provides pre-trained REST APIs which can be called for image classification, face recognition, OCR, video analytics, and much more. These APIs are easy to use and work out of the box (no training required), however customization is limited. See the various demos available for each domain to get a feel for the functionality (e.g., [computer vision](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/), [speach to text](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/) ).
|
||||
provides pre-trained REST APIs which can be called for image classification, face recognition, OCR, video analytics, and much more. These APIs are easy to use and work out of the box (no training required), however customization is limited. See the various demos available for each domain to get a feel for the functionality (e.g., [computer vision](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/), [speech to text](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/) ).
|
||||
|
||||
- [Custom Vision Service](https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/)
|
||||
is a SaaS service to train and deploy a model as a REST API given a user-provided training set. All steps from image upload, annotation, to model deployment can be performed using either the UI or a Python SDK. Training image classification or object detection models are supported using only minimal machine learning knowledge. The Custom Vision Service offers more flexibility than using the pre-trained Cognitive Services APIs, but requires the user to bring and annotate their own data.
|
||||
|
|
|
@ -20,10 +20,15 @@ We have also found that some browsers do not render Jupyter widgets correctly. I
|
|||
| --- | --- |
|
||||
| [00_webcam.ipynb](notebooks/00_webcam.ipynb)| Demonstrates how to use a trained model to run inference on an image from your computer's webcam. |
|
||||
| [01_training_introduction.ipynb](notebooks/01_training_introduction.ipynb)| Introduces some of the basic concepts around model training and evaluation.|
|
||||
| [02_training_accuracy_vs_speed.ipynb](notebooks/02_training_accuracy_vs_speed.ipynb)| Trains a model with high accuracy vs one with a fast inferencing speed. *<font color="orange"> Use this to train on your own datasets! </font>* |
|
||||
| [02_multilabel_classification.ipynb](notebooks/02_multilabel_classification.ipynb)| Introduces the key differences when it comes to training a multilabel classification model.|
|
||||
| [03_training_accuracy_vs_speed.ipynb](notebooks/03_training_accuracy_vs_speed.ipynb)| Trains a model with high accuracy vs one with a fast inferencing speed. *<font color="orange"> Use this to train on your own datasets! </font>* |
|
||||
| [10_image_annotation.ipynb](notebooks/10_image_annotation.ipynb)| A simple UI to annotate images. |
|
||||
| [11_exploring_hyperparameters.ipynb](notebooks/11_exploring_hyperparameters.ipynb)| Finds optimal model parameters using grid search. |
|
||||
| [21_deployment_on_azure_container_instances.ipynb](notebooks/21_deployment_on_azure_container_instances.ipynb)| Deploys a trained model as REST API using Azure Container Instances. |
|
||||
| [12_hard_negative_sampling.ipynb](notebooks/12_hard_negative_sampling.ipynb)| Uses hard negatives to improve your model performance. |
|
||||
| [20_azure_workspace_setup.ipynb](notebooks/20_azure_workspace_setup.ipynb)| Setup your Azure resources and Azure Machine Learning workspace. |
|
||||
| [21_deployment_on_azure_container_instances.ipynb](notebooks/21_deployment_on_azure_container_instances.ipynb)| Deploys a trained model exposed on a REST API using Azure Container Instances. |
|
||||
| [22_deployment_on_azure_kubernetes_service.ipynb](notebooks/22_deployment_on_azure_kubernetes_service.ipynb)| Deploys a trained model exposed on a REST API using the Azure Kubernetes Service. |
|
||||
| [23_aci_aks_web_service_testing.ipynb](notebooks/23_aci_aks_web_service_testing.ipynb)| Tests the deployed models on either ACI or AKS. |
|
||||
|
||||
## Getting Started
|
||||
|
||||
|
|
|
@ -49,7 +49,8 @@
|
|||
"output_type": "stream",
|
||||
"text": [
|
||||
"Fast.ai version = 1.0.48\n",
|
||||
"Cuda is not available. Fast.ai/Torch is using CPU\n"
|
||||
"Fast.ai (Torch) is using GPU: Tesla V100-PCIE-16GB\n",
|
||||
"Available / Total memory = 8514 / 16130 (MiB)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -104,9 +105,9 @@
|
|||
"DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)\n",
|
||||
"EPOCHS = 5\n",
|
||||
"LEARNING_RATE = 1e-4\n",
|
||||
"IMAGE_SIZE = 299\n",
|
||||
"IM_SIZE = 300\n",
|
||||
"BATCH_SIZE = 16\n",
|
||||
"ARCHITECTURE = models.resnet50"
|
||||
"ARCHITECTURE = models.resnet18"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -130,10 +131,11 @@
|
|||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[PosixPath('/Users/jehrling/Documents/GitHub/ComputerVision/data/fridgeObjects/milk_bottle'),\n",
|
||||
" PosixPath('/Users/jehrling/Documents/GitHub/ComputerVision/data/fridgeObjects/water_bottle'),\n",
|
||||
" PosixPath('/Users/jehrling/Documents/GitHub/ComputerVision/data/fridgeObjects/can'),\n",
|
||||
" PosixPath('/Users/jehrling/Documents/GitHub/ComputerVision/data/fridgeObjects/carton')]"
|
||||
"[PosixPath('/home/jiata/code/cvbp/data/fridgeObjects/models'),\n",
|
||||
" PosixPath('/home/jiata/code/cvbp/data/fridgeObjects/milk_bottle'),\n",
|
||||
" PosixPath('/home/jiata/code/cvbp/data/fridgeObjects/carton'),\n",
|
||||
" PosixPath('/home/jiata/code/cvbp/data/fridgeObjects/water_bottle'),\n",
|
||||
" PosixPath('/home/jiata/code/cvbp/data/fridgeObjects/can')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
|
@ -194,7 +196,7 @@
|
|||
"data = (ImageList.from_folder(path) \n",
|
||||
" .split_by_rand_pct(valid_pct=0.2, seed=10) \n",
|
||||
" .label_from_folder() \n",
|
||||
" .transform(size=IMAGE_SIZE) \n",
|
||||
" .transform(size=IM_SIZE) \n",
|
||||
" .databunch(bs=BATCH_SIZE) \n",
|
||||
" .normalize(imagenet_stats))"
|
||||
]
|
||||
|
@ -276,14 +278,14 @@
|
|||
"Image (3, 299, 299),Image (3, 299, 299),Image (3, 299, 299),Image (3, 299, 299),Image (3, 299, 299)\n",
|
||||
"y: CategoryList\n",
|
||||
"milk_bottle,milk_bottle,milk_bottle,milk_bottle,milk_bottle\n",
|
||||
"Path: /Users/jehrling/Documents/GitHub/ComputerVision/data/fridgeObjects;\n",
|
||||
"Path: /home/jiata/code/cvbp/data/fridgeObjects;\n",
|
||||
"\n",
|
||||
"Valid: LabelList (26 items)\n",
|
||||
"x: ImageList\n",
|
||||
"Image (3, 299, 299),Image (3, 299, 299),Image (3, 299, 299),Image (3, 299, 299),Image (3, 299, 299)\n",
|
||||
"y: CategoryList\n",
|
||||
"water_bottle,water_bottle,water_bottle,carton,carton\n",
|
||||
"Path: /Users/jehrling/Documents/GitHub/ComputerVision/data/fridgeObjects;\n",
|
||||
"carton,carton,carton,can,can\n",
|
||||
"Path: /home/jiata/code/cvbp/data/fridgeObjects;\n",
|
||||
"\n",
|
||||
"Test: None>"
|
||||
]
|
||||
|
|
File diff suppressed because one or more lines are too long
|
@ -51,7 +51,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Training a High Accuracy or a Fast Inference Speed Classifier <a name=\"model\"></a>"
|
||||
"# Training a High Accuracy, Fast Inference Speed, or Small Size Classifier <a name=\"model\"></a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -91,7 +91,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -109,7 +109,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -117,8 +117,9 @@
|
|||
"sys.path.append(\"../../\")\n",
|
||||
"import os\n",
|
||||
"from pathlib import Path\n",
|
||||
"from utils_cv.classification.data import Urls\n",
|
||||
"from utils_cv.classification.data import Urls, is_data_multilabel\n",
|
||||
"from utils_cv.common.data import unzip_url\n",
|
||||
"from utils_cv.classification.model import hamming_accuracy\n",
|
||||
"from fastai.vision import *\n",
|
||||
"from fastai.metrics import accuracy"
|
||||
]
|
||||
|
@ -143,14 +144,14 @@
|
|||
"source": [
|
||||
"For most scenarios, computer vision practitioners want to create a high accuracy model, a fast-inference model or a small size model. Set your `MODEL_TYPE` variable to one of the following: `\"high_accuracy\"`, `\"fast_inference\"`, or `\"small_size\"`.\n",
|
||||
"\n",
|
||||
"we will again use the `FridgeObjects` dataset from [previous notebook](01_training_introduction.ipynb). You can replace the `DATA_PATH` variable with your own data.\n",
|
||||
"We will use the `FridgeObjects` dataset from [previous notebook](01_training_introduction.ipynb) again. You can replace the `DATA_PATH` variable with your own data.\n",
|
||||
"\n",
|
||||
"When choosing the batch size, remember that even mid-level GPUs run out of memory when training a deeper resnet models with larger image resolutions. If you get an _out of memory_ error, try reducing the batch size by a factor of 2."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"tags": [
|
||||
"parameters"
|
||||
|
@ -180,7 +181,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -196,7 +197,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -213,6 +214,25 @@
|
|||
" IM_SIZE = 300 "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We'll determine if your dataset is a multilabel or traditional classification problem. To do so, we'll use the `is_data_multilabel` helper function. In order to detect whether or not a dataset is multilabel, the helper function will check to see if the datapath contains a csv file that has a column 'labels' where the values are space-delimited. You can inspect the function by calling `is_data_multilabel??`.\n",
|
||||
"\n",
|
||||
"This function assumes that your multilabel dataset is structured in recommended format shown in the [multilabel notebook](02_multilabel_classification.ipynb)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"multilabel = is_data_multilabel(DATA_PATH)\n",
|
||||
"metric = accuracy if not multilabel else hamming_accuracy"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
|
@ -255,11 +275,26 @@
|
|||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = (ImageList.from_folder(Path(DATA_PATH)) \n",
|
||||
" .split_by_rand_pct(valid_pct=0.2, seed=10) \n",
|
||||
" .label_from_folder() \n",
|
||||
"label_list = (\n",
|
||||
" (ImageList.from_folder(Path(DATA_PATH)) \n",
|
||||
" .split_by_rand_pct(valid_pct=0.2, seed=10) \n",
|
||||
" .label_from_folder())\n",
|
||||
" if not multilabel else\n",
|
||||
" (ImageList.from_csv(Path(DATA_PATH), 'labels.csv', folder='images')\n",
|
||||
" .split_by_rand_pct(valid_pct=0.2, seed=10)\n",
|
||||
" .label_from_df(label_delim=' '))\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = (label_list\n",
|
||||
" .transform(tfms=get_transforms(), size=IM_SIZE) \n",
|
||||
" .databunch(bs=16) \n",
|
||||
" .databunch(bs=BATCH_SIZE) \n",
|
||||
" .normalize(imagenet_stats))"
|
||||
]
|
||||
},
|
||||
|
@ -272,11 +307,11 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"learn = cnn_learner(data, ARCHITECTURE, metrics=accuracy)"
|
||||
"learn = cnn_learner(data, ARCHITECTURE, metrics=metric)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -288,7 +323,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -307,31 +342,31 @@
|
|||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>1.710471</td>\n",
|
||||
" <td>1.451503</td>\n",
|
||||
" <td>0.269231</td>\n",
|
||||
" <td>00:41</td>\n",
|
||||
" <td>1.462767</td>\n",
|
||||
" <td>1.473022</td>\n",
|
||||
" <td>0.307692</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>1.804611</td>\n",
|
||||
" <td>1.420726</td>\n",
|
||||
" <td>0.269231</td>\n",
|
||||
" <td>00:47</td>\n",
|
||||
" <td>1.497195</td>\n",
|
||||
" <td>1.328375</td>\n",
|
||||
" <td>0.423077</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1.704576</td>\n",
|
||||
" <td>1.411111</td>\n",
|
||||
" <td>0.346154</td>\n",
|
||||
" <td>00:43</td>\n",
|
||||
" <td>1.479020</td>\n",
|
||||
" <td>1.268853</td>\n",
|
||||
" <td>0.461538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>1.665398</td>\n",
|
||||
" <td>1.410190</td>\n",
|
||||
" <td>0.346154</td>\n",
|
||||
" <td>00:41</td>\n",
|
||||
" <td>1.455920</td>\n",
|
||||
" <td>1.246885</td>\n",
|
||||
" <td>0.500000</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>"
|
||||
|
@ -357,7 +392,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -373,13 +408,13 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"Total time: 00:19 <p><table border=\"1\" class=\"dataframe\">\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: left;\">\n",
|
||||
" <th>epoch</th>\n",
|
||||
|
@ -392,86 +427,86 @@
|
|||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>1.623785</td>\n",
|
||||
" <td>1.409176</td>\n",
|
||||
" <td>0.307692</td>\n",
|
||||
" <td>1.515685</td>\n",
|
||||
" <td>1.212074</td>\n",
|
||||
" <td>0.461538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>1.457880</td>\n",
|
||||
" <td>1.230384</td>\n",
|
||||
" <td>0.423077</td>\n",
|
||||
" <td>1.477861</td>\n",
|
||||
" <td>1.070968</td>\n",
|
||||
" <td>0.615385</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>1.346284</td>\n",
|
||||
" <td>0.825346</td>\n",
|
||||
" <td>0.769231</td>\n",
|
||||
" <td>1.328198</td>\n",
|
||||
" <td>0.816114</td>\n",
|
||||
" <td>0.692308</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>1.222301</td>\n",
|
||||
" <td>0.543954</td>\n",
|
||||
" <td>0.884615</td>\n",
|
||||
" <td>1.141497</td>\n",
|
||||
" <td>0.596160</td>\n",
|
||||
" <td>0.769231</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>4</td>\n",
|
||||
" <td>1.059379</td>\n",
|
||||
" <td>0.393587</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>0.980073</td>\n",
|
||||
" <td>0.472985</td>\n",
|
||||
" <td>0.884615</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>0.920777</td>\n",
|
||||
" <td>0.315344</td>\n",
|
||||
" <td>0.850739</td>\n",
|
||||
" <td>0.347233</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>6</td>\n",
|
||||
" <td>0.807599</td>\n",
|
||||
" <td>0.258829</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.757358</td>\n",
|
||||
" <td>0.288975</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>7</td>\n",
|
||||
" <td>0.712808</td>\n",
|
||||
" <td>0.239849</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.677034</td>\n",
|
||||
" <td>0.268191</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>8</td>\n",
|
||||
" <td>0.634236</td>\n",
|
||||
" <td>0.231437</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.601296</td>\n",
|
||||
" <td>0.253871</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>9</td>\n",
|
||||
" <td>0.570075</td>\n",
|
||||
" <td>0.237903</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.536131</td>\n",
|
||||
" <td>0.249480</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>10</td>\n",
|
||||
" <td>0.511892</td>\n",
|
||||
" <td>0.240423</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.482381</td>\n",
|
||||
" <td>0.246853</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>11</td>\n",
|
||||
" <td>0.470356</td>\n",
|
||||
" <td>0.234572</td>\n",
|
||||
" <td>1.000000</td>\n",
|
||||
" <td>0.435568</td>\n",
|
||||
" <td>0.242366</td>\n",
|
||||
" <td>0.961538</td>\n",
|
||||
" <td>00:01</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
|
@ -501,7 +536,7 @@
|
|||
"metadata": {},
|
||||
"source": [
|
||||
"In this section, we test our model on the following characteristics:\n",
|
||||
"- accuracy\n",
|
||||
"- performance\n",
|
||||
"- inference speed\n",
|
||||
"- parameter export size / memory footprint required\n",
|
||||
"\n",
|
||||
|
@ -513,26 +548,26 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Accuracy \n",
|
||||
"To keep things simple, we just a look at the final accuracy on the validation set."
|
||||
"### Performance \n",
|
||||
"To keep things simple, we just a look at the final evaluation metric on the validation set."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Accuracy on validation set: 1.0\n"
|
||||
"accuracy on validation set: 0.9615384340286255\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"_, metric = learn.validate(learn.data.valid_dl, metrics=[accuracy])\n",
|
||||
"print(f'Accuracy on validation set: {float(metric)}')"
|
||||
"_, score = learn.validate(learn.data.valid_dl, metrics=[metric])\n",
|
||||
"print(f'{metric.__name__} on validation set: {float(score)}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -546,23 +581,24 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"im = open_image(f\"{(Path(DATA_PATH)/learn.data.classes[0]).ls()[0]}\")"
|
||||
"im_folder = learn.data.classes[0] if not multilabel else 'images'\n",
|
||||
"im = open_image(f\"{(Path(DATA_PATH)/im_folder).ls()[0]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"12.6 ms ± 375 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
|
||||
"12.3 ms ± 47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -582,7 +618,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -591,7 +627,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -9,31 +9,25 @@
|
|||
"<i>Licensed under the MIT License.</i>\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Deployment of a model as a service with Azure Kubernetes Service\n",
|
||||
"# Deployment of a model to Azure Kubernetes Service (AKS)\n",
|
||||
"\n",
|
||||
"## Table of contents\n",
|
||||
"1. [Introduction](#intro)\n",
|
||||
"1. [Pre-requisites](#pre-reqs)\n",
|
||||
"1. [Library import](#libraries)\n",
|
||||
"1. [Azure workspace](#workspace)\n",
|
||||
"1. [Model deployment on AKS](#deploy)\n",
|
||||
" 1. [Workspace retrieval](#workspace)\n",
|
||||
" 1. [Docker image retrieval](#docker_image)\n",
|
||||
" 1. [AKS compute target creation](#compute)\n",
|
||||
" 1. [Monitoring activation](#monitor)\n",
|
||||
" 1. [Service deployment](#svc_deploy)\n",
|
||||
"1. [Testing of the web service](#testing)\n",
|
||||
"1. [Clean up](#clean)\n",
|
||||
" 1. [Monitoring deactivation and service deletion](#insights)\n",
|
||||
" 1. [Workspace deletion](#del_workspace)\n",
|
||||
"1. [Next steps](#next)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## 1. Introduction <a id=\"intro\"/>\n",
|
||||
"\n",
|
||||
"In many real life scenarios, trained machine learning models need to be deployed to production. As we saw in the [first](21_deployment_on_azure_container_instances.ipynb) deployment notebook, this can be done by deploying on Azure Container Instances. In this tutorial, we will get familiar with another way of implementing a model into a production environment, this time using [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads) (AKS).\n",
|
||||
"\n",
|
||||
"AKS manages hosted Kubernetes environments. It makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters. The latter have been shown to be [more economical and efficient](https://azure.microsoft.com/en-us/blog/gpus-vs-cpus-for-deployment-of-deep-learning-models/) when serving complex models such as deep neural networks, and/or when traffic to the web service is high (> 100 requests/second).\n",
|
||||
"In many real life scenarios, trained machine learning models need to be deployed to production. As we saw in the [prior](21_deployment_on_azure_container_instances.ipynb) deployment notebook, this can be done by deploying on Azure Container Instances. In this tutorial, we will get familiar with another way of implementing a model into a production environment, this time using [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads) (AKS).\n",
|
||||
"\n",
|
||||
"AKS manages hosted Kubernetes environments. It makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters.\n",
|
||||
"\n",
|
||||
"At the end of this tutorial, we will have learned how to:\n",
|
||||
"\n",
|
||||
|
@ -45,27 +39,22 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Pre-requisites <a id=\"pre-reqs\"/>\n",
|
||||
"### Pre-requisites <a id=\"pre-reqs\"/>\n",
|
||||
"\n",
|
||||
"This notebook relies on resources we created in [21_deployment_on_azure_container_instances.ipynb](21_deployment_on_azure_container_instances.ipynb):\n",
|
||||
"- Our local conda environment and Azure Machine Learning workspace\n",
|
||||
"- Our Azure Machine Learning workspace\n",
|
||||
"- The Docker image that contains the model and scoring script needed for the web service to work.\n",
|
||||
"\n",
|
||||
"If we are missing any of these, we should go back and run the steps from the sections \"2. Pre-requisites\" to \"6.C Environment setup\" to generate them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3. Library import <a id=\"libraries\"/>\n",
|
||||
"If we are missing any of these, we should go back and run the steps from the sections \"Pre-requisites\" to \"3.D Environment setup\" to generate them.\n",
|
||||
"\n",
|
||||
"### Library import <a id=\"libraries\"/>\n",
|
||||
"\n",
|
||||
"Now that our prior resources are available, let's first import a few libraries we will need for the deployment on AKS."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -83,40 +72,27 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 4. Azure workspace <a id=\"workspace\"/>\n",
|
||||
"## 2. Model deployment on AKS <a id=\"deploy\"/>\n",
|
||||
"\n",
|
||||
"In the prior notebook, we retrieved an existing or created a new workspace, and generated an `./aml_config/config.json` file.\n",
|
||||
"Let's use it to load this workspace.\n",
|
||||
"### 2.A Workspace retrieval <a id=\"workspace\">\n",
|
||||
"\n",
|
||||
"<i><b>Note:</b> The Docker image we will use below is attached to the workspace we used in the prior notebook. It is then important to use the same workspace here. If, for any reason, we need to use a separate workspace here, then the steps followed to create a Docker image containing our image classifier model in the prior notebook, should be reproduced here.</i>"
|
||||
"Let's now load the workspace we used in the [prior notebook](21_deployment_on_azure_container_instances.ipynb).\n",
|
||||
"\n",
|
||||
"<i><b>Note:</b> The Docker image we will use below is attached to that workspace. It is then important to use the same workspace here. If, for any reason, we needed to use another workspace instead, we would need to reproduce, here, the steps followed to create a Docker image containing our image classifier model in the prior notebook.</i>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"# from_config() refers to this config.json file by default"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's check that the workspace is properly loaded"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.setup()\n",
|
||||
"# setup() refers to our config.json file by default\n",
|
||||
"\n",
|
||||
"# Print the workspace attributes\n",
|
||||
"print('Workspace name: ' + ws.name, \n",
|
||||
" 'Azure region: ' + ws.location, \n",
|
||||
" 'Workspace region: ' + ws.location, \n",
|
||||
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
||||
]
|
||||
|
@ -125,16 +101,14 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 5. Model deployment on AKS <a id=\"deploy\">\n",
|
||||
"### 2.B Docker image retrieval <a id=\"docker_image\">\n",
|
||||
"\n",
|
||||
"### 5.A Docker image retrieval <a id=\"docker_image\">\n",
|
||||
"\n",
|
||||
"As for the deployment on Azure Container Instances, we will use Docker containers. The Docker image we created in the prior notebook is very much suitable for our deployment on Azure Kubernetes Service, as it contains the libraries we need and the model we registered. Let's make sure this Docker image is still available (if not, we can just run the cells of section \"6. Model deployment on Azure\" of the [prior notebook](https://github.com/Microsoft/ComputerVision/blob/staging/image_classification/notebooks/21_deployment_on_azure_container_instances.ipynb))."
|
||||
"We can reuse the Docker image we created in section 3. of the [previous tutorial](21_deployment_on_azure_container_instances.ipynb). Let's make sure that it is still available."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -143,9 +117,9 @@
|
|||
"text": [
|
||||
"Docker images:\n",
|
||||
" --> Name: image-classif-resnet18-f48\n",
|
||||
" --> ID: image-classif-resnet18-f48:30\n",
|
||||
" --> ID: image-classif-resnet18-f48:31\n",
|
||||
" --> Tags: {'training set': 'ImageNet', 'architecture': 'CNN ResNet18', 'type': 'Pretrained'}\n",
|
||||
" --> Creation time: 2019-04-25 18:18:33.724424+00:00\n",
|
||||
" --> Creation time: 2019-05-09 01:31:05.323875+00:00\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
|
@ -169,7 +143,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -187,16 +161,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"registered_model = docker_image.models[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -205,14 +170,16 @@
|
|||
"text": [
|
||||
"Existing model:\n",
|
||||
" --> Name: im_classif_resnet18\n",
|
||||
" --> Version: 76\n",
|
||||
" --> ID: im_classif_resnet18:76 \n",
|
||||
" --> Creation time: 2019-04-25 18:17:27.688750+00:00\n",
|
||||
" --> URL: aml://asset/ccf6f55b203a4fc69b0f0e18ec6f72a1\n"
|
||||
" --> Version: 79\n",
|
||||
" --> ID: im_classif_resnet18:79 \n",
|
||||
" --> Creation time: 2019-05-09 01:29:36.509947+00:00\n",
|
||||
" --> URL: aml://asset/ba1c698e15bb4f6ca56b020293456ed1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"registered_model = docker_image.models[0]\n",
|
||||
"\n",
|
||||
"print(f\"Existing model:\\n --> Name: {registered_model.name}\\n \\\n",
|
||||
"--> Version: {registered_model.version}\\n --> ID: {registered_model.id} \\n \\\n",
|
||||
"--> Creation time: {registered_model.created_time}\\n \\\n",
|
||||
|
@ -224,7 +191,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 5.B AKS compute target creation<a id=\"compute\"/>\n",
|
||||
"### 2.C AKS compute target creation<a id=\"compute\"/>\n",
|
||||
"\n",
|
||||
"In the case of deployment on AKS, in addition to the Docker image, we need to define computational resources. This is typically a cluster of CPUs or a cluster of GPUs. If we already have a Kubernetes-managed cluster in our workspace, we can use it, otherwise, we can create a new one.\n",
|
||||
"\n",
|
||||
|
@ -235,7 +202,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -243,10 +210,8 @@
|
|||
"output_type": "stream",
|
||||
"text": [
|
||||
"List of compute resources associated with our workspace:\n",
|
||||
" --> imgclass-aks-gpu: <azureml.core.compute.aks.AksCompute object at 0x000001EF61DEB278>\n",
|
||||
" --> imgclass-aks-cpu: <azureml.core.compute.aks.AksCompute object at 0x000001EF61DEE0F0>\n",
|
||||
" --> cpucluster: <azureml.core.compute.amlcompute.AmlCompute object at 0x000001EF61DEE7B8>\n",
|
||||
" --> gpuclusternc12: <azureml.core.compute.amlcompute.AmlCompute object at 0x000001EF61DEEE10>\n"
|
||||
" --> cpucluster: <azureml.core.compute.amlcompute.AmlCompute object at 0x0000029614AB1B38>\n",
|
||||
" --> gpuclusternc12: <azureml.core.compute.amlcompute.AmlCompute object at 0x0000029614AB1748>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -260,9 +225,7 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### 5.B.a Creation of a new AKS cluster\n",
|
||||
"\n",
|
||||
"In the case where we have no compute resource available, we can create a new one. For this, we can choose between a CPU-based or a GPU-based cluster of virtual machines. There is a [wide variety](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general) of machine types that can be used. In the present example, however, we will not need the fastest machines that exist nor the most memory optimized ones. We will use typical default machines:\n",
|
||||
"In the case where we have no compute resource available, we can create a new one. For this, we can choose between a CPU-based or a GPU-based cluster of virtual machines. The latter is typically better suited for web services with high traffic (i.e. > 100 requests per second) and high GPU utilization. There is a [wide variety](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general) of machine types that can be used. In the present example, however, we will not need the fastest machines that exist nor the most memory optimized ones. We will use typical default machines:\n",
|
||||
"- [Standard D3 V2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#dv2-series):\n",
|
||||
" - 4 vCPUs\n",
|
||||
" - 14 GB of memory\n",
|
||||
|
@ -274,7 +237,7 @@
|
|||
"<i><b>Notes:</b></i>\n",
|
||||
"- These are Azure-specific denominations\n",
|
||||
"- Information on optimized machines can be found [here](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#other-sizes)\n",
|
||||
"- When configuring the provisioning of an AKS cluster, we need to choose a type of machine, as examplified above. This choice must be such that the number of virtual machines (also called `agent nodes`), we require, multiplied by the number of vCPUs on each machine must be greater than or equal to 12 vCPUs. This is indeed the [minimum needed](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#create-a-new-cluster) for such cluster. By default, a pool of 3 virtual machines gets provisioned on a new AKS cluster to allow for redundancy. So, if the type of virtual machine we choose has a number of vCPUs (`vm_size`) smaller than 4, we need to increase the number of machines (`agent_count`) such that `agent_count x vm_size` ≥ `12` virtual CPUs. `agent_count` and `vm_size` are both parameters we can pass to the `provisioning_configuration()` method below.\n",
|
||||
"- When configuring the provisioning of an AKS cluster, we need to choose a type of machine, as examplified above. This choice must be such that the number of virtual machines (also called `agent nodes`), we require, multiplied by the number of vCPUs on each machine must be greater than or equal to 12 vCPUs. This is indeed the [minimum needed](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#create-a-new-aks-cluster) for such cluster. By default, a pool of 3 virtual machines gets provisioned on a new AKS cluster to allow for redundancy. So, if the type of virtual machine we choose has a number of vCPUs (`vm_size`) smaller than 4, we need to increase the number of machines (`agent_count`) such that `agent_count x vm_size` ≥ `12` virtual CPUs. `agent_count` and `vm_size` are both parameters we can pass to the `provisioning_configuration()` method below.\n",
|
||||
"- [This document](https://docs.microsoft.com/en-us/azure/templates/Microsoft.ContainerService/2019-02-01/managedClusters?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#managedclusteragentpoolprofile-object) provides the full list of virtual machine types that can be deployed in an AKS cluster\n",
|
||||
"- Additional considerations on deployments using GPUs are available [here](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#deployment-considerations)\n",
|
||||
"\n",
|
||||
|
@ -283,14 +246,16 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"We retrieved the imgclass-aks-cpu AKS compute target\n"
|
||||
"Creating..............................................................................................\n",
|
||||
"SucceededProvisioning operation finished, operation \"Succeeded\"\n",
|
||||
"We created the imgclass-aks-cpu AKS compute target\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -341,27 +306,6 @@
|
|||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### 5.B.b Alternative: Attachment of an existing AKS cluster\n",
|
||||
"\n",
|
||||
"Within our overall subscription, we may already have created an AKS cluster. This cluster may not be visible when we run the `ws.compute_targets` command, though. This is because it is not attached to our present workspace. If we want to use that cluster instead, we need to attach it to our workspace, first. We can do this as follows:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"existing_aks_name = '<name_of_the_existing_detached_aks_cluster>'\n",
|
||||
"resource_id = '/subscriptions/<subscription_id/resourcegroups/<resource_group>/providers/Microsoft.ContainerService/managedClusters/<aks_cluster_full_name>'\n",
|
||||
"# <aks_cluster_full_name> can be found by clicking on the aks cluster, in the Azure portal, as the \"Resource ID\" string\n",
|
||||
"# <subscription_id> can be obtained through ws.subscription_id, and <resource_group> through ws.resource_group\n",
|
||||
"\n",
|
||||
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
|
||||
"aks_target = ComputeTarget.attach(workspace=ws, name=existing_aks_name, attach_configuration=attach_config)\n",
|
||||
"aks_target.wait_for_completion(show_output = True)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
|
@ -373,7 +317,7 @@
|
|||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -395,14 +339,14 @@
|
|||
"source": [
|
||||
"The set of resources we will use to deploy our web service on AKS is now provisioned and available.\n",
|
||||
"\n",
|
||||
"### 5.C Monitoring activation <a id=\"monitor\"/>\n",
|
||||
"### 2.D Monitoring activation <a id=\"monitor\"/>\n",
|
||||
"\n",
|
||||
"Once our web app is up and running, it is very important to monitor it, and measure the amount of traffic it gets, how long it takes to respond, the type of exceptions that get raised, etc. We will do so through [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview), which is an application performance management service. To enable it on our soon-to-be-deployed web service, we first need to update our AKS configuration file:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
|
@ -414,16 +358,16 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 5.D Service deployment <a id=\"svc_deploy\"/>\n",
|
||||
"### 2.E Service deployment <a id=\"svc_deploy\"/>\n",
|
||||
"\n",
|
||||
"We are now ready to deploy our web service. As in the [first](https://github.com/Microsoft/ComputerVision/blob/staging/image_classification/notebooks/21_deployment_on_azure_container_instances.ipynb) notebook, we will deploy from the Docker image. It indeed contains our image classifier model and the conda environment needed for the scoring script to work properly. The parameters to pass to the `Webservice.deploy_from_image()` command are similar to those used for the deployment on ACI. The only major difference is the compute target (`aks_target`), i.e. the CPU cluster we just spun up.\n",
|
||||
"We are now ready to deploy our web service. As in the [first](21_deployment_on_azure_container_instances.ipynb) notebook, we will deploy from the Docker image. It indeed contains our image classifier model and the conda environment needed for the scoring script to work properly. The parameters to pass to the `Webservice.deploy_from_image()` command are similar to those used for the deployment on ACI. The only major difference is the compute target (`aks_target`), i.e. the CPU cluster we just spun up.\n",
|
||||
"\n",
|
||||
"<i><b>Note:</b> This deployment takes a few minutes to complete.</i>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
|
@ -431,7 +375,7 @@
|
|||
"output_type": "stream",
|
||||
"text": [
|
||||
"Creating service\n",
|
||||
"Running.........................\n",
|
||||
"Running............................\n",
|
||||
"SucceededAKS service creation operation finished, operation \"Succeeded\"\n",
|
||||
"The web service is Healthy\n"
|
||||
]
|
||||
|
@ -492,39 +436,25 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Our web service is up, and is running on AKS. We can now proceed to testing it."
|
||||
"Our web service is up, and is running on AKS."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 6. Testing of the web service <a id=\"testing\"/>\n",
|
||||
"\n",
|
||||
"Such testing is a whole task of its own, so we separated it from this notebook. We provide all the needed steps in [23_web_service_testing.ipynb](https://github.com/Microsoft/ComputerVision/blob/service_deploy/image_classification/notebooks/deployment/23_web_service_testing.ipynb). There, we test our service:\n",
|
||||
"- From within our workspace (using `aks_service.run()`)\n",
|
||||
"- From outside our workspace (using `requests.post()`)\n",
|
||||
"- From a Flask app running on our local machine\n",
|
||||
"- From a Flask app deployed on the same AKS cluster as our web service."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 7. Clean up <a id=\"clean\">\n",
|
||||
"## 3. Clean up <a id=\"clean\">\n",
|
||||
" \n",
|
||||
"In a real-life scenario, it is likely that the service we created would need to be up and running at all times. However, in the present demonstrative case, and once we have verified that our service works, we can delete it as well as all the resources we used.\n",
|
||||
"In a real-life scenario, it is likely that the service we created would need to be up and running at all times. However, in the present demonstrative case, and once we have verified that our service works (cf. \"Next steps\" section below), we can delete it as well as all the resources we used.\n",
|
||||
"\n",
|
||||
"In this notebook, the only resource we added to our subscription, in comparison to what we had at the end of the notebook on ACI deployment, is the AKS cluster. There is no fee for cluster management. The only components we are paying for are:\n",
|
||||
"- the cluster nodes\n",
|
||||
"- the managed OS disks.\n",
|
||||
"\n",
|
||||
"Here, we used Standard D3 V2 machines, which come with a temporary storage of 200 GB. Over the course of this tutorial (assuming ~ 1 hour), this added less than $1 to our bill. Now, it is important to understand that each hour during which the cluster is up gets billed, whether the web service is called or not. The same is true for the ACI and workspace we have been using until now.\n",
|
||||
"Here, we used Standard D3 V2 machines, which come with a temporary storage of 200 GB. Over the course of this tutorial (assuming ~ 1 hour), this changed almost nothing to our bill. Now, it is important to understand that each hour during which the cluster is up gets billed, whether the web service is called or not. The same is true for the ACI and workspace we have been using until now.\n",
|
||||
"\n",
|
||||
"To get a better sense of pricing, we can refer to [this calculator](https://azure.microsoft.com/en-us/pricing/calculator/?service=kubernetes-service#kubernetes-service). We can also navigate to the [Cost Management + Billing pane](https://ms.portal.azure.com/#blade/Microsoft_Azure_Billing/ModernBillingMenuBlade/Overview) on the portal, click on our subscription ID, and click on the Cost Analysis tab to check our credit usage.\n",
|
||||
"\n",
|
||||
"### 7.A Monitoring deactivation and service deletion <a id=\"insights\"/>\n",
|
||||
"If we plan on no longer using this web service, we can turn monitoring off, and delete the compute target, the service itself as well as the associated Docker image."
|
||||
]
|
||||
},
|
||||
|
@ -554,7 +484,6 @@
|
|||
"source": [
|
||||
"At this point, all the service resources we used in this notebook have been deleted. We are only now paying for our workspace.\n",
|
||||
"\n",
|
||||
"### 7.B Workspace deletion <a id=\"del_workspace\"/>\n",
|
||||
"If our goal is to continue using our workspace, we should keep it available. On the contrary, if we plan on no longer using it and its associated resources, we can delete it.\n",
|
||||
"\n",
|
||||
"<i><b>Note:</b> Deleting the workspace will delete all the experiments, outputs, models, Docker images, deployments, etc. that we created in that workspace.</i>"
|
||||
|
@ -574,8 +503,8 @@
|
|||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 8. Next steps <a id=\"next\"/>\n",
|
||||
"In the [next notebook](https://github.com/Microsoft/ComputerVision/blob/service_deploy/image_classification/notebooks/deployment/23_web_service_testing.ipynb), we will test the web services we deployed on ACI and on AKS. We will also learn how a Flask app, with an interactive user interface, can be used to call our web service."
|
||||
"## 4. Next steps <a id=\"next\"/>\n",
|
||||
"In the [next notebook](23_aci_aks_web_service_testing.ipynb), we will test the web services we deployed on ACI and on AKS."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
|
|
@ -23,8 +23,7 @@
|
|||
"1. [Service telemetry in Application Insights](#insights)\n",
|
||||
"1. [Clean up](#clean)\n",
|
||||
" 1. [Application Insights deactivation and web service termination](#del_app_insights)\n",
|
||||
" 1. [Docker image deletion](#del_image)\n",
|
||||
"1. [Next steps](#next-steps)"
|
||||
" 1. [Docker image deletion](#del_image)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -485,15 +484,6 @@
|
|||
"docker_image = ws.images[\"image-classif-resnet18-f48\"]\n",
|
||||
"# docker_image.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 6. Next steps <a id=\"next-steps\">\n",
|
||||
"\n",
|
||||
"In the next notebook, we will learn how to create a user interface that will allow our users to interact with our web service through the simple upload of images."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 164 KiB |
Binary file not shown.
After Width: | Height: | Size: 46 KiB |
Binary file not shown.
After Width: | Height: | Size: 154 KiB |
|
@ -0,0 +1,333 @@
|
|||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
|
||||
# <i>Copyright (c) Microsoft Corporation. All rights reserved.</i>
|
||||
#
|
||||
# <i>Licensed under the MIT License.</i>
|
||||
|
||||
# # Multilabel Classification
|
||||
#
|
||||
# In this notebook, we will look at the best practices for doing multilabel classification.
|
||||
#
|
||||
# In the previous notebook, we performed multi-class/single-label classification, which assumes that each image is assigned to only one label: an animal can be either a dog or a cat but not both at the same time. Multi-label classification, on the other hand, assumes that each image can contain or represent multiple different labels: a landscape can be labeled both gloomy (weather) and a beach (subject).
|
||||
#
|
||||
# In this notebook, we'll train a multilabel classifier and examine how best to structure data for multilabel classification problems as well as learn about new ways to evaluate our results.
|
||||
|
||||
# In[1]:
|
||||
|
||||
|
||||
# Ensure edits to libraries are loaded and plotting is shown in the notebook.
|
||||
get_ipython().run_line_magic("reload_ext", "autoreload")
|
||||
get_ipython().run_line_magic("autoreload", "2")
|
||||
get_ipython().run_line_magic("matplotlib", "inline")
|
||||
|
||||
|
||||
# Import fastai and other libraries needed. For now, we'll import all (`import *`) so that we can easily use different utilities provided by the fastai library.
|
||||
|
||||
# In[2]:
|
||||
|
||||
|
||||
import sys
|
||||
|
||||
sys.path.append("../../")
|
||||
|
||||
import warnings
|
||||
|
||||
warnings.filterwarnings("ignore")
|
||||
|
||||
import inspect
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from pathlib import Path
|
||||
|
||||
# fastai and torch
|
||||
import fastai
|
||||
from fastai.vision import *
|
||||
|
||||
# local modules
|
||||
from utils_cv.classification.model import (
|
||||
TrainMetricsRecorder,
|
||||
hamming_loss,
|
||||
zero_one_loss,
|
||||
)
|
||||
from utils_cv.classification.plot import (
|
||||
plot_pr_roc_curves,
|
||||
plot_loss_thresholds,
|
||||
)
|
||||
from utils_cv.classification.data import Urls
|
||||
from utils_cv.common.data import unzip_url
|
||||
from utils_cv.common.gpu import which_processor
|
||||
|
||||
print(f"Fast.ai version = {fastai.__version__}")
|
||||
which_processor()
|
||||
|
||||
|
||||
# Like before, we set some parameters. This time, we can use one of the multilabel datasets that comes with this repo.
|
||||
|
||||
# In[3]:
|
||||
|
||||
|
||||
DATA_PATH = unzip_url(Urls.multilabel_fridge_objects_path, exist_ok=True)
|
||||
EPOCHS = 10
|
||||
LEARNING_RATE = 1e-4
|
||||
IMAGE_SIZE = 299
|
||||
BATCH_SIZE = 16
|
||||
ARCHITECTURE = models.resnet50
|
||||
|
||||
|
||||
# ---
|
||||
|
||||
# ## 1. Preparing Image Data for Multilabel Classification
|
||||
#
|
||||
# In this notebook, we'll look at different kinds of beverages. In the repo, under `data`, we've downloaded a directory titled: __multilabelFridgeObjects__.
|
||||
#
|
||||
# Let's set that directory to our `path` variable, which we'll use throughout the notebook. We'll also inspect what's inside to get an understanding of how to structure images for multilabel classification.
|
||||
|
||||
# In[4]:
|
||||
|
||||
|
||||
path = Path(DATA_PATH)
|
||||
path.ls()
|
||||
|
||||
|
||||
# Let's inspect the `/images` folder:
|
||||
|
||||
# In[5]:
|
||||
|
||||
|
||||
(path / "images").ls()[:5]
|
||||
|
||||
|
||||
# Let's also take a look at the `labels.csv` file using pandas.
|
||||
|
||||
# In[6]:
|
||||
|
||||
|
||||
df = pd.read_csv(path / "labels.csv")
|
||||
df.sample(5)
|
||||
|
||||
|
||||
# As shown above, the contents of the csv file are a mapping of each filename to its labels. Since this is a multilabel classification problem, each image can be associated with multiple labels.
|
||||
#
|
||||
# This is one of the most common data formats for multilabel image classification: one csv file that contains the mapping of labels to a folder of images (a minimal sketch of building such a file follows the directory tree below):
|
||||
#
|
||||
# ```
|
||||
# /multilabelFridgeObjects
|
||||
# +-- labels.csv
|
||||
# +-- images
|
||||
# | +-- image1.jpg
|
||||
# | +-- image2.jpg
|
||||
# | +-- ...
|
||||
# | +-- image131.jpg
|
||||
# | +-- image132.jpg
|
||||
# ```
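# As a minimal sketch of this format (the filenames, labels, and column names below are
# made up for illustration only; the real labels.csv simply maps each image file to its
# space-delimited labels), such a file can be built and inspected with pandas:

import pandas as pd

example_labels = pd.DataFrame(
    {
        "filename": ["image1.jpg", "image2.jpg", "image3.jpg"],  # hypothetical image files
        "labels": ["can carton", "milk_bottle", "carton water_bottle"],  # space-delimited labels
    }
)
print(example_labels)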
|
||||
|
||||
# ## 2. Load labels and images
|
||||
|
||||
# __Loading data__
|
||||
#
|
||||
# Now that we know the structure of our data, let's use fastai's data block API to create our databunch so that we can easily load mini-batches of data from our filesystem into our trainer.
|
||||
|
||||
# In[7]:
|
||||
|
||||
|
||||
np.random.seed(42)
|
||||
data = (
|
||||
ImageList.from_csv(path, "labels.csv", folder="images")
|
||||
.random_split_by_pct(0.2)
|
||||
.label_from_df(label_delim=" ")
|
||||
.transform(size=299)
|
||||
.databunch(bs=32)
|
||||
.normalize(imagenet_stats)
|
||||
)
|
||||
|
||||
|
||||
# Let's break down the code:
|
||||
#
|
||||
# The first thing we need to do is to create an `ImageList`, and we'll do so by creating it from a csv file (`from_csv`). Then we want to do a random split (`random_split_by_pct`) so that we have our validation set. For this method, we've also set a random seed (`np.random.seed(42)`) so that our validation set is consistent. Finally, we want to get our labels from the df (`label_from_df`) that comes from the csv file. Since our labels are space-separated in the csv file, we want to specify that our labels will be delimited by a space (`label_delim=' '`).
|
||||
#
|
||||
# In the second part, we use the `ImageList` we created and apply a transformation on it (`transform`) so that all images are resized to 299x299. Then we turn it into a databunch, which is basically the kind of object fastai's trainer uses to load mini-batches of data. Finally, we normalize the databunch (`normalize(imagenet_stats)`) using the ImageNet statistics.
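# As a quick sanity check of the pipeline above (a sketch; `one_batch` is a standard
# fastai v1 DataBunch method), the mini-batch tensors should reflect the resize and
# batch size we chose:

x, y = data.one_batch()
print(x.shape)  # expected to be something like torch.Size([32, 3, 299, 299])
print(y.shape)  # one multi-hot target row per image in the batch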
|
||||
|
||||
# __Inspect data__
|
||||
#
|
||||
# To make sure our data is correctly loaded, lets print out the number of classes, and each of the class labels.
|
||||
|
||||
# In[8]:
|
||||
|
||||
|
||||
print(f"number of classes: {data.c}")
|
||||
print(data.classes)
|
||||
|
||||
|
||||
# We can also call `batch_stats` on our databunch object to get a view on how the data is split between training and validation.
|
||||
|
||||
# In[9]:
|
||||
|
||||
|
||||
data.batch_stats
|
||||
|
||||
|
||||
# Let's get a sample of what the data looks like.
|
||||
|
||||
# In[10]:
|
||||
|
||||
|
||||
data.show_batch(rows=3, figsize=(15, 11))
|
||||
|
||||
|
||||
# ## 3. Training our multilabel classifier
|
||||
#
|
||||
# One of the main differences between training a multilabel classifier and a single-label classifier is how we may want to evaluate our model. In a single-label (multi-class) classification model, we often use a model's accuracy to see how well it performs. But _accuracy_ as an evaluation metric isn't specific enough when it comes to multilabel classification problems.
|
||||
#
|
||||
# __The Problem With Accuracy__
|
||||
#
|
||||
# For multilabel classification problems, a misclassification is no longer binary (right or wrong). Instead, a prediction containing a subset of the correct labels is better than one that contains none of them. For example, for an image labelled both 'rainy' and 'forest', predicting one of the correct labels is usually better than predicting neither of them.
|
||||
#
|
||||
# One of the other problems when it comes to calculating accuracy is that the softmax activation function does not work well for multilabel classification problems. In single-label classification, we usually use a softmax function on the output of our neural network because we want to express a dependency across the labels; if the picture is likely of a _dog_, then it is unlikely of a _cat_. By applying a softmax on the output, we force the sum of the values to 1, enforcing this dependency.
|
||||
#
|
||||
# For multilabel classification, label likelihoods are independent from each other; the likelihood of an image being _rainy_ is independent from the likelihood of it being a _forest_. Instead of the softmax function, we can use the sigmoid activation function to normalize our result while preserving the independent relationship of each label.
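#
# To make this difference concrete, here is a minimal NumPy sketch (illustrative only, not part of the training code in this notebook) showing how softmax forces the labels to compete while a sigmoid scores each label independently:
#
# ```python
# import numpy as np
#
# logits = np.array([2.0, 1.5, -1.0])  # raw scores for 3 labels
#
# softmax = np.exp(logits) / np.exp(logits).sum()
# sigmoid = 1 / (1 + np.exp(-logits))
#
# print(softmax, softmax.sum())  # values sum to 1, so labels compete
# print(sigmoid)                 # each value lies in (0, 1) independently
# ```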
|
||||
#
|
||||
#
|
||||
# __Hamming Loss__
|
||||
#
|
||||
# One of the most common ways to evaluate a multilabel classification problem is by using the hamming loss, which we can think of as the fraction of wrong labels to the total number of labels.
|
||||
#
|
||||
# For example, let's say our validation set contains 4 images and the results look as follows:
|
||||
# ```
|
||||
# +-------+------------------+------------------+------------------+
|
||||
# | Image | y_true: | y_pred: | hamming_loss: |
|
||||
# |-------+------------------+------------------+------------------+
|
||||
# | im_01 | [[1, 0, 0, 1], | [[1, 0, 0, 0], | [[0, 0, 0, 1], |
|
||||
# | im_02 | [1, 0, 1, 1], | [1, 1, 1, 1], | [0, 1, 0, 0], |
|
||||
# | im_03 | [0, 1, 0, 0], | [0, 1, 0, 0], | [0, 0, 0, 0], |
|
||||
# | im_04 | [1, 1, 0, 0]] | [1, 1, 1, 0]] | [0, 0, 1, 0]] |
|
||||
# +-------+------------------+------------------+------------------+
|
||||
# |       | = 3/16 incorrect |
|
||||
# +-------+------------------+------------------+------------------+
|
||||
# ```
|
||||
# In this case, 3 out of a total of 16 predicted labels are wrong, so the hamming loss is __0.1875__.
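#
# As a sanity check, the following NumPy sketch (illustrative, not the implementation used later in this notebook) reproduces the hamming loss for the example above:
#
# ```python
# import numpy as np
#
# y_true = np.array([[1, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]])
# y_pred = np.array([[1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0]])
#
# hamming = (y_true != y_pred).mean()  # fraction of wrong labels over all labels
# print(hamming)  # 3/16 = 0.1875
# ```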
|
||||
#
|
||||
# __Zero-one Loss__
|
||||
#
|
||||
# Zero-one loss is a much harsher evaluation metric than hamming loss. The zero-one loss counts the entire set of labels for a given sample as incorrect if it does not exactly match the true set of labels. Hamming loss is more forgiving since it penalizes only the individual labels themselves.
|
||||
#
|
||||
# Once again, let's say our validation set contains 4 images and the results look as follows:
|
||||
# ```
|
||||
# +-------+------------------+------------------+------------------+
|
||||
# | Image | y_true: | y_pred: | zero_one_loss: |
|
||||
# |-------+------------------+------------------+------------------+
|
||||
# | im_01 | [[1, 0, 0, 1], | [[1, 0, 0, 0], | [[1], |
|
||||
# | im_02 | [1, 0, 1, 1], | [1, 1, 1, 1], | [1], |
|
||||
# | im_03 | [0, 1, 0, 0], | [0, 1, 0, 0], | [0], |
|
||||
# | im_04 | [1, 1, 0, 0]] | [1, 1, 1, 0]] | [1]] |
|
||||
# +-------+------------------+------------------+------------------+
|
||||
# | | = 3/4 incorrect |
|
||||
# +-------+------------------+------------------+------------------+
|
||||
# ```
|
||||
# In this case, only 3 individual labels are classified incorrectly. But since we're using zero-one loss, and each of those misclassifications is in a different set, we end up with a zero-one loss of __0.75__. Compared to hamming loss, zero-one loss is clearly a much less forgiving metric.
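#
# The same example, scored with a NumPy sketch of zero-one loss (again illustrative, not the implementation used later in this notebook):
#
# ```python
# import numpy as np
#
# y_true = np.array([[1, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]])
# y_pred = np.array([[1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0]])
#
# # A sample only counts as correct if its whole label set matches exactly
# zero_one = (y_true != y_pred).any(axis=1).mean()
# print(zero_one)  # 3/4 = 0.75
# ```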
|
||||
#
|
||||
# While hamming loss and zero-one loss are common evaluation metrics for multilabel classification, note that they may not be ideal for all multilabel classification problems. For each problem, you need to assess what you're evaluating your model against to see if it is a good fit.
|
||||
|
||||
# ---
|
||||
|
||||
#
|
||||
# If we want to take advantage of using Hamming Loss, we'll need to define our own evaluation metric. To do this, we'll need to create a custom function that takes a `y_pred` and a `y_true`, and returns a single metric.
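#
# As an illustration of that signature only (not the repo's actual code), such a metric could be sketched as follows, assuming raw model outputs and a fixed probability threshold:
#
# ```python
# def hamming_loss_sketch(y_pred, y_true, threshold=0.2):
#     # y_pred: raw model outputs (logits); y_true: multi-hot target tensor
#     preds = (y_pred.sigmoid() > threshold).float()
#     return (preds != y_true).float().mean()
# ```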
|
||||
#
|
||||
# > Since we've defined our hamming loss and zero-one loss functions in the `utils_cv.classification.model` module, let's just print them out to see what they look like.
|
||||
#
|
||||
|
||||
# In[11]:
|
||||
|
||||
|
||||
print(inspect.getsource(hamming_loss))
|
||||
|
||||
|
||||
# In[12]:
|
||||
|
||||
|
||||
print(inspect.getsource(zero_one_loss))
|
||||
|
||||
|
||||
# We'll use the `cnn_learner` function to create our CNN, passing in our custom `hamming_loss` and `zero_one_loss` functions as metrics.
|
||||
|
||||
# In[13]:
|
||||
|
||||
|
||||
learn = cnn_learner(
|
||||
data,
|
||||
ARCHITECTURE,
|
||||
metrics=[hamming_loss, zero_one_loss],
|
||||
callback_fns=[partial(TrainMetricsRecorder, show_graph=True)],
|
||||
)
|
||||
|
||||
|
||||
# Unfreeze our CNN since we're training all the layers.
|
||||
|
||||
# In[14]:
|
||||
|
||||
|
||||
learn.unfreeze()
|
||||
|
||||
|
||||
# We can call the `fit` function to train the DNN.
|
||||
|
||||
# In[15]:
|
||||
|
||||
|
||||
learn.fit(EPOCHS, LEARNING_RATE)
|
||||
|
||||
|
||||
# In[16]:
|
||||
|
||||
|
||||
learn.recorder.plot_losses()
|
||||
|
||||
|
||||
# ## 4. Evaluate the model
|
||||
|
||||
# The learner comes with a handy function `show_results` that will show one mini-batch of the validation set. We can use that to get an intuitive sense of what is being predicted correctly and what is not.
|
||||
|
||||
# In[17]:
|
||||
|
||||
|
||||
learn.show_results(rows=3, figsize=(15, 10))
|
||||
|
||||
|
||||
# To concretely evaluate our model, let's take a look at the hamming loss on the validation set. We can think of this value as the fraction of incorrect label predictions out of the total number of label predictions.
|
||||
|
||||
# In[18]:
|
||||
|
||||
|
||||
_, hl, zol = learn.validate(
|
||||
learn.data.valid_dl, metrics=[hamming_loss, zero_one_loss]
|
||||
)
|
||||
print(f"Hamming Loss on validation set: {float(hl):3.2f}")
|
||||
print(f"Zero-one Loss on validation set: {float(zol):3.2f}")
|
||||
|
||||
|
||||
# We've calculated the hamming loss on our validation set with the default probability threshold of 0.2. However, this default value may not be optimal. We can use the `plot_loss_thresholds` function to plot the evaluation metric at different threshold levels. If, for example, we were interested in the zero-one loss, but we noticed that the default threshold is far from the minimum, we may consider using a different threshold when we perform inference. Let's plot the zero-one loss at various thresholds to see where the optimal threshold lies.
|
||||
#
|
||||
# Note that the threshold represents a trade-off between specificity and sensitivity. The higher the threshold, the higher the _specificity_. The lower the threshold, the higher the _sensitivity_.
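#
# Conceptually, `plot_loss_thresholds` just evaluates the metric over a grid of thresholds, along the lines of the illustrative sketch below (not the library's actual code; it assumes the `interp` object created in the next cell):
#
# ```python
# import numpy as np
# import matplotlib.pyplot as plt
#
# thresholds = np.linspace(0.05, 0.95, 19)
# losses = [zero_one_loss(interp.probs, interp.y_true, threshold=t) for t in thresholds]
#
# plt.plot(thresholds, losses)
# plt.xlabel("threshold")
# plt.ylabel("zero-one loss")
# ```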
|
||||
|
||||
# In[19]:
|
||||
|
||||
|
||||
interp = learn.interpret()
|
||||
plot_loss_thresholds(zero_one_loss, interp.probs, interp.y_true)
|
||||
|
||||
|
||||
# We can clearly see that the default threshold value of 0.2 is not the minimum. Let's move the threshold to achieve a better loss.
|
||||
|
||||
# In[20]:
|
||||
|
||||
|
||||
zero_one_loss(interp.probs, interp.y_true, threshold=0.3)
|
||||
|
||||
|
||||
# Other than looking at zero-one loss and hamming loss, we can also plot the precision-recall and ROC curves for each class.
|
||||
|
||||
# In[21]:
|
||||
|
||||
|
||||
# True labels of the validation set. We convert to numpy array for plotting.
|
||||
plot_pr_roc_curves(to_np(interp.y_true), to_np(interp.probs), data.classes)
|
|
@ -0,0 +1,332 @@
|
|||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
|
||||
# <i>Copyright (c) Microsoft Corporation. All rights reserved.</i>
|
||||
#
|
||||
# <i>Licensed under the MIT License.</i>
|
||||
|
||||
# # Hard Negative Sampling for Image Classification
|
||||
|
||||
# You built an image classification model, evaluated it on a validation set and got a decent accuracy. Now you deploy the model in a real-world scenario, and soon you may find that it performs worse than expected.
|
||||
#
|
||||
# This is a quite common (and inevitable) scenario when we build a machine learning model, because we cannot collect all the possible samples. Your model is supposed to learn the features that describe the target classes best, but in reality it learns the best features to classify your dataset. For example, if we only have photos of a *butterfly* sitting on a flower, the model may learn flower shapes to classify *butterfly*.
|
||||
#
|
||||
# <img src="./media/hard_neg_ex1.jpg" width="300"> | <img src="./media/hard_neg_ex2.jpg" width="300">
|
||||
# ---|---
|
||||
# Did our model learn a butterfly? | or yellow flowers?
|
||||
#
|
||||
# Hard negative sampling (or hard negative mining) is a useful technique to address this pitfall. It is a way to explicitly create examples for your training set from falsely classified samples. The technique is widely used when you cannot add all the negative samples, since (i) training would get too slow with too many training samples; and (ii) many of the negative images are trivial for the model, and hence the model would not learn anything from them. Therefore, we try to identify the images which make a difference when added to the training set.
|
||||
#
|
||||
# In this notebook, we train our model on a training set as usual, test the model on unseen negative examples and see if the model classifies them correctly. If not, we introduce those samples into the training set and re-train the model.
|
||||
#
|
||||
# # Overview
|
||||
#
|
||||
# Our goal is to train a classifier which can recognize *fridge objects* (`water_bottle`, `carton`, `can`, and `milk_bottle`), similar to the [01_train notebook](./01_training_introduction.ipynb). However, in a real use-case the input image might not contain any of these objects. Therefore, we also introduce a `negative` class.
|
||||
#
|
||||
# <img src="./media/hard_neg.jpg" width="600"/>
|
||||
#
|
||||
# The overall training process is as follows:
|
||||
# * First, prepare training set <i>T</i> and negative-sample set <i>U</i>. <i>T</i> may include initial negative samples
|
||||
# * Next, load a pre-trained ImageNet model
|
||||
# * And then, mine hard negative samples by following steps as shown in the figure:
|
||||
# 1. Train the model on <i>T</i>
|
||||
# 2. Score the model on <i>U</i>
|
||||
# 3. Identify hard images the model mis-classified, annotate them and add to <i>T</i> so that the model can learn the patterns it confused before.
|
||||
# * Finally, repeat these steps until we get a good accuracy. A minimal sketch of this loop is shown below.
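#
# In pseudocode, the loop looks roughly like this (the function names below are placeholders, not the exact API used in this notebook):
#
# ```python
# T = initial_training_set()   # fridge objects + a few initial negative samples
# U = negative_sample_pool()
#
# while True:
#     model = train(model, T)                         # step 1: train on T
#     preds = score(model, U)                         # step 2: score U
#     hard = [x for x, p in zip(U, preds)             # step 3: mis-classified negatives
#             if p != "negative"]
#     if validation_accuracy(model) >= target or not hard:
#         break
#     T += annotate(hard[:NEGATIVE_NUM])              # add the hardest ones to T
# ```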
|
||||
|
||||
# In[1]:
|
||||
|
||||
|
||||
# Ensure edits to libraries are loaded and plotting is shown in the notebook.
|
||||
get_ipython().run_line_magic('reload_ext', 'autoreload')
|
||||
get_ipython().run_line_magic('autoreload', '2')
|
||||
get_ipython().run_line_magic('matplotlib', 'inline')
|
||||
|
||||
|
||||
# In[2]:
|
||||
|
||||
|
||||
from functools import partial
|
||||
import os
|
||||
from pathlib import Path
|
||||
import sys
|
||||
sys.path.append("../../")
|
||||
import shutil
|
||||
from tempfile import TemporaryDirectory
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
# fastai
|
||||
import fastai
|
||||
from fastai.metrics import accuracy
|
||||
from fastai.vision import (
|
||||
# data-modules
|
||||
CategoryList, DatasetType, get_image_files, ImageList, imagenet_stats,
|
||||
# model-modules
|
||||
cnn_learner, models, ClassificationInterpretation,
|
||||
)
|
||||
|
||||
from utils_cv.classification.model import (
|
||||
IMAGENET_IM_SIZE as IMAGE_SIZE,
|
||||
TrainMetricsRecorder,
|
||||
get_preds,
|
||||
)
|
||||
from utils_cv.classification.plot import plot_pr_roc_curves
|
||||
from utils_cv.classification.widget import ResultsWidget
|
||||
from utils_cv.classification.data import Urls
|
||||
from utils_cv.common.data import unzip_url
|
||||
from utils_cv.common.gpu import which_processor
|
||||
from utils_cv.common.misc import copy_files, set_random_seed
|
||||
from utils_cv.common.plot import line_graph, show_ims
|
||||
|
||||
print(f"Fast.ai version = {fastai.__version__}")
|
||||
which_processor()
|
||||
|
||||
|
||||
# In[3]:
|
||||
|
||||
|
||||
DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)
|
||||
NEGATIVE_NUM = 10 # Number of negative samples to add for each iteration of negative mining
|
||||
EPOCHS_HEAD = 4
|
||||
EPOCHS_BODY = 12
|
||||
LEARNING_RATE = 1e-4
|
||||
BATCH_SIZE = 16
|
||||
# Using fast_inference parameters from 02_training_accuracy_vs_speed notebook.
|
||||
ARCHITECTURE = models.resnet18
|
||||
IM_SIZE = 300
|
||||
|
||||
# Temporary folder to store datasets for hard-negative mining
|
||||
NEGATIVE_MINING_DATA_DIR = TemporaryDirectory().name
|
||||
|
||||
|
||||
# ## 1. Prepare datasets
|
||||
|
||||
# We prepare our dataset in the following way:
|
||||
# * The initial training set `T` includes *fridge objects* as well as some initial *negative samples*<sup>+</sup>.
|
||||
# * Negative image set `U`.
|
||||
# * The validation set `V` contains both *fridge objects* and *negative samples*. We evaluate our model on this set.
|
||||
#
|
||||
# <sub>+ We add `NEGATIVE_NUM` negative samples to our initial training set. In a real use-case, you may want to include 100 or more negative images.</sub>
|
||||
|
||||
# In[4]:
|
||||
|
||||
|
||||
ori_datapath = Path(DATA_PATH)
|
||||
neg_datapath = Path(unzip_url(Urls.fridge_objects_negatives_path, exist_ok=True))
|
||||
# We split positive samples into 80% training and 20% validation
|
||||
data_imlist = (ImageList.from_folder(ori_datapath)
|
||||
.split_by_rand_pct(valid_pct=0.2, seed=10)
|
||||
.label_from_folder())
|
||||
# We use 80% of the negative images for hard-negative mining (set U) and 20% for validation
|
||||
neg_data = (ImageList.from_folder(neg_datapath)
|
||||
.split_by_rand_pct(valid_pct=0.2, seed=10)
|
||||
.label_const() # We don't use labels for negative data
|
||||
.transform(size=IMAGE_SIZE)
|
||||
.databunch(bs=BATCH_SIZE)
|
||||
.normalize(imagenet_stats))
|
||||
# Do not shuffle U when we predict
|
||||
neg_data.train_dl = neg_data.train_dl.new(shuffle=False)
|
||||
neg_data
|
||||
|
||||
|
||||
# In[5]:
|
||||
|
||||
|
||||
datapath = Path(NEGATIVE_MINING_DATA_DIR)/'data'
|
||||
|
||||
# Training set T
|
||||
copy_files(data_imlist.train.items, datapath/'train', infer_subdir=True)
|
||||
# We include the first NEGATIVE_NUM negative images from U (neg_data.train_ds) in our initial training set T
|
||||
copy_files(neg_data.train_ds.items[:NEGATIVE_NUM], datapath/'train'/'negative')
|
||||
|
||||
# Validation set V
|
||||
copy_files(data_imlist.valid.items, datapath/'valid', infer_subdir=True)
|
||||
copy_files(neg_data.valid_ds.items, datapath/'valid'/'negative')
|
||||
|
||||
|
||||
# In[6]:
|
||||
|
||||
|
||||
set_random_seed(10)
|
||||
|
||||
|
||||
# In[7]:
|
||||
|
||||
|
||||
data = (ImageList.from_folder(datapath)
|
||||
.split_by_folder()
|
||||
.label_from_folder()
|
||||
.transform(size=IMAGE_SIZE)
|
||||
.databunch(bs=BATCH_SIZE)
|
||||
.normalize(imagenet_stats))
|
||||
data.show_batch()
|
||||
|
||||
|
||||
# In[8]:
|
||||
|
||||
|
||||
print(f'number of classes: {data.c} = {data.classes}')
|
||||
print(data.batch_stats)
|
||||
|
||||
|
||||
# ## 2. Prepare a model
|
||||
#
|
||||
# We use the *fast inference* setup we demonstrated in the [02_training_accuracy_vs_speed notebook](./02_training_accuracy_vs_speed.ipynb). The model is ResNet18, pre-trained on [ImageNet](http://www.image-net.org/). For details about the training concepts, please see the [01_training notebook](./01_training_introduction.ipynb).
|
||||
|
||||
# In[9]:
|
||||
|
||||
|
||||
learn = cnn_learner(data, ARCHITECTURE, metrics=accuracy)
|
||||
|
||||
|
||||
# In[10]:
|
||||
|
||||
|
||||
learn.fit_one_cycle(EPOCHS_HEAD, LEARNING_RATE)
|
||||
|
||||
|
||||
# In[11]:
|
||||
|
||||
|
||||
# Record train and valid accuracies by using the TrainMetricsRecorder callback
|
||||
learn.callbacks.append(TrainMetricsRecorder(learn, show_graph=True))
|
||||
learn.unfreeze()
|
||||
|
||||
|
||||
# In[12]:
|
||||
|
||||
|
||||
# We record train and valid accuracies for later analysis
|
||||
train_acc = []
|
||||
valid_acc = []
|
||||
interpretations = []
|
||||
|
||||
|
||||
# ## 3. Train the model on *T*
|
||||
#
|
||||
# <a id='train'></a>
|
||||
#
|
||||
# From this section to the end, we do training and negative mining. As described in the Overview section, you may need to repeat the negative mining steps several times to achieve good results.
|
||||
|
||||
# In[48]:
|
||||
|
||||
|
||||
# Show how many negative-mining iterations we have gone through so far
|
||||
print(f"Ran {len(interpretations)} time(s)")
|
||||
|
||||
|
||||
# In[49]:
|
||||
|
||||
|
||||
learn.fit_one_cycle(EPOCHS_BODY, LEARNING_RATE)
|
||||
|
||||
|
||||
# The following cells show the confusion matrix for the validation set. If you are repeating the negative mining steps, you will see all the confusion matrices from the previous iterations.
|
||||
|
||||
# In[50]:
|
||||
|
||||
|
||||
interpretations.append(ClassificationInterpretation.from_learner(learn))
|
||||
|
||||
|
||||
# In[51]:
|
||||
|
||||
|
||||
for i, interp in enumerate(interpretations):
|
||||
interp.plot_confusion_matrix()
|
||||
|
||||
|
||||
# In[52]:
|
||||
|
||||
|
||||
# Store train and valid accuracy
|
||||
train_acc.extend(np.array(learn.train_metrics_recorder.train_metrics)[:, 0])
|
||||
valid_acc.extend(np.array(learn.train_metrics_recorder.valid_metrics)[:, 0])
|
||||
|
||||
|
||||
# In[53]:
|
||||
|
||||
|
||||
line_graph(
|
||||
values=(train_acc, valid_acc),
|
||||
labels=("Train", "Valid"),
|
||||
x_guides=[i*EPOCHS_BODY for i in range(1, len(train_acc)//EPOCHS_BODY + 1)],
|
||||
x_name="Epoch",
|
||||
y_name="Accuracy",
|
||||
)
|
||||
|
||||
|
||||
# **If the model performs well enough, we can stop the training / negative sampling here.**
|
||||
#
|
||||
# If not, let's do hard negative sampling.
|
||||
|
||||
# ## 4. Score the model on *U*
|
||||
|
||||
# In[42]:
|
||||
|
||||
|
||||
pred_outs = np.array(get_preds(learn, neg_data.train_dl)[0].tolist())
|
||||
print(f"Prediction results:\n{pred_outs[:10]}\n...")
|
||||
|
||||
|
||||
# ## 5. Hard negative mining
|
||||
|
||||
# In[43]:
|
||||
|
||||
|
||||
# Get the top-n falsely classified images (by confidence)
|
||||
preds = np.argmax(pred_outs, axis=1)
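# Every image in U that the model did not predict as 'negative' is a mis-classified, i.e. hard negative, candidate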
|
||||
wrong_ids = np.where(preds!=data.classes.index('negative'))[0]
|
||||
wrong_ids_confs = [(i, pred_outs[i][preds[i]]) for i in wrong_ids]
|
||||
wrong_ids_confs = sorted(wrong_ids_confs, key=lambda l:l[1], reverse=True)[:NEGATIVE_NUM]
|
||||
|
||||
|
||||
# In[44]:
|
||||
|
||||
|
||||
negative_sample_ids = [w[0] for w in wrong_ids_confs]
|
||||
negative_sample_labels = [f"Pred: {data.classes[preds[w[0]]]}\nConf: {w[1]:.3f}" for w in wrong_ids_confs]
|
||||
show_ims(neg_data.train_ds.items[negative_sample_ids], negative_sample_labels, rows=NEGATIVE_NUM//5)
|
||||
|
||||
|
||||
# ## 6. Add hard negative samples to the training set *T*
|
||||
#
|
||||
# We add the hard negative samples into the training set.
|
||||
|
||||
# In[45]:
|
||||
|
||||
|
||||
copy_files(neg_data.train_ds.items[negative_sample_ids], datapath/'train'/'negative')
|
||||
|
||||
|
||||
# In[47]:
|
||||
|
||||
|
||||
# Reload the dataset which includes more negative-samples
|
||||
data = (ImageList.from_folder(datapath)
|
||||
.split_by_folder()
|
||||
.label_from_folder()
|
||||
.transform(size=IMAGE_SIZE)
|
||||
.databunch(bs=BATCH_SIZE)
|
||||
.normalize(imagenet_stats))
|
||||
print(data.batch_stats)
|
||||
|
||||
# Set the dataset to the learner
|
||||
learn.data = data
|
||||
|
||||
|
||||
# Now, let's go **back** to "[3. Train the model on T](#train)" and repeat the training and negative mining steps until we reach a decent accuracy on `negative` samples
|
||||
|
||||
# In[54]:
|
||||
|
||||
|
||||
# Finally, show the total number of negative-mining iterations we went through
|
||||
print(f"Ran {len(interpretations)} time(s)")
|
||||
|
||||
|
||||
# In[ ]:
|
||||
|
||||
|
||||
|
||||
|
|
@ -5,60 +5,36 @@
|
|||
#
|
||||
# <i>Licensed under the MIT License.</i>
|
||||
#
|
||||
# # Deployment of a model as a service with Azure Container Instances
|
||||
# # Deployment of a model to an Azure Container Instance (ACI)
|
||||
|
||||
# ## Table of contents <a id="table_of_content"></a>
|
||||
#
|
||||
# 1. [Introduction](#intro)
|
||||
# 1. [Pre-requisites](#pre-reqs)
|
||||
# 1. [Library import](#libraries)
|
||||
# 1. [Azure workspace](#workspace)
|
||||
# 1. [SDK version](#sdk)
|
||||
# 1. [Workspace creation](#ws)
|
||||
# 1. [Model retrieval and export](#model)
|
||||
# 1. [Model deployment on Azure](#deploy)
|
||||
# 1. [Workspace retrieval](#workspace)
|
||||
# 1. [Model registration](#register)
|
||||
# 1. [Without experiment](#noexp)
|
||||
# 1. [With an experiment](#exp)
|
||||
# 1. [Scoring script](#scoring)
|
||||
# 1. [Environment setup](#env)
|
||||
# 1. [Computational resources](#compute)
|
||||
# 1. [Web service deployment](#websvc)
|
||||
# 1. [Testing of the web service](#test)
|
||||
# 1. [Using the run API](#api)
|
||||
# 1. [Via a raw HTTP request](#http)
|
||||
# 1. [Notes on web service deployment](#notes)
|
||||
# 1. [Notes on web service deployment](#notes)
|
||||
# 1. [Clean-up](#clean)
|
||||
# 1. [Service termination](#svcterm)
|
||||
# 1. [Image deletion](#imdel)
|
||||
# 1. [Workspace deletion](#wsdel)
|
||||
# 1. [Next steps](#next-steps)
|
||||
|
||||
# ## 1. Introduction <a id="intro"></a>
|
||||
#
|
||||
# Building a machine learning model with high precision and/or recall is very satisfying. However, it is not necessarily the end of the story. This model may need to go into production to be called in real time, and serve results to our end users. How do we go about doing that? In this notebook, we will learn:
|
||||
# - how to register a model on Azure
|
||||
# - how to create a Docker image that contains our model
|
||||
# - how to deploy a web service on [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/) using this Docker image
|
||||
# - how to test that our service works well, from within the notebook.
|
||||
# While building a good performing model is important, for it to be useful, it needs to be accessible. In this notebook, we will learn how to make this possible by deploying our model onto Azure. We will more particularly see how to:
|
||||
# - Register a model there
|
||||
# - Create a Docker image that contains our model
|
||||
# - Deploy a web service on [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/) using this Docker image.
|
||||
#
|
||||
# <img src="media/ACI_diagram_2.jpg" width="500" style="float: left;" alt="Web service deployment workflow">
|
||||
|
||||
# ## 2. Pre-requisites
|
||||
# <a id="pre-reqs"></a>
|
||||
# ### Pre-requisites <a id="pre-reqs"></a>
|
||||
# For this notebook to run properly on our machine, an Azure workspace is required. If we don't have one, we need to first run through the short [20_azure_workspace_setup.ipynb](20_azure_workspace_setup.ipynb) notebook to create it.
|
||||
#
|
||||
# For this notebook to run properly on our machine, the following should already be in place:
|
||||
#
|
||||
# * Local machine setup
|
||||
# * We need to set up the "cvbp" conda environment. [These instructions](https://github.com/Microsoft/ComputerVision/blob/master/classification/README.md#getting-started) explain how to do that.
|
||||
#
|
||||
#
|
||||
# * Azure subscription setup
|
||||
# * We also need an account on the Azure platform. If we do not have one, we first need to:
|
||||
# * [Create an account](https://azure.microsoft.com/en-us/free/services/machine-learning/)
|
||||
# * [Create a resource group and a workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace#portal)
|
||||
|
||||
# ## 3. Library import <a id="libraries"></a>
|
||||
# ### Library import <a id="libraries"></a>
|
||||
# Throughout this notebook, we will be using a variety of libraries. We are listing them here for better readability.
|
||||
|
||||
# In[1]:
|
||||
|
@ -70,7 +46,6 @@ get_ipython().run_line_magic("autoreload", "2")
|
|||
|
||||
# Regular python libraries
|
||||
import os
|
||||
import requests
|
||||
import sys
|
||||
|
||||
# fast.ai
|
||||
|
@ -83,97 +58,26 @@ from azureml.core import Experiment, Workspace
|
|||
from azureml.core.image import ContainerImage
|
||||
from azureml.core.model import Model
|
||||
from azureml.core.webservice import AciWebservice, Webservice
|
||||
from azureml.exceptions import ProjectSystemException, UserErrorException
|
||||
|
||||
# Computer Vision repository
|
||||
sys.path.extend([".", "../.."])
|
||||
# This "sys.path.extend()" statement allows us to move up the directory hierarchy
|
||||
# and access the utils_ic and utils_cv packages
|
||||
# and access the utils_cv package
|
||||
from utils_cv.common.deployment import generate_yaml
|
||||
from utils_cv.common.data import data_path, root_path
|
||||
from utils_cv.common.image import ims2strlist
|
||||
from utils_cv.common.data import root_path
|
||||
from utils_cv.classification.model import IMAGENET_IM_SIZE, model_to_learner
|
||||
|
||||
|
||||
# ## 4. Azure workspace <a id="workspace"></a>
|
||||
|
||||
# ### 4.A SDK version <a id="sdk"></a>
|
||||
#
|
||||
# Before we start, let's check which version of the Azure SDK we are working with.
|
||||
|
||||
# In[2]:
|
||||
|
||||
|
||||
# Check core SDK version number
|
||||
print(f"Azure ML SDK Version: {azureml.core.VERSION}")
|
||||
|
||||
|
||||
# ### 4.B Workspace creation <a id="ws"></a>
|
||||
# Now that we have our environment and proper libraries in place, let's load an existing workspace or create a new one on our Azure account, and save it to a local configuration file (`./aml_config/config.json`).
|
||||
#
|
||||
# If it is the first time we create a workspace, or if we are missing our `config.json` file, we need to provide the appropriate:
|
||||
# - subscription ID: the ID of the Azure subscription we are using
|
||||
# - resource group: the name of the resource group in which our workspace resides
|
||||
# - workspace_region: the geographical area in which our workspace resides (examples are available [here](https://azure.microsoft.com/en-us/global-infrastructure/geographies/))
|
||||
# - workspace_name: the name of the workspace we want to create or retrieve.
|
||||
|
||||
# In[3]:
|
||||
|
||||
|
||||
# Let's define these variables here - These pieces of information can be found on the portal
|
||||
subscription_id = os.getenv("SUBSCRIPTION_ID", default="<our_subscription_id>")
|
||||
resource_group = os.getenv("RESOURCE_GROUP", default="<our_resource_group>")
|
||||
workspace_name = os.getenv("WORKSPACE_NAME", default="<our_workspace_name>")
|
||||
workspace_region = os.getenv(
|
||||
"WORKSPACE_REGION", default="<our_workspace_region>"
|
||||
)
|
||||
|
||||
try:
|
||||
# Let's load the workspace from the configuration file
|
||||
ws = Workspace.from_config()
|
||||
print("Workspace was loaded successfully from the configuration file")
|
||||
except (UserErrorException, ProjectSystemException):
|
||||
# or directly from Azure, if it already exists (exist_ok=True).
|
||||
# If it does not exist, let's create a workspace from scratch
|
||||
ws = Workspace.create(
|
||||
name=workspace_name,
|
||||
subscription_id=subscription_id,
|
||||
resource_group=resource_group,
|
||||
location=workspace_region,
|
||||
create_resource_group=True,
|
||||
exist_ok=True,
|
||||
)
|
||||
ws.write_config()
|
||||
print("Workspace was loaded successfully from Azure")
|
||||
|
||||
|
||||
# Let's check that the workspace is properly loaded
|
||||
|
||||
# In[4]:
|
||||
|
||||
|
||||
# Print the workspace attributes
|
||||
print(
|
||||
f"Workspace name: {ws.name}\n Azure region: {ws.location}\n Subscription id: {ws.subscription_id}\n Resource group: {ws.resource_group}"
|
||||
)
|
||||
|
||||
|
||||
# We can see this workspace on the Azure portal by sequentially clicking on:
|
||||
# - Resource groups, and clicking the one we referenced above
|
||||
|
||||
# <img src="media/resource_group.jpg" width="800" alt="Azure portal view of resource group">
|
||||
|
||||
# - Workspace_name
|
||||
|
||||
# <img src="media/workspace.jpg" width="800" alt="Azure portal view of workspace">
|
||||
|
||||
# ## 5. Model retrieval and export <a id="model"></a>
|
||||
# ## 2. Model retrieval and export <a id="model"></a>
|
||||
#
|
||||
# For demonstration purposes, we will use here a ResNet18 model, pretrained on ImageNet. The following steps would be the same if we had trained a model locally (cf. [**01_training_introduction.ipynb**](01_training_introduction.ipynb) notebook for details).
|
||||
#
|
||||
# Let's first retrieve the model.
|
||||
|
||||
# In[5]:
|
||||
# In[2]:
|
||||
|
||||
|
||||
learn = model_to_learner(models.resnet18(pretrained=True), IMAGENET_IM_SIZE)
|
||||
|
@ -181,23 +85,42 @@ learn = model_to_learner(models.resnet18(pretrained=True), IMAGENET_IM_SIZE)
|
|||
|
||||
# To be able to use this model, we need to export it to our local machine. We store it in an `outputs/` subfolder.
|
||||
|
||||
# In[6]:
|
||||
# In[3]:
|
||||
|
||||
|
||||
current_directory = os.getcwd()
|
||||
output_folder = os.path.join(current_directory, "outputs")
|
||||
MODEL_NAME = (
|
||||
output_folder = os.path.join(os.getcwd(), "outputs")
|
||||
model_name = (
|
||||
"im_classif_resnet18"
|
||||
) # Name we will give our model both locally and on Azure
|
||||
PICKLED_MODEL_NAME = MODEL_NAME + ".pkl"
|
||||
pickled_model_name = f"{model_name}.pkl"
|
||||
os.makedirs(output_folder, exist_ok=True)
|
||||
|
||||
learn.export(os.path.join(output_folder, PICKLED_MODEL_NAME))
|
||||
learn.export(os.path.join(output_folder, pickled_model_name))
|
||||
|
||||
|
||||
# ## 6. Model deployment on Azure <a id="deploy"></a>
|
||||
# ## 3. Model deployment on Azure <a id="deploy"></a>
|
||||
#
|
||||
# ### 3.A Workspace retrieval <a id="workspace"></a>
|
||||
#
|
||||
# In the [prior notebook](20_azure_workspace_setup.ipynb), we created a workspace. This is a critical object from which we will build all the pieces we need to deploy our model as a web service. Let's start by retrieving it.
|
||||
|
||||
# ### 6.A Model registration <a id="register"></a>
|
||||
# In[4]:
|
||||
|
||||
|
||||
ws = Workspace.setup()
|
||||
# setup() refers to our config.json file by default
|
||||
|
||||
# Print the workspace attributes
|
||||
print(
|
||||
"Workspace name: " + ws.name,
|
||||
"Workspace region: " + ws.location,
|
||||
"Subscription id: " + ws.subscription_id,
|
||||
"Resource group: " + ws.resource_group,
|
||||
sep="\n",
|
||||
)
|
||||
|
||||
|
||||
# ### 3.B Model registration <a id="register"></a>
|
||||
#
|
||||
# Our final goal is to deploy our model as a web service. To do so, we need to first register it in our workspace, i.e. place it in our workspace's model registry. We can do this in 2 ways:
|
||||
# 1. register the model directly
|
||||
|
@ -207,23 +130,23 @@ learn.export(os.path.join(output_folder, PICKLED_MODEL_NAME))
|
|||
#
|
||||
# The cells below show each of the methods.
|
||||
|
||||
# #### 6.A.a Without experiment <a id="noexp"></a>
|
||||
# #### Without experiment <a id="noexp"></a>
|
||||
#
|
||||
# We leverage the `register` method from the Azure ML `Model` object. For that, we just need the location of the model we saved on our local machine, its name and our workspace object.
|
||||
|
||||
# In[7]:
|
||||
# In[5]:
|
||||
|
||||
|
||||
model = Model.register(
|
||||
model_path=os.path.join("outputs", PICKLED_MODEL_NAME),
|
||||
model_name=MODEL_NAME,
|
||||
model_path=os.path.join("outputs", pickled_model_name),
|
||||
model_name=model_name,
|
||||
tags={"Model": "Pretrained ResNet18"},
|
||||
description="Image classifier",
|
||||
workspace=ws,
|
||||
)
|
||||
|
||||
|
||||
# #### 6.A.b With an experiment <a id="exp"></a>
|
||||
# #### With an experiment <a id="exp"></a>
|
||||
#
|
||||
# An experiment contains a series of trials called `Runs`. A run typically contains some tasks, such as training a model, etc. Through a run's methods, we can log several metrics such as training and test loss and accuracy, and even tag our run. The full description of the run class is available [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py). In our case, however, we just need the run to attach our model file to our workspace and experiment.
|
||||
#
|
||||
|
@ -235,7 +158,7 @@ model = Model.register(
|
|||
#
|
||||
# Let's first create a new experiment. If an experiment with the same name already exists in our workspace, the run we will generate will be recorded under that already existing experiment.
|
||||
|
||||
# In[8]:
|
||||
# In[6]:
|
||||
|
||||
|
||||
# Create a new/Retrieve an existing experiment
|
||||
|
@ -246,7 +169,7 @@ print(
|
|||
)
|
||||
|
||||
|
||||
# In[9]:
|
||||
# In[7]:
|
||||
|
||||
|
||||
# Initialize the run
|
||||
|
@ -260,25 +183,23 @@ run = experiment.start_logging(snapshot_directory=None)
|
|||
|
||||
# We can now attach our local model to our workspace and experiment.
|
||||
|
||||
# In[10]:
|
||||
# In[8]:
|
||||
|
||||
|
||||
# Upload the model (.pkl) file to Azure
|
||||
run.upload_file(
|
||||
name=os.path.join("outputs", PICKLED_MODEL_NAME),
|
||||
path_or_stream=os.path.join(
|
||||
current_directory, "outputs", PICKLED_MODEL_NAME
|
||||
),
|
||||
name=os.path.join("outputs", pickled_model_name),
|
||||
path_or_stream=os.path.join(os.getcwd(), "outputs", pickled_model_name),
|
||||
)
|
||||
|
||||
|
||||
# In[11]:
|
||||
# In[9]:
|
||||
|
||||
|
||||
# Register the model with the workspace
|
||||
model = run.register_model(
|
||||
model_name=MODEL_NAME,
|
||||
model_path=os.path.join("outputs", PICKLED_MODEL_NAME),
|
||||
model_name=model_name,
|
||||
model_path=os.path.join("outputs", pickled_model_name),
|
||||
tags={"Model": "Pretrained ResNet18"},
|
||||
)
|
||||
# !!! We need to make sure that the model name we use here is the same as in the scoring script below !!!
|
||||
|
@ -296,7 +217,7 @@ model = run.register_model(
|
|||
|
||||
# We can also check that it is programmatically accessible
|
||||
|
||||
# In[12]:
|
||||
# In[10]:
|
||||
|
||||
|
||||
print(
|
||||
|
@ -304,7 +225,7 @@ print(
|
|||
)
|
||||
|
||||
|
||||
# In[13]:
|
||||
# In[11]:
|
||||
|
||||
|
||||
run.get_file_names()
|
||||
|
@ -312,7 +233,7 @@ run.get_file_names()
|
|||
|
||||
# If we are also interested in verifying which model we uploaded, we can download it to our local machine
|
||||
|
||||
# In[14]:
|
||||
# In[12]:
|
||||
|
||||
|
||||
model.download()
|
||||
|
@ -322,21 +243,21 @@ model.download()
|
|||
|
||||
# We are all done with our model registration, so we can close our run.
|
||||
|
||||
# In[15]:
|
||||
# In[13]:
|
||||
|
||||
|
||||
# Close the run
|
||||
run.complete()
|
||||
|
||||
|
||||
# In[16]:
|
||||
# In[14]:
|
||||
|
||||
|
||||
# Access the portal
|
||||
run
|
||||
|
||||
|
||||
# ### 6.B Scoring script <a id="scoring"></a>
|
||||
# ### 3.C Scoring script <a id="scoring"></a>
|
||||
# For the web service to return predictions on a given input image, we need to provide it with instructions on how to use the model we just registered. These instructions are stored in the scoring script.
|
||||
#
|
||||
# This script must contain two required functions, `init()` and `run(input_data)`:
|
||||
|
@ -347,29 +268,27 @@ run
|
|||
#
|
||||
# This file must also be stored in the current directory.
|
||||
|
||||
# In[17]:
|
||||
# In[15]:
|
||||
|
||||
|
||||
scoring_script = "score.py"
|
||||
|
||||
|
||||
# In[18]:
|
||||
# In[16]:
|
||||
|
||||
|
||||
get_ipython().run_cell_magic(
|
||||
"writefile",
|
||||
"$scoring_script",
|
||||
'# Copyright (c) Microsoft. All rights reserved.\n# Licensed under the MIT license.\n\nimport json\n\nfrom base64 import b64decode\nfrom io import BytesIO\n\nfrom azureml.core.model import Model\nfrom fastai.vision import *\n\ndef init():\n global model\n model_path = Model.get_model_path(model_name=\'im_classif_resnet18\')\n # ! We cannot use MODEL_NAME here otherwise the execution on Azure will fail !\n \n model_dir_path, model_filename = os.path.split(model_path)\n model = load_learner(path=model_dir_path, fname=model_filename)\n\n\ndef run(raw_data):\n\n # Expects raw_data to be a list within a json file\n result = [] \n \n for im_string in json.loads(raw_data)[\'data\']:\n im_bytes = b64decode(im_string)\n try:\n im = open_image(BytesIO(im_bytes))\n pred_class, pred_idx, outputs = model.predict(im)\n result.append({"label": str(pred_class), "probability": str(outputs[pred_idx].item())})\n except Exception as e:\n result.append({"label": str(e), "probability": \'\'})\n return result',
|
||||
'# Copyright (c) Microsoft. All rights reserved.\n# Licensed under the MIT license.\n\nimport json\n\nfrom base64 import b64decode\nfrom io import BytesIO\n\nfrom azureml.core.model import Model\nfrom fastai.vision import *\n\ndef init():\n global model\n model_path = Model.get_model_path(model_name=\'im_classif_resnet18\')\n # ! We cannot use the *model_name* variable here otherwise the execution on Azure will fail !\n \n model_dir_path, model_filename = os.path.split(model_path)\n model = load_learner(path=model_dir_path, fname=model_filename)\n\n\ndef run(raw_data):\n\n # Expects raw_data to be a list within a json file\n result = [] \n \n for im_string in json.loads(raw_data)[\'data\']:\n im_bytes = b64decode(im_string)\n try:\n im = open_image(BytesIO(im_bytes))\n pred_class, pred_idx, outputs = model.predict(im)\n result.append({"label": str(pred_class), "probability": str(outputs[pred_idx].item())})\n except Exception as e:\n result.append({"label": str(e), "probability": \'\'})\n return result',
|
||||
)
|
||||
|
||||
|
||||
# ### 6.C Environment setup <a id="env"></a>
|
||||
# ### 3.D Environment setup <a id="env"></a>
|
||||
#
|
||||
# In order to make predictions on the Azure platform, it is important to create an environment as similar as possible to the one in which the model was trained. Here, we use a fast.ai pretrained model that also requires pytorch and a few other libraries. To re-create this environment, we use a [Docker container](https://www.docker.com/resources/what-container). We configure it via a yaml file that will contain all the conda dependencies needed by the model. This yaml file is a subset of `image_classification/environment.yml`.
|
||||
#
|
||||
# <i><b>Note:</b> If we had trained our model locally, we would have created a yaml file that contains the same libraries as what is installed on our local machine.</i>
|
||||
# In order to make predictions on the Azure platform, it is important to create an environment as similar as possible to the one in which the model was trained. Here, we use a fast.ai pretrained model that also requires pytorch and a few other libraries. To re-create this environment, we use a [Docker container](https://www.docker.com/resources/what-container). We configure it via a yaml file that will contain all the conda dependencies needed by the model. This yaml file is a subset of `<repo_root>/classification/environment.yml`.
|
||||
|
||||
# In[19]:
|
||||
# In[17]:
|
||||
|
||||
|
||||
# Create a deployment-specific yaml file from image_classification/environment.yml
|
||||
|
@ -385,7 +304,7 @@ generate_yaml(
|
|||
|
||||
# There are different ways of creating a Docker image on Azure. Here, we create it separately from the service it will be used by. This way of proceeding gives us direct access to the Docker image object. Thus, if the service deployment fails, but the Docker image gets deployed successfully, we can try deploying the service again, without having to create a new image all over again.
|
||||
|
||||
# In[20]:
|
||||
# In[18]:
|
||||
|
||||
|
||||
# Configure the Docker image
|
||||
|
@ -402,21 +321,20 @@ image_config = ContainerImage.image_configuration(
|
|||
)
|
||||
|
||||
|
||||
# In[21]:
|
||||
# In[19]:
|
||||
|
||||
|
||||
# Create the Docker image
|
||||
docker_image = ContainerImage.create(
|
||||
name="image-classif-resnet18-f48",
|
||||
models=[model], # the model is passed as part of a list
|
||||
models=[model],
|
||||
image_config=image_config,
|
||||
workspace=ws,
|
||||
)
|
||||
# The image name should not contain more than 32 characters, and should not contain any spaces, dots or underscores
|
||||
# A Docker image can contain several model objects. Here, we just have one.
|
||||
|
||||
|
||||
# In[22]:
|
||||
# In[20]:
|
||||
|
||||
|
||||
get_ipython().run_cell_magic(
|
||||
|
@ -435,21 +353,21 @@ get_ipython().run_cell_magic(
|
|||
#
|
||||
# It happens, sometimes, that the deployment of the Docker image fails. Re-running the previous command typically solves the problem. If it doesn't, however, we can run the following one and inspect the deployment logs.
|
||||
|
||||
# In[23]:
|
||||
# In[21]:
|
||||
|
||||
|
||||
print(ws.images["image-classif-resnet18-f48"].image_build_log_uri)
|
||||
|
||||
|
||||
# ### 6.D Computational resources <a id="compute"></a>
|
||||
|
||||
# ### 3.E Computational resources <a id="compute"></a>
|
||||
#
|
||||
# In this notebook, we use [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview) (ACI) which are good for quick and [cost-effective](https://azure.microsoft.com/en-us/pricing/details/container-instances/) development/test deployment scenarios.
|
||||
#
|
||||
# To set them up properly, we need to indicate the number of CPU cores and the amount of memory we want to allocate to our web service. Optional tags and descriptions are also available for us to identify the instances in AzureML when looking at the `Compute` tab in the Azure Portal.
|
||||
#
|
||||
# <i><b>Note:</b> For production workloads, it is better to use [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/) (AKS) instead. We will demonstrate how to do this in the [next notebook](22_deployment_on_azure_kubernetes_service.ipynb).<i>
|
||||
|
||||
# In[24]:
|
||||
# In[22]:
|
||||
|
||||
|
||||
# Create a deployment configuration with 1 CPU and 5 gigabytes of RAM
|
||||
|
@ -461,7 +379,7 @@ aci_config = AciWebservice.deploy_configuration(
|
|||
)
|
||||
|
||||
|
||||
# ### 6.E Web service deployment <a id="websvc"></a>
|
||||
# ### 3.F Web service deployment <a id="websvc"></a>
|
||||
|
||||
# The final step to deploying our web service is to call `WebService.deploy_from_image()`. This function uses the Docker image and the deployment configuration we created above to perform the following:
|
||||
#
|
||||
|
@ -480,7 +398,7 @@ aci_config = AciWebservice.deploy_configuration(
|
|||
#
|
||||
# <i><b>Note:</b> The web service creation can take a few minutes</i>
|
||||
|
||||
# In[25]:
|
||||
# In[23]:
|
||||
|
||||
|
||||
# Define how to deploy the web service
|
||||
|
@ -498,7 +416,7 @@ service = Webservice.deploy_from_image(
|
|||
# to re-use the same Docker image in case the deployment of this service fails, or even for other
|
||||
# types of deployments, as we will see in the next notebook.
|
||||
|
||||
# In[26]:
|
||||
# In[24]:
|
||||
|
||||
|
||||
# Deploy the web service
|
||||
|
@ -521,7 +439,7 @@ service.wait_for_deployment(show_output=True)
|
|||
# print(service.get_logs())
|
||||
|
||||
|
||||
# In[27]:
|
||||
# In[25]:
|
||||
|
||||
|
||||
# Retrieve the service status
|
||||
|
@ -536,114 +454,29 @@ print(
|
|||
# <img src="media/docker_images.jpg" width="800" alt="Azure portal view of the Images section">
|
||||
# <img src="media/deployments.jpg" width="800" alt="Azure portal view of the Deployments section">
|
||||
|
||||
# ## 7. Testing of the web service <a id="test"></a>
|
||||
|
||||
# Our web service is now up and running. To make sure that it is working as expected, let's test it.
|
||||
# ## 4. Notes on web service deployment <a id="notes"></a>
|
||||
#
|
||||
# We first need to retrieve test images and to pre-process them into the format expected by our model. A service typically expects input data to be in a JSON serializable format. Here, we use our own `ims2strlist()` function to transform our .jpg images into strings of bytes.
|
||||
|
||||
# In[28]:
|
||||
|
||||
|
||||
im_url_root = "https://cvbp.blob.core.windows.net/public/images/"
|
||||
im_filenames = ["cvbp_milk_bottle.jpg", "cvbp_water_bottle.jpg"]
|
||||
|
||||
local_im_paths = []
|
||||
for im_filename in im_filenames:
|
||||
# Retrieve test images from our storage blob
|
||||
r = requests.get(os.path.join(im_url_root, im_filename))
|
||||
|
||||
# Copy test images to local data/ folder
|
||||
with open(os.path.join(data_path(), im_filename), "wb") as f:
|
||||
f.write(r.content)
|
||||
|
||||
# Extract local path to test images
|
||||
local_im_paths.append(os.path.join(data_path(), im_filename))
|
||||
|
||||
# Convert images to json object
|
||||
im_string_list = ims2strlist(local_im_paths)
|
||||
test_samples = json.dumps({"data": im_string_list})
|
||||
|
||||
|
||||
# ### 7.A Using the `run` API <a id="api"></a>
|
||||
#
|
||||
# Our data are now properly formatted. We can send them to our web service.
|
||||
|
||||
# In[29]:
|
||||
|
||||
|
||||
# Predict using the deployed model
|
||||
result = service.run(test_samples)
|
||||
|
||||
|
||||
# In[30]:
|
||||
|
||||
|
||||
# Plot the results
|
||||
actual_labels = ["milk_bottle", "water_bottle"]
|
||||
for k in range(len(result)):
|
||||
title = "{}/{} - {}%".format(
|
||||
actual_labels[k],
|
||||
result[k]["label"],
|
||||
round(100.0 * float(result[k]["probability"]), 2),
|
||||
)
|
||||
open_image(local_im_paths[k]).show(title=title)
|
||||
|
||||
|
||||
# ### 7.B Via a raw HTTP request <a id="http"></a>
|
||||
|
||||
# In[31]:
|
||||
|
||||
|
||||
# Send the same test data
|
||||
payload = {"data": im_string_list}
|
||||
resp = requests.post(service.scoring_uri, json=payload)
|
||||
|
||||
# Alternative way of sending the test data
|
||||
# headers = {'Content-Type':'application/json'}
|
||||
# resp = requests.post(service.scoring_uri, test_samples, headers=headers)
|
||||
|
||||
print(f"POST to url: {service.scoring_uri}")
|
||||
print(f"Prediction: {resp.text}")
|
||||
|
||||
|
||||
# ### 7.C Notes on web service deployment <a id="notes"></a>
|
||||
|
||||
# As we discussed above, Azure Container Instances tend to be used to develop and test deployments. They are typically configured with CPUs, which usually suffice when the number of requests per second is not too high. When working with several instances, we can configure them further by specifically [allocating CPU resources](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-container-groups#deployment) to each of them.
|
||||
#
|
||||
# For production requirements, i.e. when > 100 requests per second are expected, we recommend deploying models to Azure Kubernetes Service (AKS). It is a convenient infrastructure as it manages hosted Kubernetes environments, and makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters, the latter of which are [more economical and efficient](https://azure.microsoft.com/en-us/blog/gpus-vs-cpus-for-deployment-of-deep-learning-models/) when serving complex models such as deep neural networks, and/or when traffic to the endpoint is high.
|
||||
# For production requirements, i.e. when > 100 requests per second are expected, we recommend deploying models to Azure Kubernetes Service (AKS). It is a convenient infrastructure as it manages hosted Kubernetes environments, and makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters.
|
||||
#
|
||||
# We will see an example of this in the [next notebook](22_deployment_on_azure_kubernetes_service.ipynb).
|
||||
|
||||
# ## 8. Clean up <a id="clean"></a>
|
||||
# ## 5. Clean up <a id="clean"></a>
|
||||
#
|
||||
# Throughout the notebook, we used a workspace and Azure container instances.
|
||||
# Throughout the notebook, we used a workspace and Azure container instances. To get a sense of the cost we incurred, we can refer to this [calculator](https://azure.microsoft.com/en-us/pricing/calculator/). We can also navigate to the [Cost Management + Billing](https://ms.portal.azure.com/#blade/Microsoft_Azure_Billing/ModernBillingMenuBlade/Overview) pane on the portal, click on our subscription ID, and click on the Cost Analysis tab to check our credit usage.
|
||||
#
|
||||
# When we first created our workspace, 4 extra resources were automatically added to it:
|
||||
# - A container registry, which hosts our Docker images
|
||||
# - A storage account, in which our output files get stored
|
||||
# - Application Insights, which allows us to monitor the health of and traffic to our web service, as we will see in the next notebook
|
||||
# - A key vault, which stores our credentials.
|
||||
# In order not to incur extra costs, let's delete the resources we no longer need.
|
||||
#
|
||||
# In this notebook, we also hosted our web service on container instances. Overall, during the time it took us to run this notebook (assuming ~ 1h), the cost we incurred was of less than $3.
|
||||
#
|
||||
# To get a better sense of pricing, we can refer to this [calculator](https://azure.microsoft.com/en-us/pricing/calculator/). We can also navigate to the [Cost Management + Billing](https://ms.portal.azure.com/#blade/Microsoft_Azure_Billing/ModernBillingMenuBlade/Overview) pane on the portal, click on our subscription ID, and click on the Cost Analysis tab to check our credit usage.
|
||||
#
|
||||
# In order not to incur extra costs, let's now delete the resources we no longer need.
|
||||
|
||||
# ### 8.A Service termination <a id="svcterm"></a>
|
||||
#
|
||||
# Now that we have verified that our web service works well on ACI, we can delete it. This helps reduce [costs](https://azure.microsoft.com/en-us/pricing/details/container-instances/), since the container group we were paying for no longer exists, and allows us to keep our workspace clean.
|
||||
# Once we have verified that our web service works well on ACI (cf. "Next steps" section below), we can delete it. This helps reduce [costs](https://azure.microsoft.com/en-us/pricing/details/container-instances/), since the container group we were paying for no longer exists, and allows us to keep our workspace clean.
|
||||
|
||||
# In[ ]:
|
||||
|
||||
|
||||
service.delete()
|
||||
# service.delete()
|
||||
|
||||
|
||||
# At this point, the main resource we are paying for is the <b>Standard</b> Azure Container Registry (ACR), which contains our Docker image, and came as a default when we created our workspace. Details on pricing are available [here](https://azure.microsoft.com/en-us/pricing/details/container-registry/).
|
||||
|
||||
# ### 8.B Image deletion <a id="imdel"></a>
|
||||
# At this point, the main resource we are paying for is the <b>Standard</b> Azure Container Registry (ACR), which contains our Docker image. Details on pricing are available [here](https://azure.microsoft.com/en-us/pricing/details/container-registry/).
|
||||
#
|
||||
# We may decide to use our Docker image in a separate ACI or even in an AKS deployment. In that case, we should keep it available in our workspace. However, if we no longer have a use for it, we can delete it.
|
||||
|
||||
|
@ -653,8 +486,6 @@ service.delete()
|
|||
# docker_image.delete()
|
||||
|
||||
|
||||
# ### 8.C Workspace deletion <a id="wsdel"></a>
|
||||
#
|
||||
# If our goal is to continue using our workspace, we should keep it available. On the contrary, if we plan on no longer using it and its associated resources, we can delete it.
|
||||
#
|
||||
# <i><b>Note:</b> Deleting the workspace will delete all the experiments, outputs, models, Docker images, deployments, etc. that we created in that workspace</i>
|
||||
|
@ -666,8 +497,6 @@ service.delete()
|
|||
# This deletes our workspace, the container registry, the account storage, Application Insights and the key vault
|
||||
|
||||
|
||||
# ## 9. Next steps <a id="next-steps"></a>
|
||||
# ## 6. Next steps <a id="next-steps"></a>
|
||||
#
|
||||
# In the [next notebook](22_deployment_on_azure_kubernetes_service.ipynb), we will leverage the same Docker image, and deploy our model on AKS. In our [third tutorial](23_web_service_testing.ipynb), we will then learn how a Flask app, with an interactive user interface, can be used to call our web service.
|
||||
|
||||
# In[ ]:
|
||||
# In the [next tutorial](22_deployment_on_azure_kubernetes_service.ipynb), we will leverage the same Docker image, and deploy our model on AKS. We will then test both of our web services in the [23_aci_aks_web_service_testing.ipynb](23_aci_aks_web_service_testing.ipynb) notebook.
|
||||
|
|
|
@ -6,50 +6,44 @@
|
|||
# <i>Licensed under the MIT License.</i>
|
||||
#
|
||||
#
|
||||
# # Deployment of a model as a service with Azure Kubernetes Service
|
||||
# # Deployment of a model to Azure Kubernetes Service (AKS)
|
||||
#
|
||||
# ## Table of contents
|
||||
# 1. [Introduction](#intro)
|
||||
# 1. [Pre-requisites](#pre-reqs)
|
||||
# 1. [Library import](#libraries)
|
||||
# 1. [Azure workspace](#workspace)
|
||||
# 1. [Model deployment on AKS](#deploy)
|
||||
# 1. [Workspace retrieval](#workspace)
|
||||
# 1. [Docker image retrieval](#docker_image)
|
||||
# 1. [AKS compute target creation](#compute)
|
||||
# 1. [Monitoring activation](#monitor)
|
||||
# 1. [Service deployment](#svc_deploy)
|
||||
# 1. [Testing of the web service](#testing)
|
||||
# 1. [Clean up](#clean)
|
||||
# 1. [Monitoring deactivation and service deletion](#insights)
|
||||
# 1. [Workspace deletion](#del_workspace)
|
||||
# 1. [Next steps](#next)
|
||||
#
|
||||
#
|
||||
# ## 1. Introduction <a id="intro"/>
|
||||
#
|
||||
# In many real life scenarios, trained machine learning models need to be deployed to production. As we saw in the [first](21_deployment_on_azure_container_instances.ipynb) deployment notebook, this can be done by deploying on Azure Container Instances. In this tutorial, we will get familiar with another way of implementing a model into a production environment, this time using [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads) (AKS).
|
||||
#
|
||||
# AKS manages hosted Kubernetes environments. It makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters. The latter have been shown to be [more economical and efficient](https://azure.microsoft.com/en-us/blog/gpus-vs-cpus-for-deployment-of-deep-learning-models/) when serving complex models such as deep neural networks, and/or when traffic to the web service is high (> 100 requests/second).
|
||||
# In many real life scenarios, trained machine learning models need to be deployed to production. As we saw in the [prior](21_deployment_on_azure_container_instances.ipynb) deployment notebook, this can be done by deploying on Azure Container Instances. In this tutorial, we will get familiar with another way of implementing a model into a production environment, this time using [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads) (AKS).
|
||||
#
|
||||
# AKS manages hosted Kubernetes environments. It makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments on both CPU and GPU clusters.
|
||||
#
|
||||
# At the end of this tutorial, we will have learned how to:
|
||||
#
|
||||
# - Deploy a model as a web service using AKS
|
||||
# - Monitor our new service.
|
||||
|
||||
# ## 2. Pre-requisites <a id="pre-reqs"/>
|
||||
# ### Pre-requisites <a id="pre-reqs"/>
|
||||
#
|
||||
# This notebook relies on resources we created in [21_deployment_on_azure_container_instances.ipynb](21_deployment_on_azure_container_instances.ipynb):
|
||||
# - Our local conda environment and Azure Machine Learning workspace
|
||||
# - Our Azure Machine Learning workspace
|
||||
# - The Docker image that contains the model and scoring script needed for the web service to work.
|
||||
#
|
||||
# If we are missing any of these, we should go back and run the steps from the sections "2. Pre-requisites" to "6.C Environment setup" to generate them.
|
||||
|
||||
# ## 3. Library import <a id="libraries"/>
|
||||
# If we are missing any of these, we should go back and run the steps from the sections "Pre-requisites" to "3.D Environment setup" to generate them.
|
||||
#
|
||||
# ### Library import <a id="libraries"/>
|
||||
#
|
||||
# Now that our prior resources are available, let's first import a few libraries we will need for the deployment on AKS.
|
||||
|
||||
# In[4]:
|
||||
# In[1]:
|
||||
|
||||
|
||||
# For automatic reloading of modified libraries
|
||||
|
@ -62,42 +56,35 @@ from azureml.core.compute import AksCompute, ComputeTarget
|
|||
from azureml.core.webservice import AksWebservice, Webservice
|
||||
|
||||
|
||||
# ## 4. Azure workspace <a id="workspace"/>
|
||||
# ## 2. Model deployment on AKS <a id="deploy"/>
|
||||
#
|
||||
# In the prior notebook, we retrieved an existing or created a new workspace, and generated an `./aml_config/config.json` file.
|
||||
# Let's use it to load this workspace.
|
||||
# ### 2.A Workspace retrieval <a id="workspace">
|
||||
#
|
||||
# <i><b>Note:</b> The Docker image we will use below is attached to the workspace we used in the prior notebook. It is then important to use the same workspace here. If, for any reason, we need to use a separate workspace here, then the steps followed to create a Docker image containing our image classifier model in the prior notebook, should be reproduced here.</i>
|
||||
# Let's now load the workspace we used in the [prior notebook](21_deployment_on_azure_container_instances.ipynb).
|
||||
#
|
||||
# <i><b>Note:</b> The Docker image we will use below is attached to that workspace. It is therefore important to use the same workspace here. If, for any reason, we needed to use another workspace instead, we would need to reproduce here the steps we followed in the prior notebook to create a Docker image containing our image classifier model.</i>
|
||||
|
||||
# In[5]:
|
||||
# In[2]:
|
||||
|
||||
|
||||
ws = Workspace.from_config()
|
||||
# from_config() refers to this config.json file by default
|
||||
|
||||
|
||||
# Let's check that the workspace is properly loaded
|
||||
|
||||
# In[6]:
|
||||
|
||||
ws = Workspace.setup()
|
||||
# setup() refers to our config.json file by default
|
||||
|
||||
# Print the workspace attributes
|
||||
print(
|
||||
"Workspace name: " + ws.name,
|
||||
"Azure region: " + ws.location,
|
||||
"Workspace region: " + ws.location,
|
||||
"Subscription id: " + ws.subscription_id,
|
||||
"Resource group: " + ws.resource_group,
|
||||
sep="\n",
|
||||
)
|
||||
|
||||
|
||||
# ## 5. Model deployment on AKS <a id="deploy">
|
||||
# ### 2.B Docker image retrieval <a id="docker_image">
|
||||
#
|
||||
# ### 5.A Docker image retrieval <a id="docker_image">
|
||||
#
|
||||
# As for the deployment on Azure Container Instances, we will use Docker containers. The Docker image we created in the prior notebook is very much suitable for our deployment on Azure Kubernetes Service, as it contains the libraries we need and the model we registered. Let's make sure this Docker image is still available (if not, we can just run the cells of section "6. Model deployment on Azure" of the [prior notebook](https://github.com/Microsoft/ComputerVision/blob/staging/image_classification/notebooks/21_deployment_on_azure_container_instances.ipynb)).
|
||||
# We can reuse the Docker image we created in section 3 of the [previous tutorial](21_deployment_on_azure_container_instances.ipynb). Let's make sure that it is still available.
|
||||
|
||||
# In[7]:
|
||||
# In[3]:
|
||||
|
||||
|
||||
print("Docker images:")
|
||||
|
@ -109,7 +96,7 @@ for docker_im in ws.images:
|
|||
|
||||
# As we did not delete it in the prior notebook, our Docker image is still present in our workspace. Let's retrieve it.
|
||||
|
||||
# In[8]:
|
||||
# In[4]:
|
||||
|
||||
|
||||
docker_image = ws.images["image-classif-resnet18-f48"]
|
||||
|
@ -119,21 +106,17 @@ docker_image = ws.images["image-classif-resnet18-f48"]
|
|||
#
|
||||
# <i><b>Note:</b> We will not use the `registered_model` object anywhere here. We are running the next 2 cells just for verification purposes.</i>
|
||||
|
||||
# In[9]:
|
||||
# In[6]:
|
||||
|
||||
|
||||
registered_model = docker_image.models[0]
|
||||
|
||||
|
||||
# In[10]:
|
||||
|
||||
|
||||
print(
|
||||
f"Existing model:\n --> Name: {registered_model.name}\n --> Version: {registered_model.version}\n --> ID: {registered_model.id} \n --> Creation time: {registered_model.created_time}\n --> URL: {registered_model.url}"
|
||||
)
|
||||
|
||||
|
||||
# ### 5.B AKS compute target creation<a id="compute"/>
|
||||
# ### 2.C AKS compute target creation<a id="compute"/>
|
||||
#
|
||||
# For a deployment on AKS, in addition to the Docker image, we need to define computational resources. This is typically a cluster of CPUs or a cluster of GPUs. If we already have a Kubernetes-managed cluster in our workspace, we can use it; otherwise, we can create a new one.
|
||||
#
|
||||
|
@ -141,7 +124,7 @@ print(
|
|||
#
|
||||
# Let's first check what types of compute resources we have, if any
|
||||
|
||||
# In[11]:
|
||||
# In[7]:
|
||||
|
||||
|
||||
print("List of compute resources associated with our workspace:")
|
||||
|
@ -149,9 +132,7 @@ for cp in ws.compute_targets:
|
|||
print(f" --> {cp}: {ws.compute_targets[cp]}")
|
||||
|
||||
|
||||
# #### 5.B.a Creation of a new AKS cluster
|
||||
#
|
||||
# In the case where we have no compute resource available, we can create a new one. For this, we can choose between a CPU-based or a GPU-based cluster of virtual machines. There is a [wide variety](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general) of machine types that can be used. In the present example, however, we will not need the fastest machines that exist nor the most memory optimized ones. We will use typical default machines:
|
||||
# If no compute resource is available, we can create a new one. For this, we can choose between a CPU-based and a GPU-based cluster of virtual machines. The latter is typically better suited for web services with high traffic (i.e. > 100 requests per second) and high GPU utilization. There is a [wide variety](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general) of machine types that can be used. In the present example, however, we need neither the fastest machines available nor the most memory-optimized ones, so we will use typical default machines:
|
||||
# - [Standard D3 V2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#dv2-series):
|
||||
# - 4 vCPUs
|
||||
# - 14 GB of memory
|
||||
|
@ -163,13 +144,13 @@ for cp in ws.compute_targets:
|
|||
# <i><b>Notes:</b></i>
|
||||
# - These are Azure-specific denominations
|
||||
# - Information on optimized machines can be found [here](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#other-sizes)
|
||||
# - When configuring the provisioning of an AKS cluster, we need to choose a type of machine, as examplified above. This choice must be such that the number of virtual machines (also called `agent nodes`), we require, multiplied by the number of vCPUs on each machine must be greater than or equal to 12 vCPUs. This is indeed the [minimum needed](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#create-a-new-cluster) for such cluster. By default, a pool of 3 virtual machines gets provisioned on a new AKS cluster to allow for redundancy. So, if the type of virtual machine we choose has a number of vCPUs (`vm_size`) smaller than 4, we need to increase the number of machines (`agent_count`) such that `agent_count x vm_size` ≥ `12` virtual CPUs. `agent_count` and `vm_size` are both parameters we can pass to the `provisioning_configuration()` method below.
|
||||
# - When configuring the provisioning of an AKS cluster, we need to choose a type of machine, as exemplified above. This choice must be such that the number of virtual machines (also called `agent nodes`) we require, multiplied by the number of vCPUs on each machine, is greater than or equal to 12 vCPUs. This is indeed the [minimum needed](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#create-a-new-aks-cluster) for such a cluster. By default, a pool of 3 virtual machines gets provisioned on a new AKS cluster to allow for redundancy. So, if the type of virtual machine we choose has a number of vCPUs (`vm_size`) smaller than 4, we need to increase the number of machines (`agent_count`) such that `agent_count x vm_size` ≥ `12` virtual CPUs. `agent_count` and `vm_size` are both parameters we can pass to the `provisioning_configuration()` method below.
|
||||
# - [This document](https://docs.microsoft.com/en-us/azure/templates/Microsoft.ContainerService/2019-02-01/managedClusters?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#managedclusteragentpoolprofile-object) provides the full list of virtual machine types that can be deployed in an AKS cluster
|
||||
# - Additional considerations on deployments using GPUs are available [here](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#deployment-considerations)
|
||||
#
|
||||
# Here, we will use a cluster of CPUs. The creation of such a resource typically takes several minutes to complete.
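# As a reference, here is a minimal sketch of what the provisioning call can look like (the cluster name and machine size below are illustrative assumptions, not values mandated by this notebook):
#
# ```python
# # Hypothetical cluster name
# aks_cluster_name = "imgclass-aks-cpu"
#
# # 3 agent nodes x 4 vCPUs (Standard D3 V2) = 12 vCPUs, i.e. the required minimum
# prov_config = AksCompute.provisioning_configuration(
#     vm_size="Standard_D3_v2", agent_count=3
# )
#
# aks_target = ComputeTarget.create(
#     workspace=ws, name=aks_cluster_name, provisioning_configuration=prov_config
# )
# aks_target.wait_for_completion(show_output=True)
# ```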
|
||||
|
||||
# In[12]:
|
||||
# In[8]:
|
||||
|
||||
|
||||
# Declare the name of the cluster
|
||||
|
@ -215,27 +196,11 @@ else:
|
|||
# We retrieved the <aks_cluster_name> AKS compute target
|
||||
# ```
|
||||
|
||||
# #### 5.B.b Alternative: Attachment of an existing AKS cluster
|
||||
#
|
||||
# Within our overall subscription, we may already have created an AKS cluster. This cluster may not be visible when we run the `ws.compute_targets` command, though. This is because it is not attached to our present workspace. If we want to use that cluster instead, we need to attach it to our workspace, first. We can do this as follows:
|
||||
#
|
||||
#
|
||||
# ```python
|
||||
# existing_aks_name = '<name_of_the_existing_detached_aks_cluster>'
|
||||
# resource_id = '/subscriptions/<subscription_id/resourcegroups/<resource_group>/providers/Microsoft.ContainerService/managedClusters/<aks_cluster_full_name>'
|
||||
# # <aks_cluster_full_name> can be found by clicking on the aks cluster, in the Azure portal, as the "Resource ID" string
|
||||
# # <subscription_id> can be obtained through ws.subscription_id, and <resource_group> through ws.resource_group
|
||||
#
|
||||
# attach_config = AksCompute.attach_configuration(resource_id=resource_id)
|
||||
# aks_target = ComputeTarget.attach(workspace=ws, name=existing_aks_name, attach_configuration=attach_config)
|
||||
# aks_target.wait_for_completion(show_output = True)
|
||||
# ```
|
||||
|
||||
# This compute target can be seen on the Azure portal, under the `Compute` tab.
|
||||
#
|
||||
# <img src="media/aks_compute_target_cpu.jpg" width="900">
|
||||
|
||||
# In[13]:
|
||||
# In[9]:
|
||||
|
||||
|
||||
# Check provisioning status
|
||||
|
@ -246,24 +211,24 @@ print(
|
|||
|
||||
# The set of resources we will use to deploy our web service on AKS is now provisioned and available.
|
||||
#
|
||||
# ### 5.C Monitoring activation <a id="monitor"/>
|
||||
# ### 2.D Monitoring activation <a id="monitor"/>
|
||||
#
|
||||
# Once our web app is up and running, it is very important to monitor it, and measure the amount of traffic it gets, how long it takes to respond, the type of exceptions that get raised, etc. We will do so through [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview), which is an application performance management service. To enable it on our soon-to-be-deployed web service, we first need to update our AKS configuration file:
|
||||
|
||||
# In[14]:
|
||||
# In[10]:
|
||||
|
||||
|
||||
# Set the AKS web service configuration and add monitoring to it
|
||||
aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)
|
||||
|
||||
|
||||
# ### 5.D Service deployment <a id="svc_deploy"/>
|
||||
# ### 2.E Service deployment <a id="svc_deploy"/>
|
||||
#
|
||||
# We are now ready to deploy our web service. As in the [first](https://github.com/Microsoft/ComputerVision/blob/staging/image_classification/notebooks/21_deployment_on_azure_container_instances.ipynb) notebook, we will deploy from the Docker image. It indeed contains our image classifier model and the conda environment needed for the scoring script to work properly. The parameters to pass to the `Webservice.deploy_from_image()` command are similar to those used for the deployment on ACI. The only major difference is the compute target (`aks_target`), i.e. the CPU cluster we just spun up.
|
||||
# We are now ready to deploy our web service. As in the [first](21_deployment_on_azure_container_instances.ipynb) notebook, we will deploy from the Docker image. It indeed contains our image classifier model and the conda environment needed for the scoring script to work properly. The parameters to pass to the `Webservice.deploy_from_image()` command are similar to those used for the deployment on ACI. The only major difference is the compute target (`aks_target`), i.e. the CPU cluster we just spun up.
|
||||
#
|
||||
# <i><b>Note:</b> This deployment takes a few minutes to complete.</i>
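# For reference, a sketch of what this deployment call can look like is shown below (the service name is an assumption, not a value mandated by this notebook):
#
# ```python
# # Hypothetical service name
# aks_service_name = "aks-cpu-image-classif-web-svc"
#
# aks_service = Webservice.deploy_from_image(
#     workspace=ws,
#     name=aks_service_name,
#     image=docker_image,
#     deployment_config=aks_config,
#     deployment_target=aks_target,
# )
# aks_service.wait_for_deployment(show_output=True)
# print(aks_service.state)
# ```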
|
||||
|
||||
# In[15]:
|
||||
# In[11]:
|
||||
|
||||
|
||||
if aks_target.provisioning_state == "Succeeded":
|
||||
|
@ -303,29 +268,20 @@ else:
|
|||
#
|
||||
# <img src="media/aks_webservice_cpu.jpg" width="900">
|
||||
|
||||
# Our web service is up, and is running on AKS. We can now proceed to testing it.
|
||||
# Our web service is up, and is running on AKS.
|
||||
|
||||
# ## 6. Testing of the web service <a id="testing"/>
|
||||
# ## 3. Clean up <a id="clean">
|
||||
#
|
||||
# Such testing is a whole task of its own, so we separated it from this notebook. We provide all the needed steps in [23_web_service_testing.ipynb](https://github.com/Microsoft/ComputerVision/blob/service_deploy/image_classification/notebooks/deployment/23_web_service_testing.ipynb). There, we test our service:
|
||||
# - From within our workspace (using `aks_service.run()`)
|
||||
# - From outside our workspace (using `requests.post()`)
|
||||
# - From a Flask app running on our local machine
|
||||
# - From a Flask app deployed on the same AKS cluster as our web service.
|
||||
|
||||
# ## 7. Clean up <a id="clean">
|
||||
#
|
||||
# In a real-life scenario, it is likely that the service we created would need to be up and running at all times. However, in the present demonstrative case, and once we have verified that our service works, we can delete it as well as all the resources we used.
|
||||
# In a real-life scenario, it is likely that the service we created would need to be up and running at all times. However, in the present demonstrative case, and once we have verified that our service works (cf. "Next steps" section below), we can delete it as well as all the resources we used.
|
||||
#
|
||||
# In this notebook, the only resource we added to our subscription, in comparison to what we had at the end of the notebook on ACI deployment, is the AKS cluster. There is no fee for cluster management. The only components we are paying for are:
|
||||
# - the cluster nodes
|
||||
# - the managed OS disks.
|
||||
#
|
||||
# Here, we used Standard D3 V2 machines, which come with a temporary storage of 200 GB. Over the course of this tutorial (assuming ~ 1 hour), this added less than $1 to our bill. Now, it is important to understand that each hour during which the cluster is up gets billed, whether the web service is called or not. The same is true for the ACI and workspace we have been using until now.
|
||||
# Here, we used Standard D3 V2 machines, which come with a temporary storage of 200 GB. Over the course of this tutorial (assuming ~ 1 hour), this added almost nothing to our bill. Now, it is important to understand that each hour during which the cluster is up gets billed, whether the web service is called or not. The same is true for the ACI and workspace we have been using until now.
|
||||
#
|
||||
# To get a better sense of pricing, we can refer to [this calculator](https://azure.microsoft.com/en-us/pricing/calculator/?service=kubernetes-service#kubernetes-service). We can also navigate to the [Cost Management + Billing pane](https://ms.portal.azure.com/#blade/Microsoft_Azure_Billing/ModernBillingMenuBlade/Overview) on the portal, click on our subscription ID, and click on the Cost Analysis tab to check our credit usage.
|
||||
#
|
||||
# ### 7.A Monitoring deactivation and service deletion <a id="insights"/>
|
||||
# If we plan on no longer using this web service, we can turn monitoring off, and delete the compute target, the service itself as well as the associated Docker image.
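# As a minimal sketch, and assuming the objects created above are still in scope, these clean-up steps boil down to:
#
# ```python
# aks_service.update(enable_app_insights=False)  # turn monitoring off
# aks_service.delete()  # delete the web service
# docker_image.delete()  # delete the Docker image
# aks_target.delete()  # delete the AKS cluster
# ```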
|
||||
|
||||
# In[ ]:
|
||||
|
@ -347,7 +303,6 @@ else:
|
|||
|
||||
# At this point, all the service resources we used in this notebook have been deleted. We are now only paying for our workspace.
|
||||
#
|
||||
# ### 7.B Workspace deletion <a id="del_workspace"/>
|
||||
# If our goal is to continue using our workspace, we should keep it available. On the contrary, if we plan on no longer using it and its associated resources, we can delete it.
|
||||
#
|
||||
# <i><b>Note:</b> Deleting the workspace will delete all the experiments, outputs, models, Docker images, deployments, etc. that we created in that workspace.</i>
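# For reference, the deletion itself is a single call, shown here only as a sketch since it is destructive:
#
# ```python
# ws.delete(delete_dependent_resources=True)
# ```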
|
||||
|
@ -359,5 +314,5 @@ else:
|
|||
# This deletes our workspace, the container registry, the account storage, Application Insights and the key vault
|
||||
|
||||
|
||||
# ## 8. Next steps <a id="next"/>
|
||||
# In the [next notebook](https://github.com/Microsoft/ComputerVision/blob/service_deploy/image_classification/notebooks/deployment/23_web_service_testing.ipynb), we will test the web services we deployed on ACI and on AKS. We will also learn how a Flask app, with an interactive user interface, can be used to call our web service.
|
||||
# ## 4. Next steps <a id="next"/>
|
||||
# In the [next notebook](23_aci_aks_web_service_testing.ipynb), we will test the web services we deployed on ACI and on AKS.
|
||||
|
|
|
@ -21,7 +21,6 @@
|
|||
# 1. [Clean up](#clean)
|
||||
# 1. [Application Insights deactivation and web service termination](#del_app_insights)
|
||||
# 1. [Docker image deletion](#del_image)
|
||||
# 1. [Next steps](#next-steps)
|
||||
|
||||
# ## 1. Introduction <a id="intro"/>
|
||||
# In the 2 prior notebooks, we deployed our machine learning model as a web service on [Azure Container Instances](https://github.com/Microsoft/ComputerVisionBestPractices/blob/master/classification/notebooks/21_deployment_on_azure_container_instances.ipynb) (ACI) and on [Azure Kubernetes Service](https://github.com/Microsoft/ComputerVision/blob/master/classification/notebooks/22_deployment_on_azure_kubernetes_service.ipynb) (AKS). In this notebook, we will learn how to test our service:
|
||||
|
@ -336,8 +335,3 @@ for docker_im in ws.images:
|
|||
|
||||
docker_image = ws.images["image-classif-resnet18-f48"]
|
||||
# docker_image.delete()
|
||||
|
||||
|
||||
# ## 6. Next steps <a id="next-steps">
|
||||
#
|
||||
# In the next notebook, we will learn how to create a user interface that will allow our users to interact with our web service through the simple upload of images.
|
||||
|
|
|
@ -9,7 +9,7 @@
|
|||
|
||||
import os
|
||||
import pytest
|
||||
from pathlib import Path
|
||||
import torch
|
||||
from typing import List
|
||||
from tempfile import TemporaryDirectory
|
||||
from utils_cv.common.data import unzip_url
|
||||
|
@ -19,7 +19,12 @@ from utils_cv.classification.data import Urls
|
|||
def path_classification_notebooks():
|
||||
""" Returns the path of the notebooks folder. """
|
||||
return os.path.abspath(
|
||||
os.path.join(os.path.dirname(__file__), os.path.pardir, "classification", "notebooks")
|
||||
os.path.join(
|
||||
os.path.dirname(__file__),
|
||||
os.path.pardir,
|
||||
"classification",
|
||||
"notebooks",
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
|
@ -33,8 +38,11 @@ def classification_notebooks():
|
|||
"01_training_introduction": os.path.join(
|
||||
folder_notebooks, "01_training_introduction.ipynb"
|
||||
),
|
||||
"02_training_accuracy_vs_speed": os.path.join(
|
||||
folder_notebooks, "02_training_accuracy_vs_speed.ipynb"
|
||||
"02_multilabel_classification": os.path.join(
|
||||
folder_notebooks, "02_multilabel_classification.ipynb"
|
||||
),
|
||||
"03_training_accuracy_vs_speed": os.path.join(
|
||||
folder_notebooks, "03_training_accuracy_vs_speed.ipynb"
|
||||
),
|
||||
"10_image_annotation": os.path.join(
|
||||
folder_notebooks, "10_image_annotation.ipynb"
|
||||
|
@ -42,13 +50,15 @@ def classification_notebooks():
|
|||
"11_exploring_hyperparameters": os.path.join(
|
||||
folder_notebooks, "11_exploring_hyperparameters.ipynb"
|
||||
),
|
||||
"12_hard_negative_sampling": os.path.join(
|
||||
folder_notebooks, "12_hard_negative_sampling.ipynb"
|
||||
),
|
||||
"21_deployment_on_azure_container_instances": os.path.join(
|
||||
folder_notebooks,
|
||||
"21_deployment_on_azure_container_instances.ipynb",
|
||||
),
|
||||
"22_deployment_on_azure_kubernetes_service": os.path.join(
|
||||
folder_notebooks,
|
||||
"22_deployment_on_azure_kubernetes_service.ipynb",
|
||||
folder_notebooks, "22_deployment_on_azure_kubernetes_service.ipynb"
|
||||
),
|
||||
}
|
||||
return paths
|
||||
|
@ -91,3 +101,28 @@ def tiny_ic_multidata_path(tmp_session) -> List[str]:
|
|||
def tiny_ic_data_path(tmp_session) -> str:
|
||||
""" Returns the path to the tiny fridge objects dataset. """
|
||||
return unzip_url(Urls.fridge_objects_tiny_path, tmp_session, exist_ok=True)
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def tiny_multilabel_ic_data_path(tmp_session) -> str:
|
||||
""" Returns the path to the tiny fridge objects dataset. """
|
||||
return unzip_url(
|
||||
Urls.multilabel_fridge_objects_tiny_path, tmp_session, exist_ok=True
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def multilabel_result():
|
||||
""" Fake results to test evaluation metrics for multilabel classification. """
|
||||
y_pred = torch.tensor(
|
||||
[
|
||||
[0.9, 0.0, 0.0, 0.0],
|
||||
[0.9, 0.0, 0.9, 0.9],
|
||||
[0.0, 0.9, 0.0, 0.0],
|
||||
[0.9, 0.9, 0.0, 0.0],
|
||||
]
|
||||
).float()
|
||||
y_true = torch.tensor(
|
||||
[[1, 0, 0, 1], [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0]]
|
||||
).float()
|
||||
return y_pred, y_true
|
||||
|
|
|
@ -4,7 +4,11 @@
|
|||
from PIL import Image
|
||||
from fastai.vision.data import ImageList
|
||||
|
||||
from utils_cv.classification.data import imagenet_labels, downsize_imagelist
|
||||
from utils_cv.classification.data import (
|
||||
imagenet_labels,
|
||||
downsize_imagelist,
|
||||
is_data_multilabel,
|
||||
)
|
||||
|
||||
|
||||
def test_imagenet_labels():
|
||||
|
@ -30,3 +34,12 @@ def test_downsize_imagelist(tiny_ic_data_path, tmp):
|
|||
assert len(im_list) == len(im_list2)
|
||||
for im_path in im_list2.items:
|
||||
assert min(Image.open(im_path).size) <= max_dim
|
||||
|
||||
|
||||
def test_is_data_multilabel(tiny_multilabel_ic_data_path, tiny_ic_data_path):
|
||||
"""
|
||||
Tests that multilabel classification datasets and traditional
|
||||
classification datasets are correctly identified
|
||||
"""
|
||||
assert is_data_multilabel(tiny_multilabel_ic_data_path)
|
||||
assert not is_data_multilabel(tiny_ic_data_path)
|
||||
|
|
|
@ -2,15 +2,64 @@
|
|||
# Licensed under the MIT License.
|
||||
|
||||
import pytest
|
||||
import numpy as np
|
||||
from torch import tensor
|
||||
from fastai.metrics import accuracy, error_rate
|
||||
from fastai.vision import cnn_learner, models
|
||||
from fastai.vision import ImageList, imagenet_stats
|
||||
from utils_cv.classification.model import (
|
||||
TrainMetricsRecorder,
|
||||
get_optimal_threshold,
|
||||
get_preds,
|
||||
hamming_accuracy,
|
||||
model_to_learner,
|
||||
TrainMetricsRecorder,
|
||||
zero_one_accuracy,
|
||||
)
|
||||
|
||||
|
||||
def test_hamming_accuracy_function(multilabel_result):
|
||||
""" Test the hamming loss evaluation metric function. """
|
||||
y_pred, y_true = multilabel_result
|
||||
assert hamming_accuracy(y_pred, y_true) == tensor(1.0 - 0.1875)
|
||||
assert hamming_accuracy(y_pred, y_true, sigmoid=True) == tensor(
|
||||
1.0 - 0.375
|
||||
)
|
||||
assert hamming_accuracy(y_pred, y_true, threshold=1.0) == tensor(
|
||||
1.0 - 0.625
|
||||
)
|
||||
|
||||
|
||||
def test_zero_one_accuracy_function(multilabel_result):
|
||||
""" Test the zero-one loss evaluation metric function. """
|
||||
y_pred, y_true = multilabel_result
|
||||
assert zero_one_accuracy(y_pred, y_true) == tensor(1.0 - 0.75)
|
||||
assert zero_one_accuracy(y_pred, y_true, sigmoid=True) == tensor(
|
||||
1.0 - 0.75
|
||||
)
|
||||
assert zero_one_accuracy(y_pred, y_true, threshold=1.0) == tensor(
|
||||
1.0 - 1.0
|
||||
)
|
||||
|
||||
|
||||
def test_get_optimal_threshold(multilabel_result):
|
||||
""" Test the get_optimal_threshold function. """
|
||||
y_pred, y_true = multilabel_result
|
||||
assert get_optimal_threshold(hamming_accuracy, y_pred, y_true) == 0.05
|
||||
assert (
|
||||
get_optimal_threshold(
|
||||
hamming_accuracy, y_pred, y_true, thresholds=np.linspace(0, 1, 11)
|
||||
)
|
||||
== 0.1
|
||||
)
|
||||
assert get_optimal_threshold(zero_one_accuracy, y_pred, y_true) == 0.05
|
||||
assert (
|
||||
get_optimal_threshold(
|
||||
zero_one_accuracy, y_pred, y_true, thresholds=np.linspace(0, 1, 11)
|
||||
)
|
||||
== 0.1
|
||||
)
|
||||
|
||||
|
||||
def test_model_to_learner():
|
||||
# Test if the function loads an ImageNet model (ResNet) trainer
|
||||
learn = model_to_learner(models.resnet34(pretrained=True))
|
||||
|
@ -68,3 +117,14 @@ def test_train_metrics_recorder(tiny_ic_data):
|
|||
assert len(cb.train_metrics) == epochs
|
||||
assert len(cb.train_metrics[0]) == 1 # we used 1 metrics
|
||||
assert len(cb.valid_metrics) == 0 # no validation
|
||||
|
||||
|
||||
def test_get_preds(tiny_ic_data):
|
||||
model = models.resnet18
|
||||
lr = 1e-4
|
||||
epochs = 1
|
||||
|
||||
learn = cnn_learner(tiny_ic_data, model)
|
||||
learn.fit(epochs, lr)
|
||||
pred_outs = get_preds(learn, tiny_ic_data.valid_dl)
|
||||
assert len(pred_outs[0]) == len(tiny_ic_data.valid_ds)
|
||||
|
|
|
@ -42,15 +42,31 @@ def test_01_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_02_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["02_training_accuracy_vs_speed"]
|
||||
def test_02_notebook_run(
|
||||
classification_notebooks, tiny_multilabel_ic_data_path
|
||||
):
|
||||
notebook_path = classification_notebooks["02_multilabel_classification"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
parameters=dict(
|
||||
PM_VERSION=pm.__version__, DATA_PATH=tiny_multilabel_ic_data_path
|
||||
),
|
||||
kernel_name=KERNEL_NAME,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_03_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["03_training_accuracy_vs_speed"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
parameters=dict(
|
||||
PM_VERSION=pm.__version__,
|
||||
DATA_PATH=tiny_ic_data_path,
|
||||
MODEL_TYPE="fast_inference", # options: ['fast_inference', 'high_accuracy', 'small_size']
|
||||
MULTILABEL=False,
|
||||
MODEL_TYPE="fast_inference", # options: ['fast_inference', 'high_performance', 'small_size']
|
||||
EPOCHS_HEAD=1,
|
||||
EPOCHS_BODY=1,
|
||||
),
|
||||
|
@ -88,6 +104,22 @@ def test_11_notebook_run(classification_notebooks, tiny_ic_data_path):
|
|||
),
|
||||
kernel_name=KERNEL_NAME,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
def test_12_notebook_run(classification_notebooks, tiny_ic_data_path):
|
||||
notebook_path = classification_notebooks["12_hard_negative_sampling"]
|
||||
pm.execute_notebook(
|
||||
notebook_path,
|
||||
OUTPUT_NOTEBOOK,
|
||||
parameters=dict(
|
||||
PM_VERSION=pm.__version__,
|
||||
DATA_PATH=tiny_ic_data_path,
|
||||
EPOCHS_HEAD=1,
|
||||
EPOCHS_BODY=1,
|
||||
),
|
||||
kernel_name=KERNEL_NAME,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.notebooks
|
||||
|
|
|
@ -7,7 +7,16 @@ from utils_cv.classification.plot import (
|
|||
plot_roc_curve,
|
||||
plot_precision_recall_curve,
|
||||
plot_pr_roc_curves,
|
||||
plot_thresholds,
|
||||
)
|
||||
from utils_cv.classification.model import hamming_accuracy, zero_one_accuracy
|
||||
|
||||
|
||||
def test_plot_threshold(multilabel_result):
|
||||
""" Test the plot_loss_threshold function """
|
||||
y_pred, y_true = multilabel_result
|
||||
plot_thresholds(hamming_accuracy, y_pred, y_true)
|
||||
plot_thresholds(zero_one_accuracy, y_pred, y_true)
|
||||
|
||||
|
||||
@pytest.fixture(scope="module")
|
||||
|
|
|
@ -3,6 +3,7 @@
|
|||
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.extend([".", "..", "../..", "../../.."])
|
||||
|
||||
from utils_cv.common.data import root_path
|
||||
|
|
|
@ -2,6 +2,7 @@
|
|||
# Licensed under the MIT License.
|
||||
|
||||
import os
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
from pathlib import Path
|
||||
from utils_cv.common.image import (
|
||||
|
|
|
@ -1,5 +1,40 @@
|
|||
from utils_cv.common.misc import set_random_seed
|
||||
import os
|
||||
from pathlib import Path
|
||||
from utils_cv.common.misc import copy_files, set_random_seed
|
||||
|
||||
|
||||
def test_set_random_seed():
|
||||
set_random_seed(1)
|
||||
|
||||
|
||||
def test_copy_files(tmp):
|
||||
parent = os.path.join(tmp, "parent")
|
||||
child = os.path.join(parent, "child")
|
||||
dst = os.path.join(tmp, "dst")
|
||||
os.makedirs(parent)
|
||||
os.makedirs(child)
|
||||
os.makedirs(dst)
|
||||
|
||||
file_in_child = Path(os.path.join(child, "file_in_child.txt"))
|
||||
file_in_child.touch()
|
||||
|
||||
copy_files(file_in_child, dst)
|
||||
assert os.path.isfile(os.path.join(dst, "file_in_child.txt"))
|
||||
|
||||
file_in_parent = Path(os.path.join(parent, "file_in_parent.txt"))
|
||||
file_in_parent.touch()
|
||||
|
||||
copy_files([file_in_child, file_in_parent], dst)
|
||||
assert os.path.isfile(os.path.join(dst, "file_in_parent.txt"))
|
||||
|
||||
# Check if the subdir is inferred
|
||||
copy_files([file_in_child, file_in_parent], dst, infer_subdir=True)
|
||||
dst_child = os.path.join(dst, "child")
|
||||
assert os.path.isdir(dst_child)
|
||||
assert os.path.isfile(os.path.join(dst_child, "file_in_child.txt"))
|
||||
assert not os.path.isfile(os.path.join(dst_child, "file_in_parent.txt"))
|
||||
|
||||
# Check if the original files are removed
|
||||
copy_files([file_in_child, file_in_parent], dst, remove=True)
|
||||
assert not os.path.isfile(file_in_parent)
|
||||
assert not os.path.isfile(file_in_child)
|
||||
|
|
|
@ -0,0 +1,23 @@
|
|||
from pathlib import Path
|
||||
import matplotlib.pyplot as plt
|
||||
from utils_cv.common.plot import line_graph, show_ims
|
||||
|
||||
|
||||
def test_line_graph():
|
||||
line_graph(
|
||||
values=([1,2,3], [3,2,1]),
|
||||
labels=("Train", "Valid"),
|
||||
x_guides=[0, 1],
|
||||
x_name="Epoch",
|
||||
y_name="Accuracy",
|
||||
)
|
||||
plt.close()
|
||||
|
||||
|
||||
def test_show_ims(tiny_ic_data_path):
|
||||
ims = [i for i in Path(tiny_ic_data_path).glob('*.*')]
|
||||
show_ims(ims)
|
||||
plt.close()
|
||||
|
||||
show_ims(ims, ['a'] * len(ims))
|
||||
plt.close()
|
|
@ -3,6 +3,7 @@
|
|||
|
||||
import os
|
||||
import requests
|
||||
import pandas as pd
|
||||
from pathlib import Path
|
||||
from typing import List, Union
|
||||
from urllib.parse import urljoin
|
||||
|
@ -16,25 +17,39 @@ class Urls:
|
|||
# for now hardcoding base url into Urls class
|
||||
base = "https://cvbp.blob.core.windows.net/public/datasets/image_classification/"
|
||||
|
||||
# Same link Keras is using
|
||||
# ImageNet labels Keras is using
|
||||
imagenet_labels_json = "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json"
|
||||
|
||||
# datasets
|
||||
# traditional datasets
|
||||
fridge_objects_path = urljoin(base, "fridgeObjects.zip")
|
||||
fridge_objects_watermark_path = urljoin(base, "fridgeObjectsWatermark.zip")
|
||||
fridge_objects_tiny_path = urljoin(base, "fridgeObjectsTiny.zip")
|
||||
fridge_objects_watermark_tiny_path = urljoin(
|
||||
base, "fridgeObjectsWatermarkTiny.zip"
|
||||
)
|
||||
fridge_objects_negatives_path = urljoin(base, "fridgeObjectsNegative.zip")
|
||||
|
||||
# multilabel datasets
|
||||
multilabel_fridge_objects_path = urljoin(
|
||||
base, "multilabelFridgeObjects.zip"
|
||||
)
|
||||
multilabel_fridge_objects_watermark_path = urljoin(
|
||||
base, "multilabelFridgeObjectsWatermark.zip"
|
||||
)
|
||||
multilabel_fridge_objects_tiny_path = urljoin(
|
||||
base, "multilabelFridgeObjectsTiny.zip"
|
||||
)
|
||||
multilabel_fridge_objects_watermark_tiny_path = urljoin(
|
||||
base, "multilabelFridgeObjectsWatermarkTiny.zip"
|
||||
)
|
||||
|
||||
# TODO remove
|
||||
food_101_subset_path = urljoin(base, "food101Subset.zip")
|
||||
fashion_texture_path = urljoin(base, "fashionTexture.zip")
|
||||
flickr_logos_32_subset_path = urljoin(base, "flickrLogos32Subset.zip")
|
||||
lettuce_path = urljoin(base, "lettuce.zip")
|
||||
recycle_path = urljoin(base, "recycle_v3.zip")
|
||||
|
||||
|
||||
@classmethod
|
||||
def all(cls) -> List[str]:
|
||||
return [v for k, v in cls.__dict__.items() if k.endswith("_path")]
|
||||
|
@ -74,7 +89,7 @@ def downsize_imagelist(
|
|||
# Loop over all images
|
||||
for src_path in tqdm(im_list.items):
|
||||
# Load and optionally down-size image
|
||||
im = Image.open(src_path).convert('RGB')
|
||||
im = Image.open(src_path).convert("RGB")
|
||||
scale = float(dim) / min(im.size)
|
||||
if scale < 1.0:
|
||||
new_size = [int(round(f * scale)) for f in im.size]
|
||||
|
@ -88,3 +103,56 @@ def downsize_imagelist(
|
|||
), "Image source and destination path should not be the same: {src_rel_path}"
|
||||
os.makedirs(os.path.dirname(dst_path), exist_ok=True)
|
||||
im.save(dst_path)
|
||||
|
||||
|
||||
class LabelCsvNotFound(Exception):
|
||||
""" Exception if no csv named 'label.csv' is found in the path. """
|
||||
|
||||
pass
|
||||
|
||||
|
||||
class LabelColumnNotFound(Exception):
|
||||
""" Exception if label column not found in the CSV file. """
|
||||
|
||||
pass
|
||||
|
||||
|
||||
def is_data_multilabel(path: Union[Path, str]) -> bool:
|
||||
""" Checks if dataset is a multilabel dataset.
|
||||
|
||||
A dataset is considered multilabel if it meets the following conditions:
|
||||
- a csv titled 'labels.csv' is located in the path
|
||||
- the column of the labels is titled 'labels'
|
||||
- the labels in that column are space-delimited
|
||||
- there exists at least one image that maps to 2 or more labels
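
    For example, a `labels.csv` such as the following (file and label names are
    purely illustrative; only the space-delimited 'labels' column is required by
    this check) would make the dataset multilabel:

        name,labels
        im01.jpg,can
        im02.jpg,can milk_bottle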
|
||||
|
||||
Args:
|
||||
path: path to the dataset
|
||||
|
||||
Raises:
|
||||
LabelCsvNotFound if no csv named 'labels.csv' is found in the path
LabelColumnNotFound if the csv has no column named 'labels'
|
||||
|
||||
Returns:
|
||||
Whether or not the dataset is multilabel.
|
||||
"""
|
||||
files = Path(path).glob("*.csv")
|
||||
|
||||
if len([f for f in files]) == 0:
|
||||
return False
|
||||
|
||||
csv_file_path = Path(path) / "labels.csv"
|
||||
|
||||
if not csv_file_path.is_file():
|
||||
raise LabelCsvNotFound
|
||||
|
||||
df = pd.read_csv(csv_file_path)
|
||||
|
||||
if "labels" not in df.columns:
|
||||
raise LabelColumnNotFound
|
||||
|
||||
labels = df["labels"].str.split(" ", n=1, expand=True)
|
||||
|
||||
if len(labels.columns) <= 1:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
|
|
@ -2,14 +2,14 @@
|
|||
# Licensed under the MIT License.
|
||||
|
||||
from time import time
|
||||
from typing import Any, List
|
||||
from typing import Any, Callable, List, Optional
|
||||
|
||||
from fastai.basic_train import LearnerCallback
|
||||
from fastai.core import PBar
|
||||
import fastai.basic_train
|
||||
from fastai.basic_train import _loss_func2activ, LearnerCallback
|
||||
from fastai.torch_core import TensorOrNumList
|
||||
from fastai.vision import (
|
||||
Learner, nn,
|
||||
ImageDataBunch, imagenet_stats,
|
||||
CallbackHandler, DataLoader, Learner, nn,
|
||||
ImageDataBunch, imagenet_stats, PBar,
|
||||
)
|
||||
from fastprogress.fastprogress import format_time
|
||||
from IPython.display import display
|
||||
|
@ -17,6 +17,7 @@ import matplotlib.pyplot as plt
|
|||
from matplotlib.ticker import MaxNLocator
|
||||
import torch
|
||||
from torch import Tensor
|
||||
import numpy as np
|
||||
|
||||
from utils_cv.classification.data import imagenet_labels
|
||||
|
||||
|
@ -24,6 +25,98 @@ from utils_cv.classification.data import imagenet_labels
|
|||
IMAGENET_IM_SIZE = 224
|
||||
|
||||
|
||||
def hamming_accuracy(
|
||||
y_pred: Tensor,
|
||||
y_true: Tensor,
|
||||
threshold: float = 0.2,
|
||||
sigmoid: bool = False,
|
||||
) -> Tensor:
|
||||
""" Callback for using hamming accuracy as a evaluation metric.
|
||||
|
||||
Hamming accuracy is one minus the fraction of wrong labels to the total
|
||||
number of labels.
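
    For example, with the 4x4 fixture used in the unit tests above (16 label
    slots in total, 3 of which are mispredicted at the default threshold), the
    hamming accuracy is 1 - 3/16 = 0.8125.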
|
||||
|
||||
Args:
|
||||
y_pred: prediction output
|
||||
y_true: true class labels
|
||||
threshold: the threshold to consider a positive classification
|
||||
sigmoid: whether to apply the sigmoid activation
|
||||
|
||||
Returns:
|
||||
The hamming accuracy function as a tensor of dtype float
|
||||
"""
|
||||
if sigmoid:
|
||||
y_pred = y_pred.sigmoid()
|
||||
if threshold:
|
||||
y_pred = y_pred > threshold
|
||||
return 1 - (
|
||||
(y_pred.float() != y_true).sum() / torch.ones(y_pred.shape).sum()
|
||||
)
|
||||
|
||||
|
||||
def zero_one_accuracy(
|
||||
y_pred: Tensor,
|
||||
y_true: Tensor,
|
||||
threshold: float = 0.2,
|
||||
sigmoid: bool = False,
|
||||
) -> Tensor:
|
||||
""" Callback for using zero-one accuracy as a evaluation metric.
|
||||
|
||||
The zero-one accuracy will classify an entire set of labels for a given
|
||||
sample as incorrect if it does not exactly match the true set of labels.
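
    For example, with the same 4x4 fixture used in the unit tests, 3 of the 4
    samples have at least one mispredicted label at the default threshold, so
    the zero-one accuracy is 1 - 3/4 = 0.25.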
|
||||
|
||||
Args:
|
||||
y_pred: prediction output
|
||||
y_true: true class labels
|
||||
threshold: the threshold to consider a positive classification
|
||||
sigmoid: whether to apply the sigmoid activation
|
||||
|
||||
Returns:
|
||||
The zero-one accuracy function as a tensor with dtype float
|
||||
"""
|
||||
if sigmoid:
|
||||
y_pred = y_pred.sigmoid()
|
||||
if threshold:
|
||||
y_pred = y_pred > threshold
|
||||
|
||||
zero_one_preds = (y_pred.float() != y_true).sum(dim=1)
|
||||
zero_one_preds[zero_one_preds >= 1] = 1
|
||||
num_labels = y_pred.shape[-1]
|
||||
return 1 - (
|
||||
zero_one_preds.sum().float() / len(y_pred.reshape(-1, num_labels))
|
||||
)
|
||||
|
||||
|
||||
def get_optimal_threshold(
|
||||
metric_function: Callable[[Tensor, Tensor, float], Tensor],
|
||||
y_pred: Tensor,
|
||||
y_true: Tensor,
|
||||
thresholds: List[float] = np.linspace(0, 1, 21),
|
||||
) -> float:
|
||||
""" Gets the best threshold to use for the provided metric function.
|
||||
|
||||
This method samples the metric function at evenly distributed threshold
|
||||
intervals to find the best threshold.
|
||||
|
||||
Args:
|
||||
metric_function: The metric function
|
||||
y_pred: predicted probabilities.
|
||||
y_true: True class indices.
|
||||
thresholds: The thresholds at which the metric is evaluated.
|
||||
|
||||
Returns:
|
||||
The threshold that optimizes the metric function.
|
||||
"""
|
||||
optimal_threshold = None
|
||||
metric_max = -np.inf
|
||||
for threshold in thresholds:
|
||||
metric = metric_function(y_pred, y_true, threshold=threshold)
|
||||
if metric > metric_max:
|
||||
metric_max = metric
|
||||
optimal_threshold = threshold
|
||||
return optimal_threshold
|
||||
|
||||
|
||||
def model_to_learner(
|
||||
model: nn.Module, im_size: int = IMAGENET_IM_SIZE
|
||||
) -> Learner:
|
||||
|
@ -48,10 +141,33 @@ def model_to_learner(
|
|||
return Learner(empty_data, model)
|
||||
|
||||
|
||||
def get_preds(
|
||||
learn: Learner, dl: DataLoader, with_loss: bool = False, n_batch: Optional[int] = None, pbar: Optional[PBar] = None
|
||||
) -> List[Tensor]:
|
||||
"""Return predictions and targets on `dl` dataset.
|
||||
This function is the same as fastai's Learner.get_preds except this allows an external DataLoader.
|
||||
For more details about Learner.get_preds, see:
|
||||
https://github.com/fastai/fastai/blob/master/fastai/basic_train.py
|
||||
|
||||
Args:
|
||||
learn: Learner object that will be used for prediction
|
||||
dl: DataLoader the model will use to load samples
|
||||
with_loss: If True, it will also return the loss on each prediction
|
||||
n_batch: Number of batches to predict. If not specified, it will run the predictions for n batches
|
||||
where n = sample size // BATCH_SIZE
|
||||
pbar: ProgressBar object
|
||||
"""
|
||||
lf = learn.loss_func if with_loss else None
|
||||
return fastai.basic_train.get_preds(
|
||||
learn.model, dl, cb_handler=CallbackHandler(learn.callbacks),
|
||||
activ=_loss_func2activ(learn.loss_func), loss_func=lf, n_batch=n_batch, pbar=pbar
|
||||
)
|
||||
|
||||
|
||||
class TrainMetricsRecorder(LearnerCallback):
|
||||
_order = -20 # Needs to run before the recorder
|
||||
|
||||
def __init__(self, learn, n_batch: int = None, show_graph: bool = False):
|
||||
def __init__(self, learn: Learner, n_batch: int = None, show_graph: bool = False):
|
||||
"""Fastai Train hook to evaluate metrics on train and validation set for every epoch.
|
||||
|
||||
This class works with the metrics functions whose signature is fn(input:Tensor, targs:Tensor),
|
||||
|
@ -70,7 +186,7 @@ class TrainMetricsRecorder(LearnerCallback):
|
|||
|
||||
Examples:
|
||||
>>> learn = cnn_learner(data, model, metrics=[accuracy])
|
||||
>>> train_metrics_cb = TrainMetricsRecorder(n_batch=1)
|
||||
>>> train_metrics_cb = TrainMetricsRecorder(learn, n_batch=1)
|
||||
>>> learn.callbacks.append(train_metrics_cb)
|
||||
>>> learn.fit(epochs=10, lr=0.001)
|
||||
>>> train_metrics_cb.plot()
|
||||
|
@ -100,28 +216,33 @@ class TrainMetricsRecorder(LearnerCallback):
|
|||
self, pbar: PBar, metrics: List, n_epochs: int, **kwargs: Any
|
||||
):
|
||||
self.has_metrics = metrics and len(metrics) > 0
|
||||
self.has_val = hasattr(self.learn.data, 'valid_ds')
|
||||
self.has_val = hasattr(self.learn.data, "valid_ds")
|
||||
|
||||
# Result table and graph variables
|
||||
self.learn.recorder.silent = (
|
||||
True
|
||||
) # Mute recorder. This callback will printout results instead.
|
||||
self.pbar = pbar
|
||||
self.names = ['epoch', 'train_loss']
|
||||
self.names = ["epoch", "train_loss"]
|
||||
if self.has_val:
|
||||
self.names.append('valid_loss')
|
||||
self.names.append("valid_loss")
|
||||
# Add metrics names
|
||||
self.metrics_names = [m_fn.__name__ for m_fn in metrics]
|
||||
for m in self.metrics_names:
|
||||
self.names.append('train_' + m)
|
||||
self.names.append("train_" + m)
|
||||
if self.has_val:
|
||||
self.names.append('valid_' + m)
|
||||
self.names.append('time')
|
||||
self.names.append("valid_" + m)
|
||||
self.names.append("time")
|
||||
self.pbar.write(self.names, table=True)
|
||||
|
||||
self.n_epochs = n_epochs
|
||||
self.valid_metrics = []
|
||||
self.train_metrics = []
|
||||
|
||||
# Reset graph
|
||||
self._fig = None
|
||||
self._axes = None
|
||||
self._display = None
|
||||
|
||||
def on_epoch_begin(self, **kwargs: Any):
|
||||
self.start_epoch = time()
|
||||
|
@ -188,30 +309,26 @@ class TrainMetricsRecorder(LearnerCallback):
|
|||
str_stats = []
|
||||
for name, stat in zip(self.names, stats):
|
||||
str_stats.append(
|
||||
'#na#'
|
||||
"#na#"
|
||||
if stat is None
|
||||
else str(stat)
|
||||
if isinstance(stat, int)
|
||||
else f'{stat:.6f}'
|
||||
else f"{stat:.6f}"
|
||||
)
|
||||
str_stats.append(format_time(time() - self.start_epoch))
|
||||
self.pbar.write(str_stats, table=True)
|
||||
|
||||
|
||||
def _plot(self, update=False):
|
||||
if not self._fig:
|
||||
# init graph
|
||||
if not hasattr(self, '_fig'):
|
||||
self._fig, self._axes = plt.subplots(
|
||||
len(self.train_metrics[0]),
|
||||
1,
|
||||
figsize=(6, 4 * len(self.train_metrics[0])),
|
||||
)
|
||||
self._axes = (
|
||||
self._axes.flatten()
|
||||
if len(self.train_metrics[0]) > 1
|
||||
else [self._axes]
|
||||
)
|
||||
self._axes = (self._axes.flatten() if len(self.train_metrics[0]) > 1 else [self._axes])
|
||||
plt.close(self._fig)
|
||||
|
||||
|
||||
# Plot each metrics as a subplot
|
||||
for i, ax in enumerate(self._axes):
|
||||
ax.clear()
|
||||
|
@ -222,27 +339,29 @@ class TrainMetricsRecorder(LearnerCallback):
|
|||
ax.plot(x_axis, tr_m, label="Train")
|
||||
|
||||
# Plot validation set results
|
||||
maybe_y_bounds = [-0.05, 1.05, min(Tensor(tr_m)), max(Tensor(tr_m))]
|
||||
maybe_y_bounds = [
|
||||
-0.05,
|
||||
1.05,
|
||||
min(Tensor(tr_m)),
|
||||
max(Tensor(tr_m)),
|
||||
]
|
||||
if len(self.valid_metrics) > 0:
|
||||
vl_m = [met[i] for met in self.valid_metrics]
|
||||
ax.plot(x_axis, vl_m, label="Validation")
|
||||
maybe_y_bounds.extend([min(Tensor(vl_m)), max(Tensor(vl_m))])
|
||||
|
||||
x_bounds = (-0.05, self.n_epochs - 0.95)
|
||||
y_bounds = (
|
||||
min(maybe_y_bounds) - 0.05,
|
||||
max(maybe_y_bounds) + 0.05,
|
||||
)
|
||||
y_bounds = (min(maybe_y_bounds) - 0.05, max(maybe_y_bounds) + 0.05)
|
||||
ax.set_xlim(x_bounds)
|
||||
ax.set_ylim(y_bounds)
|
||||
|
||||
ax.set_ylabel(self.metrics_names[i])
|
||||
ax.set_xlabel("Epochs")
|
||||
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
|
||||
ax.legend(loc='upper right')
|
||||
ax.legend(loc="upper right")
|
||||
|
||||
if update:
|
||||
if not hasattr(self, '_display'):
|
||||
if not self._display:
|
||||
self._display = display(self._fig, display_id=True)
|
||||
else:
|
||||
self._display.update(self._fig)
|
||||
|
|
|
@ -14,9 +14,13 @@ from typing import Any, Dict, List, Tuple, Union
|
|||
from fastai.callbacks import EarlyStoppingCallback
|
||||
from fastai.metrics import accuracy
|
||||
from fastai.vision import (
|
||||
cnn_learner, get_transforms,
|
||||
ImageDataBunch, ImageList, imagenet_stats,
|
||||
Learner, models,
|
||||
cnn_learner,
|
||||
get_transforms,
|
||||
ImageDataBunch,
|
||||
ImageList,
|
||||
imagenet_stats,
|
||||
Learner,
|
||||
models,
|
||||
)
|
||||
from matplotlib.axes import Axes
|
||||
from matplotlib.text import Annotation
|
||||
|
|
|
@ -5,6 +5,7 @@
|
|||
Helper module for drawing plots
|
||||
"""
|
||||
import matplotlib.pyplot as plt
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from sklearn.metrics import (
|
||||
precision_recall_curve,
|
||||
|
@ -13,6 +14,40 @@ from sklearn.metrics import (
|
|||
auc,
|
||||
)
|
||||
from sklearn.preprocessing import label_binarize
|
||||
from torch import Tensor
|
||||
from typing import Callable
|
||||
|
||||
|
||||
def plot_thresholds(
|
||||
metric_function: Callable[[Tensor, Tensor, float], Tensor],
|
||||
y_pred: Tensor,
|
||||
y_true: Tensor,
|
||||
samples: int = 21,
|
||||
figsize: tuple = (12, 6),
|
||||
) -> None:
|
||||
""" Plot the evaluation metric of the model at different thresholds.
|
||||
|
||||
By default, this function evaluates the metric at 21 evenly spaced
|
||||
thresholds between 0 and 1, i.e. at increments of 0.05.
|
||||
|
||||
Args:
|
||||
metric_function: The metric function
|
||||
y_pred: predicted probabilities.
|
||||
y_true: True class indices.
|
||||
samples: Number of evenly spaced thresholds (between 0 and 1) at which to evaluate the metric
figsize: Figure size (w, h)
|
||||
"""
|
||||
metric_name = metric_function.__name__
|
||||
metrics = []
|
||||
for threshold in np.linspace(0, 1, samples):
|
||||
metric = metric_function(y_pred, y_true, threshold=threshold)
|
||||
metrics.append(metric)
|
||||
|
||||
ax = pd.DataFrame(metrics).plot(figsize=figsize)
|
||||
ax.set_title(f"{metric_name} at different thresholds")
|
||||
ax.set_ylabel(f"{metric_name}")
|
||||
ax.set_xlabel("threshold")
|
||||
ax.set_xticks(np.linspace(0, 20, 11))
|
||||
ax.set_xticklabels(np.around(np.linspace(0, 1, 11), decimals=2))
|
||||
|
||||
|
||||
def plot_pr_roc_curves(
|
||||
|
|
|
@ -3,10 +3,9 @@
|
|||
|
||||
from base64 import b64encode
|
||||
from pathlib import Path
|
||||
from typing import Union, Tuple
|
||||
from typing import List, Tuple, Union
|
||||
|
||||
import numpy as np
|
||||
|
||||
from PIL import Image
|
||||
|
||||
|
||||
|
|
|
@ -1,12 +1,16 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import os
|
||||
from pathlib import PosixPath
|
||||
import random
|
||||
import shutil
|
||||
from typing import List, Union
|
||||
|
||||
import numpy as np
|
||||
|
||||
|
||||
def set_random_seed(s):
|
||||
def set_random_seed(s: int):
|
||||
"""Set random seed
|
||||
"""
|
||||
np.random.seed(s)
|
||||
|
@ -14,6 +18,7 @@ def set_random_seed(s):
|
|||
|
||||
try:
|
||||
import torch
|
||||
|
||||
torch.manual_seed(s)
|
||||
if torch.cuda.is_available():
|
||||
torch.cuda.manual_seed(s)
|
||||
|
@ -22,3 +27,28 @@ def set_random_seed(s):
|
|||
torch.backends.cudnn.benchmark = False
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
|
||||
def copy_files(fpaths: Union[str, List[str]], dst: str, infer_subdir: bool = False, remove: bool = False):
|
||||
"""Copy list of files into destination
|
||||
|
||||
Args:
|
||||
fpaths: File path to copy
|
||||
dst: Destination directory
|
||||
infer_subdir: If True, try to infer directory structure of the files and copy.
|
||||
Otherwise, just copy the files to dst
|
||||
remove: Remove copied files from the original directory
|
||||
"""
|
||||
if isinstance(fpaths, (str, PosixPath)):
|
||||
fpaths = [fpaths]
|
||||
|
||||
    for fpath in fpaths:
        # Compute a per-file destination; do not overwrite `dst` itself, otherwise
        # the inferred sub-directory of one file leaks into the next iteration
        if infer_subdir:
            fpath_dst = os.path.join(
                dst, os.path.basename(os.path.dirname(fpath))
            )
        else:
            fpath_dst = dst

        if not os.path.isdir(fpath_dst):
            os.makedirs(fpath_dst)
        shutil.copy(fpath, fpath_dst)

        if remove:
            os.remove(fpath)
|
||||
|
|
|
@ -0,0 +1,79 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
|
||||
import math
|
||||
from pathlib import Path
|
||||
from typing import List, Union
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
import matplotlib.image as mpimg
|
||||
import numpy as np
|
||||
|
||||
|
||||
def line_graph(
|
||||
values: Union[List[List[float]], List[float]],
|
||||
labels: Union[List[str], str],
|
||||
x_guides: List[int],
|
||||
x_name: str,
|
||||
y_name: str,
|
||||
legend_loc: str="lower right",
|
||||
):
|
||||
"""Plot line graph(s).
|
||||
|
||||
Args:
|
||||
values: List of graphs or a graph to plot
|
||||
labels: List of labels or a label for graph.
|
||||
If labels is a string, this function assumes that values is a single graph.
|
||||
x_guides: List of guidelines (a vertical dotted line)
|
||||
x_name: x axis label
|
||||
y_name: y axis label
|
||||
legend_loc: legend location
|
||||
"""
|
||||
if isinstance(labels, str):
|
||||
plt.plot(range(len(values)), values, label=labels, lw=1)
|
||||
else:
|
||||
assert len(values) == len(labels)
|
||||
for i, v in enumerate(values):
|
||||
plt.plot(range(len(v)), v, label=labels[i], lw=1)
|
||||
|
||||
for x in x_guides:
|
||||
plt.axvline(x=x, color="gray", lw=1, linestyle="--")
|
||||
|
||||
plt.xlabel(x_name)
|
||||
plt.ylabel(y_name)
|
||||
plt.legend(loc=legend_loc)
|
||||
|
||||
|
||||
def show_ims(
|
||||
im_paths: Union[str, List[str]],
|
||||
labels: Union[str, List[str]]=None,
|
||||
size: int=3,
|
||||
rows: int=1,
|
||||
):
|
||||
"""Show image files
|
||||
Args:
|
||||
im_paths (str or List[str]): Image filepaths
|
||||
labels (str or List[str]): Image labels. If None, show image file name.
|
||||
size (int): MatplotLib plot size.
|
||||
rows (int): rows of the images
|
||||
"""
|
||||
if isinstance(im_paths, (str, Path)):
|
||||
if labels is not None and isinstance(labels, str):
|
||||
labels = [labels]
|
||||
ims = [mpimg.imread(im_paths)]
|
||||
im_paths = [im_paths]
|
||||
else:
|
||||
ims = [mpimg.imread(im_path) for im_path in im_paths]
|
||||
|
||||
cols = math.ceil(len(ims)/rows)
|
||||
_, axes = plt.subplots(rows, cols, figsize=(size*cols, size*rows))
|
||||
axes = np.array(axes).reshape(-1)
|
||||
for ax in axes:
|
||||
ax.set_axis_off()
|
||||
|
||||
for i, (im_path, im) in enumerate(zip(im_paths, ims)):
|
||||
if labels is None:
|
||||
axes[i].set_title(Path(im_path).stem)
|
||||
else:
|
||||
axes[i].set_title(labels[i])
|
||||
axes[i].imshow(im)
|
|
@ -1,6 +1,5 @@
|
|||
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
# Licensed under the MIT License.
|
||||
import os
|
||||
import copy
|
||||
from fastai.data_block import LabelList
|
||||
from ipywidgets import widgets, Layout, IntSlider
|
||||
|
@ -9,15 +8,21 @@ import numpy as np
|
|||
|
||||
def _list_sort(list1D, reverse=False, comparison_fct=lambda x: x):
|
||||
indices = list(range(len(list1D)))
|
||||
tmp = sorted(zip(list1D,indices), key=comparison_fct, reverse=reverse)
|
||||
tmp = sorted(zip(list1D, indices), key=comparison_fct, reverse=reverse)
|
||||
list1D_sorted, sort_order = list(map(list, list(zip(*tmp))))
|
||||
return (list1D_sorted, sort_order)
|
||||
return (list1D_sorted, sort_order)
|
||||
|
||||
|
||||
class DistanceWidget(object):
|
||||
IM_WIDTH = 500 # pixels
|
||||
|
||||
def __init__(self, dataset: LabelList, distances: np.ndarray, query_im_path = None, sort = True):
|
||||
def __init__(
|
||||
self,
|
||||
dataset: LabelList,
|
||||
distances: np.ndarray,
|
||||
query_im_path=None,
|
||||
sort=True,
|
||||
):
|
||||
"""Helper class to draw and update Image classification results widgets.
|
||||
|
||||
Args:
|
||||
|
@ -29,7 +34,9 @@ class DistanceWidget(object):
|
|||
|
||||
if sort:
|
||||
distances, sort_order = _list_sort(distances, reverse=False)
|
||||
dataset = copy.deepcopy(dataset) # create copy to not modify the input
|
||||
dataset = copy.deepcopy(
|
||||
dataset
|
||||
) # create copy to not modify the input
|
||||
dataset.x.items = [dataset.x.items[i] for i in sort_order]
|
||||
dataset.y.items = [dataset.y.items[i] for i in sort_order]
|
||||
|
||||
|
@ -37,7 +44,7 @@ class DistanceWidget(object):
|
|||
self.distances = distances
|
||||
self.query_im_path = query_im_path
|
||||
self.vis_image_index = 0
|
||||
|
||||
|
||||
self._create_ui()
|
||||
|
||||
def show(self):
|
||||
|
@ -48,7 +55,9 @@ class DistanceWidget(object):
|
|||
|
||||
self.w_image_header.value = f"Image index: {self.vis_image_index}"
|
||||
self.w_img.value = im._repr_png_()
|
||||
self.w_distance.value = "{:.2f}".format(self.distances[self.vis_image_index])
|
||||
self.w_distance.value = "{:.2f}".format(
|
||||
self.distances[self.vis_image_index]
|
||||
)
|
||||
self.w_filename.value = str(
|
||||
self.dataset.items[self.vis_image_index].name
|
||||
)
|
||||
|
@ -129,22 +138,22 @@ class DistanceWidget(object):
|
|||
self.w_distance = widgets.Text(
|
||||
value="", description="Distance:", layout=Layout(width="200px")
|
||||
)
|
||||
info_widgets = [widgets.HTML(value="Image:"),
|
||||
self.w_filename,
|
||||
self.w_path,
|
||||
self.w_distance]
|
||||
info_widgets = [
|
||||
widgets.HTML(value="Image:"),
|
||||
self.w_filename,
|
||||
self.w_path,
|
||||
self.w_distance,
|
||||
]
|
||||
|
||||
# Show query image if path is provided
|
||||
# Show query image if path is provided
|
||||
if self.query_im_path:
|
||||
info_widgets.append(widgets.HTML(value="Query Image:"))
|
||||
w_query_img = widgets.Image(layout=Layout(width="200px"))
|
||||
w_query_img.value = open(self.query_im_path, "rb").read()
|
||||
info_widgets.append(w_query_img)
|
||||
|
||||
|
||||
# Combine UIs into tab widget
|
||||
w_info = widgets.VBox(
|
||||
children=info_widgets
|
||||
)
|
||||
w_info = widgets.VBox(children=info_widgets)
|
||||
w_info.layout.padding = "20px"
|
||||
self.ui = widgets.Tab(
|
||||
children=[
|
||||
|
|