Santhosh Pillai 2020-02-05 10:12:15 -08:00
Parent 827f6ecc30
Commit 5fb7dfd34f
9 changed files with 22 additions and 22 deletions

View file

@@ -1,17 +1,17 @@
# Azure Machine Learning Pipelines Demo
An [Azure Machine Learning pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. Each step is an executable module of code which can have inputs and produce outputs (which can then be consumed by other steps as inputs).
An [Azure Machine Learning pipeline](https://aka.ms/pl-concept) is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. Each step is an executable module of code which can have inputs and produce outputs (which can then be consumed by other steps as inputs).
![](images/aml-pipeline-flow.png)
There are multiple advantages to using pipelines:
- It allows data scientists to seperate tasks into non-overlapping components, enabling collaboration and development in parallel.
- It allows data scientists to separate tasks into non-overlapping components, enabling collaboration and development in parallel.
- It allows teams and organizations to create reusable templates for common tasks.
- It allows more optimal usage of compute resources (eg. data preparation steps can be run on a CPU, while model training steps run on a GPU).
- If enabled, it allows the cached output of a step to be reused in cases where re-running it would not give a different result (eg. a step for preprocessing data would not run again if the inputs and source code remains the same - it would just use the same output from the previous run).
- It allows more optimal usage of compute resources (e.g., data preparation steps can be run on a CPU, while model training steps run on a GPU).
- If enabled, it allows the cached output of a step to be reused in cases where re-running it would not give a different result (e.g., a step for preprocessing data would not run again if the inputs and source code remains the same - it would just use the same output from the previous run).
The following repository shows an exampe of how you can use the Azure Machine Learning SDK to create a pipeline.
The following repository shows an example of how you can use the Azure Machine Learning SDK to create a pipeline.
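As a rough, minimal sketch of what that looks like with the SDK (the workspace, compute target name, and script name below are placeholders, not code from this repository), a pipeline with a single step can be assembled and submitted like this:

```
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Placeholder names: adjust to your own workspace, compute target and script
workspace = Workspace.from_config()

step = PythonScriptStep(
    name="Example Step",
    script_name='example.py',       # hypothetical script
    compute_target='cpucluster',    # hypothetical compute target name
    source_directory='.')

pipeline = Pipeline(workspace=workspace, steps=[step])
run = Experiment(workspace, 'example-pipeline').submit(pipeline)
run.wait_for_completion(show_output=True)
```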
## Object Recognition Problem
@@ -23,7 +23,7 @@ In order to show the example, we will be training a model that is able to classi
`Output`: Reference to directory containing the raw data.
This step will leverage [Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the inputed datastore reference.
This step will leverage [Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the input datastore reference.
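The search code itself is not shown in this diff. As a hedged sketch of the general approach (the Bing Image Search v7 endpoint, key handling, and per-class folder layout below are assumptions, not necessarily what this repo's data_ingestion.py does), the web search results could be downloaded into subfolders of the step's mounted output directory:

```
import os
import requests

# Assumed Bing Image Search v7 endpoint; the key comes from a Cognitive Services resource
SEARCH_URL = 'https://api.cognitive.microsoft.com/bing/v7.0/images/search'

def download_images(query, output_dir, subscription_key, count=50):
    # Ask the image search API for image URLs matching the class name
    response = requests.get(
        SEARCH_URL,
        headers={'Ocp-Apim-Subscription-Key': subscription_key},
        params={'q': query, 'count': count})
    response.raise_for_status()

    # Save each image into a per-class subfolder of the mounted output directory
    class_dir = os.path.join(output_dir, query)
    os.makedirs(class_dir, exist_ok=True)
    for i, item in enumerate(response.json().get('value', [])):
        try:
            image = requests.get(item['contentUrl'], timeout=10)
            image.raise_for_status()
            with open(os.path.join(class_dir, '{}.jpg'.format(i)), 'wb') as f:
                f.write(image.content)
        except requests.RequestException:
            continue  # skip URLs that fail to download
```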
#### Step 2: Preprocess Data
@@ -108,4 +108,4 @@ If you want to use a custom image:
```
python test-endpoint.py --image_url <URL OF IMAGE>
```
```
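The contents of test-endpoint.py are not part of this diff. Conceptually, such a test amounts to looking up the deployed web service and posting the image URL to its scoring URI, roughly as sketched below; the payload key image_url and the response format are assumptions, and the service name is taken from the deploy script in this commit:

```
import argparse
import requests
from azureml.core import Workspace
from azureml.core.webservice import Webservice

parser = argparse.ArgumentParser()
parser.add_argument('--image_url', type=str, required=True)
args = parser.parse_args()

# Look up the deployed service by name and send the image URL to its scoring URI
workspace = Workspace.from_config()
service = Webservice(workspace, 'object-reco-service')
response = requests.post(service.scoring_uri, json={'image_url': args.image_url})
response.raise_for_status()
print(response.json())
```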

images/pipeline-screenshot.png
Binary file not shown (size changed from 103 KiB to 83 KiB).

View file

@@ -36,7 +36,7 @@ accuracy_file = args.accuracy_file
scoring_url = args.scoring_url
# Define model and service names
service_name = 'object-recognition-service'
service_name = 'object-reco-service'
model_name = 'object-recognition-pipeline'
# Get run context
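The rest of deploy.py is not shown here. As a hedged sketch of how the service_name and model_name variables above would typically be used (an illustration, not necessarily what the script does; the entry script, environment file, and ACI sizing are assumptions), the trained model can be registered and deployed to Azure Container Instances:

```
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.run import Run
from azureml.core.webservice import AciWebservice

# Workspace comes from the run context, matching the comment above
workspace = Run.get_context().experiment.workspace

# Register the trained model under the model name defined above
model = Model.register(workspace=workspace,
                       model_path=args.model_dir,   # model directory passed in by the pipeline step
                       model_name=model_name)

# Entry script, conda file and ACI sizing below are assumptions
environment = Environment.from_conda_specification('inference-env', 'environment.yml')
inference_config = InferenceConfig(entry_script='score.py', environment=environment)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

# Deploy under the service name defined above and wait for it to come up
service = Model.deploy(workspace, service_name, [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```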

View file

@@ -32,6 +32,7 @@ def deploy_step(model_dir, accuracy_file, test_dir, compute_target):
outputs_map = { 'scoring_url': scoring_url }
step = PythonScriptStep(
name="Deploy Model",
script_name='deploy.py',
arguments=[
'--model_dir', model_dir,

View file

@@ -40,6 +40,7 @@ def evaluate_step(model_dir, test_dir, compute_target):
use_gpu=True)
step = EstimatorStep(
name="Evaluate Model",
estimator=estimator,
estimator_entry_script_arguments=[
'--test_dir', test_dir,

View file

@@ -4,15 +4,15 @@ from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import PipelineData
from azureml.pipeline.core import PipelineParameter
def data_ingestion_step(datastore_reference, compute_target):
def data_ingestion_step(datastore, compute_target):
'''
This step will leverage Azure Cognitive Services to search the web for images
to create a dataset. This replicates the real-world scenario of data being
ingested from a constantly changing source. The same 10 classes in the CIFAR-10 dataset
will be used (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).
:param datastore_reference: The reference to the datastore that will be used
:type datastore_reference: DataReference
:param datastore: The datastore that will be used
:type datastore: Datastore
:param compute_target: The compute target to run the step on
:type compute_target: ComputeTarget
@@ -29,7 +29,7 @@ def data_ingestion_step(datastore_reference, compute_target):
raw_data_dir = PipelineData(
name='raw_data_dir',
pipeline_output_name='raw_data_dir',
datastore=datastore_reference.datastore,
datastore=datastore,
output_mode='mount',
is_directory=True)
@@ -37,9 +37,9 @@ def data_ingestion_step(datastore_reference, compute_target):
outputs_map = { 'raw_data_dir': raw_data_dir }
step = PythonScriptStep(
name="Data Ingestion",
script_name='data_ingestion.py',
arguments=['--output_dir', raw_data_dir, '--num_images', num_images],
inputs=[datastore_reference],
outputs=outputs,
compute_target=compute_target,
source_directory=os.path.dirname(os.path.abspath(__file__)),
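The PipelineData object above, now backed directly by the Datastore instead of a DataReference, is how the raw images travel to the next step. As a hedged, simplified sketch of that wiring (not a verbatim excerpt from this repo; the compute target name and source directories are assumptions), the same object appears in one step's outputs and the next step's inputs:

```
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

workspace = Workspace.from_config()
datastore = workspace.get_default_datastore()

# Intermediate directory backed by the default datastore, as in the change above
raw_data_dir = PipelineData(name='raw_data_dir', datastore=datastore, is_directory=True)

ingestion = PythonScriptStep(
    name="Data Ingestion",
    script_name='data_ingestion.py',
    arguments=['--output_dir', raw_data_dir],
    outputs=[raw_data_dir],                 # this step produces the directory
    compute_target='cpucluster',            # assumed compute target name
    source_directory='modules/ingestion')   # assumed path

preprocess = PythonScriptStep(
    name="Preprocess Data",
    script_name='data_preprocess.py',
    arguments=['--raw_data_dir', raw_data_dir],
    inputs=[raw_data_dir],                  # the same object is consumed here
    compute_target='cpucluster',
    source_directory='modules/preprocess')  # assumed path

pipeline = Pipeline(workspace=workspace, steps=[ingestion, preprocess])
```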

View file

@@ -54,6 +54,7 @@ def data_preprocess_step(raw_data_dir, compute_target):
}
step = PythonScriptStep(
name="Preprocess Data",
script_name='data_preprocess.py',
arguments=[
'--raw_data_dir', raw_data_dir,

View file

@@ -46,6 +46,7 @@ def train_step(train_dir, valid_dir, compute_target):
use_gpu=True)
step = EstimatorStep(
name="Train Model",
estimator=estimator,
estimator_entry_script_arguments=[
'--train_dir', train_dir,

View file

@@ -1,7 +1,6 @@
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.data.data_reference import DataReference
from modules.ingestion.data_ingestion_step import data_ingestion_step
from modules.preprocess.data_preprocess_step import data_preprocess_step
from modules.train.train_step import train_step
@@ -16,8 +15,8 @@ datastore = workspace.get_default_datastore()
# Create CPU compute target
print('Creating CPU compute target ...')
cpu_cluster_name = 'ds3cluster'
cpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
cpu_cluster_name = 'cpucluster'
cpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
idle_seconds_before_scaledown=1200,
min_nodes=0,
max_nodes=2)
@@ -26,17 +25,14 @@ cpu_compute_target.wait_for_completion(show_output=True)
# Create GPU compute target
print('Creating GPU compute target ...')
gpu_cluster_name = 'k80cluster'
gpu_compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6',
gpu_cluster_name = 'gpucluster'
gpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
idle_seconds_before_scaledown=1200,
min_nodes=0,
max_nodes=2)
gpu_compute_target = ComputeTarget.create(workspace, gpu_cluster_name, gpu_compute_config)
gpu_compute_target.wait_for_completion(show_output=True)
# Get datastore reference
datastore = DataReference(datastore, mode='mount')
# Step 1: Data ingestion
data_ingestion_step, data_ingestion_outputs = data_ingestion_step(datastore, cpu_compute_target)
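A common, defensive variation on the compute-target creation above (shown here as an assumption, not what this script does; it reuses the workspace variable and the cluster names from this commit) is to reuse a cluster if one with the same name already exists:

```
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

def get_or_create_cluster(workspace, name, vm_size, max_nodes=2):
    # Reuse the cluster if it already exists, otherwise provision a new one
    try:
        return ComputeTarget(workspace=workspace, name=name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                       idle_seconds_before_scaledown=1200,
                                                       min_nodes=0,
                                                       max_nodes=max_nodes)
        target = ComputeTarget.create(workspace, name, config)
        target.wait_for_completion(show_output=True)
        return target

cpu_compute_target = get_or_create_cluster(workspace, 'cpucluster', 'STANDARD_D2_V2')
gpu_compute_target = get_or_create_cluster(workspace, 'gpucluster', 'STANDARD_NC6')
```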
@@ -63,4 +59,4 @@ pipeline_parameters = {
'momentum': 0.9
}
pipeline = Pipeline(workspace=workspace, steps=[data_ingestion_step, data_preprocess_step, train_step, evaluate_step, deploy_step])
pipeline_run = Experiment(workspace, 'object-recognition-pipeline').submit(pipeline, pipeline_parameters=pipeline_parameters)
pipeline_run = Experiment(workspace, 'Object-Recognition-Demo').submit(pipeline, pipeline_parameters=pipeline_parameters)
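After submission, the run can be monitored from the SDK. A small, assumed follow-up (not part of this commit; the published pipeline name and description are placeholders) might look like:

```
# Block until the pipeline run finishes, streaming step logs to stdout
pipeline_run.wait_for_completion(show_output=True)

# Optionally publish the pipeline so it can be re-run via a REST endpoint
published = pipeline.publish(name='object-recognition-pipeline',
                             description='Ingest, preprocess, train, evaluate and deploy')
print(published.endpoint)
```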