Updating script files
Parent: 827f6ecc30
Commit: 5fb7dfd34f

README.md: 14 lines changed

@@ -1,17 +1,17 @@
 # Azure Machine Learning Pipelines Demo
 
-An [Azure Machine Learning pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. Each step is an executable module of code which can have inputs and produce outputs (which can then be consumed by other steps as inputs).
+An [Azure Machine Learning pipeline](https://aka.ms/pl-concept) is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. Each step is an executable module of code which can have inputs and produce outputs (which can then be consumed by other steps as inputs).
 
 ![](images/aml-pipeline-flow.png)
 
 There are multiple advantages to using pipelines:
 
-- It allows data scientists to seperate tasks into non-overlapping components, enabling collaboration and development in parallel.
+- It allows data scientists to separate tasks into non-overlapping components, enabling collaboration and development in parallel.
 - It allows teams and organizations to create reusable templates for common tasks.
-- It allows more optimal usage of compute resources (eg. data preparation steps can be run on a CPU, while model training steps run on a GPU).
-- If enabled, it allows the cached output of a step to be reused in cases where re-running it would not give a different result (eg. a step for preprocessing data would not run again if the inputs and source code remains the same - it would just use the same output from the previous run).
+- It allows more optimal usage of compute resources (e.g., data preparation steps can be run on a CPU, while model training steps run on a GPU).
+- If enabled, it allows the cached output of a step to be reused in cases where re-running it would not give a different result (e.g., a step for preprocessing data would not run again if the inputs and source code remains the same - it would just use the same output from the previous run).
 
-The following repository shows an exampe of how you can use the Azure Machine Learning SDK to create a pipeline.
+The following repository shows an example of how you can use the Azure Machine Learning SDK to create a pipeline.
 
 ## Object Recognition Problem
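For readers new to the SDK, a minimal sketch (not part of this commit) of what a two-step pipeline looks like with the v1 azureml SDK; the script names, the cluster name 'cpucluster', and the experiment name 'demo' are placeholders.

```python
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

# Assumes a config.json for the workspace and an existing compute target
ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Intermediate data handed from the first step to the second
prepared_data = PipelineData('prepared_data', datastore=datastore)

prep = PythonScriptStep(name='Prep', script_name='prep.py',
                        arguments=['--out', prepared_data],
                        outputs=[prepared_data],
                        compute_target='cpucluster', source_directory='.')
train = PythonScriptStep(name='Train', script_name='train.py',
                         arguments=['--in', prepared_data],
                         inputs=[prepared_data],
                         compute_target='cpucluster', source_directory='.')

pipeline = Pipeline(workspace=ws, steps=[prep, train])
Experiment(ws, 'demo').submit(pipeline)
```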
@@ -23,7 +23,7 @@ In order to show the example, we will be training a model that is able to classi
 
 `Output`: Reference to directory containing the raw data.
 
-This step will leverage [Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the inputed datastore reference.
+This step will leverage [Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the input datastore reference.
 
 #### Step 2: Preprocess Data
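Inside the step, the output directory arrives as an ordinary mounted path. A hypothetical sketch of that script-side contract (this repo's actual data_ingestion.py is not shown in the diff); the argument names mirror the --output_dir/--num_images arguments seen later in the step definition.

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--output_dir', type=str, required=True)
parser.add_argument('--num_images', type=int, default=100)
args = parser.parse_args()

# Anything written under output_dir becomes this step's output
os.makedirs(args.output_dir, exist_ok=True)
for cls in ['airplane', 'automobile', 'bird', 'cat', 'deer',
            'dog', 'frog', 'horse', 'ship', 'truck']:
    os.makedirs(os.path.join(args.output_dir, cls), exist_ok=True)
    # ... download up to args.num_images images per class into the folder
```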
@@ -108,4 +108,4 @@ If you want to use a custom image:
 
 ```
 python test-endpoint.py --image_url <URL OF IMAGE>
 ```
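test-endpoint.py itself is not part of this diff; a client along these lines would exercise the endpoint (the payload shape and the URL placeholder are assumptions).

```python
import argparse
import requests

parser = argparse.ArgumentParser()
parser.add_argument('--image_url', type=str, required=True)
args = parser.parse_args()

# Placeholder: the deploy step prints the real scoring URL
scoring_url = '<SCORING URL PRINTED BY THE DEPLOY STEP>'
response = requests.post(scoring_url, json={'image_url': args.image_url})
print(response.json())
```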
images/pipeline-screenshot.png (binary file, not shown)
Before: 103 KiB | After: 83 KiB
@@ -36,7 +36,7 @@ accuracy_file = args.accuracy_file
 scoring_url = args.scoring_url
 
 # Define model and service names
-service_name = 'object-recognition-service'
+service_name = 'object-reco-service'
 model_name = 'object-recognition-pipeline'
 
 # Get run context
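This hunk only renames the service. For context, a deploy script built around these names typically registers the model and deploys a webservice with the v1 SDK; in the sketch below, ACI as the target, score.py, and environment.yml are assumptions, and model_dir/model_name/service_name come from the surrounding script.

```python
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.run import Run
from azureml.core.webservice import AciWebservice

# The step runs inside a pipeline, so the workspace comes from the run context
run = Run.get_context()
workspace = run.experiment.workspace

# model_dir, model_name, service_name are defined earlier in deploy.py
model = Model.register(workspace=workspace, model_name=model_name,
                       model_path=model_dir)

# score.py / environment.yml are assumed names for entry script and conda spec
env = Environment.from_conda_specification('scoring-env', 'environment.yml')
inference_config = InferenceConfig(entry_script='score.py', environment=env)
deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

service = Model.deploy(workspace, service_name, [model],
                       inference_config, deploy_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```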
@@ -32,6 +32,7 @@ def deploy_step(model_dir, accuracy_file, test_dir, compute_target):
     outputs_map = { 'scoring_url': scoring_url }
 
     step = PythonScriptStep(
         name="Deploy Model",
         script_name='deploy.py',
         arguments=[
             '--model_dir', model_dir,
@@ -40,6 +40,7 @@ def evaluate_step(model_dir, test_dir, compute_target):
         use_gpu=True)
 
     step = EstimatorStep(
         name="Evaluate Model",
         estimator=estimator,
         estimator_entry_script_arguments=[
             '--test_dir', test_dir,
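The hunk truncates the call. For orientation, a complete EstimatorStep usually also carries the step's inputs, outputs, and compute target; the sketch below fills in plausible values (the inputs/outputs shown are assumptions, not lines from this commit, and model_dir/test_dir/accuracy_file are the PipelineData objects from the surrounding function).

```python
from azureml.pipeline.steps import EstimatorStep

# Reconstructed illustration of the full step definition
step = EstimatorStep(
    name="Evaluate Model",
    estimator=estimator,
    estimator_entry_script_arguments=[
        '--test_dir', test_dir,
        '--model_dir', model_dir,
        '--accuracy_file', accuracy_file],
    inputs=[model_dir, test_dir],
    outputs=[accuracy_file],
    compute_target=compute_target,
    allow_reuse=True)
```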
@@ -4,15 +4,15 @@ from azureml.core.runconfig import RunConfiguration
 from azureml.pipeline.core import PipelineData
 from azureml.pipeline.core import PipelineParameter
 
-def data_ingestion_step(datastore_reference, compute_target):
+def data_ingestion_step(datastore, compute_target):
     '''
     This step will leverage Azure Cognitive Services to search the web for images
     to create a dataset. This replicates the real-world scenario of data being
     ingested from a constantly changing source. The same 10 classes in the CIFAR-10 dataset
     will be used (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).
 
-    :param datastore_reference: The reference to the datastore that will be used
-    :type datastore_reference: DataReference
+    :param datastore: The datastore that will be used
+    :type datastore: Datastore
     :param compute_target: The compute target to run the step on
     :type compute_target: ComputeTarget
@@ -29,7 +29,7 @@ def data_ingestion_step(datastore_reference, compute_target):
     raw_data_dir = PipelineData(
         name='raw_data_dir',
         pipeline_output_name='raw_data_dir',
-        datastore=datastore_reference.datastore,
+        datastore=datastore,
         output_mode='mount',
         is_directory=True)
@@ -37,9 +37,9 @@ def data_ingestion_step(datastore_reference, compute_target):
     outputs_map = { 'raw_data_dir': raw_data_dir }
 
     step = PythonScriptStep(
         name="Data Ingestion",
         script_name='data_ingestion.py',
         arguments=['--output_dir', raw_data_dir, '--num_images', num_images],
-        inputs=[datastore_reference],
         outputs=outputs,
         compute_target=compute_target,
         source_directory=os.path.dirname(os.path.abspath(__file__)),
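Since the function now receives the Datastore directly, the explicit inputs=[datastore_reference] wiring becomes unnecessary; the PipelineData output is what links this step to the next. A sketch of that wiring, assuming (as outputs_map above suggests) that each step function returns a (step, outputs_map) pair:

```python
# Illustrative wiring, not lines from this commit: the PipelineData published
# in outputs_map feeds the next step as an input.
ingestion_step, ingestion_outputs = data_ingestion_step(datastore, cpu_compute_target)
raw_data_dir = ingestion_outputs['raw_data_dir']

preprocess_step, preprocess_outputs = data_preprocess_step(raw_data_dir, cpu_compute_target)
```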
@@ -54,6 +54,7 @@ def data_preprocess_step(raw_data_dir, compute_target):
     }
 
     step = PythonScriptStep(
         name="Preprocess Data",
         script_name='data_preprocess.py',
         arguments=[
             '--raw_data_dir', raw_data_dir,
@@ -46,6 +46,7 @@ def train_step(train_dir, valid_dir, compute_target):
         use_gpu=True)
 
     step = EstimatorStep(
         name="Train Model",
         estimator=estimator,
         estimator_entry_script_arguments=[
             '--train_dir', train_dir,
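The estimator this step references is built a few lines above the hunk (its use_gpu=True tail is visible as context). A hedged reconstruction of that construction with the v1 azureml.train.estimator API; the pip package list is an assumption.

```python
import os

from azureml.train.estimator import Estimator

# Sketch of the Estimator feeding the EstimatorStep above
estimator = Estimator(
    source_directory=os.path.dirname(os.path.abspath(__file__)),
    entry_script='train.py',
    compute_target=compute_target,
    pip_packages=['torch', 'torchvision'],
    use_gpu=True)
```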
@@ -1,7 +1,6 @@
 from azureml.core import Workspace
 from azureml.core import Experiment
 from azureml.pipeline.core import Pipeline
-from azureml.data.data_reference import DataReference
 from modules.ingestion.data_ingestion_step import data_ingestion_step
 from modules.preprocess.data_preprocess_step import data_preprocess_step
 from modules.train.train_step import train_step
@@ -16,8 +15,8 @@ datastore = workspace.get_default_datastore()
 
 # Create CPU compute target
 print('Creating CPU compute target ...')
-cpu_cluster_name = 'ds3cluster'
-cpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
+cpu_cluster_name = 'cpucluster'
+cpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                            idle_seconds_before_scaledown=1200,
                                                            min_nodes=0,
                                                            max_nodes=2)
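When a script like this is re-run, a common guard is to reuse an existing cluster instead of recreating it. A sketch of that pattern (not part of this commit), using the standard ComputeTargetException lookup:

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# workspace, cpu_cluster_name, cpu_compute_config come from the script above
try:
    # Raises ComputeTargetException if no cluster with this name exists
    cpu_compute_target = ComputeTarget(workspace=workspace, name=cpu_cluster_name)
    print('Found existing CPU cluster, reusing it.')
except ComputeTargetException:
    cpu_compute_target = ComputeTarget.create(workspace, cpu_cluster_name,
                                              cpu_compute_config)
    cpu_compute_target.wait_for_completion(show_output=True)
```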
@@ -26,17 +25,14 @@ cpu_compute_target.wait_for_completion(show_output=True)
 
 # Create GPU compute target
 print('Creating GPU compute target ...')
-gpu_cluster_name = 'k80cluster'
-gpu_compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6',
+gpu_cluster_name = 'gpucluster'
+gpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                            idle_seconds_before_scaledown=1200,
                                                            min_nodes=0,
                                                            max_nodes=2)
 gpu_compute_target = ComputeTarget.create(workspace, gpu_cluster_name, gpu_compute_config)
 gpu_compute_target.wait_for_completion(show_output=True)
 
-# Get datastore reference
-datastore = DataReference(datastore, mode='mount')
-
 # Step 1: Data ingestion
 data_ingestion_step, data_ingestion_outputs = data_ingestion_step(datastore, cpu_compute_target)
@@ -63,4 +59,4 @@ pipeline_parameters = {
     'momentum': 0.9
 }
 pipeline = Pipeline(workspace=workspace, steps=[data_ingestion_step, data_preprocess_step, train_step, evaluate_step, deploy_step])
-pipeline_run = Experiment(workspace, 'object-recognition-pipeline').submit(pipeline, pipeline_parameters=pipeline_parameters)
+pipeline_run = Experiment(workspace, 'Object-Recognition-Demo').submit(pipeline, pipeline_parameters=pipeline_parameters)
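After submission, the returned run handle is typically used to block until the pipeline finishes; a short usage sketch:

```python
# Monitor the submitted pipeline run until it completes
pipeline_run.wait_for_completion(show_output=True)
print('Final status:', pipeline_run.get_status())
```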