Santhosh Pillai 2020-02-05 10:12:15 -08:00
Parent 827f6ecc30
Commit 5fb7dfd34f
9 changed files with 22 additions and 22 deletions

View file

@@ -1,17 +1,17 @@
# Azure Machine Learning Pipelines Demo
An [Azure Machine Learning pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. Each step is an executable module of code which can have inputs and produce outputs (which can then be consumed by other steps as inputs).
An [Azure Machine Learning pipeline](https://aka.ms/pl-concept) is an independently executable workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. Each step is an executable module of code which can have inputs and produce outputs (which can then be consumed by other steps as inputs).
![](images/aml-pipeline-flow.png)
There are multiple advantages to using pipelines:
- It allows data scientists to seperate tasks into non-overlapping components, enabling collaboration and development in parallel.
- It allows data scientists to separate tasks into non-overlapping components, enabling collaboration and development in parallel.
- It allows teams and organizations to create reusable templates for common tasks.
- It allows more optimal usage of compute resources (eg. data preparation steps can be run on a CPU, while model training steps run on a GPU).
- If enabled, it allows the cached output of a step to be reused in cases where re-running it would not give a different result (eg. a step for preprocessing data would not run again if the inputs and source code remains the same - it would just use the same output from the previous run).
- It allows more optimal usage of compute resources (e.g., data preparation steps can be run on a CPU, while model training steps run on a GPU).
- If enabled, it allows the cached output of a step to be reused in cases where re-running it would not give a different result (e.g., a step for preprocessing data would not run again if the inputs and source code remains the same - it would just use the same output from the previous run).
The following repository shows an exampe of how you can use the Azure Machine Learning SDK to create a pipeline.
The following repository shows an example of how you can use the Azure Machine Learning SDK to create a pipeline.
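As a rough, minimal sketch of what that looks like with the SDK (the workspace, compute target name, and script name below are placeholders, not code from this repository), a pipeline with a single step can be assembled and submitted like this:

```
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Placeholder names: adjust to your own workspace, compute target and script
workspace = Workspace.from_config()

step = PythonScriptStep(
    name="Example Step",
    script_name='example.py',       # hypothetical script
    compute_target='cpucluster',    # hypothetical compute target name
    source_directory='.')

pipeline = Pipeline(workspace=workspace, steps=[step])
run = Experiment(workspace, 'example-pipeline').submit(pipeline)
run.wait_for_completion(show_output=True)
```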
## Object Recognition Problem
@@ -23,7 +23,7 @@ In order to show the example, we will be training a model that is able to classi
`Output`: Reference to directory containing the raw data.
This step will leverage [Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the inputed datastore reference.
This step will leverage [Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/) to search the web for images to create our dataset. This replicates the real-world scenario of data being ingested from a constantly changing source. For this demo, we will use the same 10 classes in the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). All images will be saved into a directory in the input datastore reference.
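The search code itself is not shown in this diff. As a hedged sketch of the general approach (the Bing Image Search v7 endpoint, key handling, and per-class folder layout below are assumptions, not necessarily what this repo's data_ingestion.py does), the web search results could be downloaded into subfolders of the step's mounted output directory:

```
import os
import requests

# Assumed Bing Image Search v7 endpoint; the key comes from a Cognitive Services resource
SEARCH_URL = 'https://api.cognitive.microsoft.com/bing/v7.0/images/search'

def download_images(query, output_dir, subscription_key, count=50):
    # Ask the image search API for image URLs matching the class name
    response = requests.get(
        SEARCH_URL,
        headers={'Ocp-Apim-Subscription-Key': subscription_key},
        params={'q': query, 'count': count})
    response.raise_for_status()

    # Save each image into a per-class subfolder of the mounted output directory
    class_dir = os.path.join(output_dir, query)
    os.makedirs(class_dir, exist_ok=True)
    for i, item in enumerate(response.json().get('value', [])):
        try:
            image = requests.get(item['contentUrl'], timeout=10)
            image.raise_for_status()
            with open(os.path.join(class_dir, '{}.jpg'.format(i)), 'wb') as f:
                f.write(image.content)
        except requests.RequestException:
            continue  # skip URLs that fail to download
```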
#### Step 2: Preprocess Data
@@ -108,4 +108,4 @@ If you want to use a custom image:
```
python test-endpoint.py --image_url <URL OF IMAGE>
```
```
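The contents of test-endpoint.py are not part of this diff. Conceptually, such a test amounts to looking up the deployed web service and posting the image URL to its scoring URI, roughly as sketched below; the payload key image_url and the response format are assumptions, and the service name is taken from the deploy script in this commit:

```
import argparse
import requests
from azureml.core import Workspace
from azureml.core.webservice import Webservice

parser = argparse.ArgumentParser()
parser.add_argument('--image_url', type=str, required=True)
args = parser.parse_args()

# Look up the deployed service by name and send the image URL to its scoring URI
workspace = Workspace.from_config()
service = Webservice(workspace, 'object-reco-service')
response = requests.post(service.scoring_uri, json={'image_url': args.image_url})
response.raise_for_status()
print(response.json())
```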

images/pipeline-screenshot.png
Binary file not shown (size changed from 103 KiB to 83 KiB).

View file

@@ -36,7 +36,7 @@ accuracy_file = args.accuracy_file
scoring_url = args.scoring_url
# Define model and service names
service_name = 'object-recognition-service'
service_name = 'object-reco-service'
model_name = 'object-recognition-pipeline'
# Get run context
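The rest of deploy.py is not shown here. As a hedged sketch of how the service_name and model_name variables above would typically be used (an illustration, not necessarily what the script does; the entry script, environment file, and ACI sizing are assumptions), the trained model can be registered and deployed to Azure Container Instances:

```
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.run import Run
from azureml.core.webservice import AciWebservice

# Workspace comes from the run context, matching the comment above
workspace = Run.get_context().experiment.workspace

# Register the trained model under the model name defined above
model = Model.register(workspace=workspace,
                       model_path=args.model_dir,   # model directory passed in by the pipeline step
                       model_name=model_name)

# Entry script, conda file and ACI sizing below are assumptions
environment = Environment.from_conda_specification('inference-env', 'environment.yml')
inference_config = InferenceConfig(entry_script='score.py', environment=environment)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

# Deploy under the service name defined above and wait for it to come up
service = Model.deploy(workspace, service_name, [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```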

View file

@@ -32,6 +32,7 @@ def deploy_step(model_dir, accuracy_file, test_dir, compute_target):
outputs_map = { 'scoring_url': scoring_url }
step = PythonScriptStep(
name="Deploy Model",
script_name='deploy.py',
arguments=[
'--model_dir', model_dir,

View file

@@ -40,6 +40,7 @@ def evaluate_step(model_dir, test_dir, compute_target):
use_gpu=True)
step = EstimatorStep(
name="Evaluate Model",
estimator=estimator,
estimator_entry_script_arguments=[
'--test_dir', test_dir,

View file

@@ -4,15 +4,15 @@ from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import PipelineData
from azureml.pipeline.core import PipelineParameter
def data_ingestion_step(datastore_reference, compute_target):
def data_ingestion_step(datastore, compute_target):
'''
This step will leverage Azure Cognitive Services to search the web for images
to create a dataset. This replicates the real-world scenario of data being
ingested from a constantly changing source. The same 10 classes in the CIFAR-10 dataset
will be used (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).
:param datastore_reference: The reference to the datastore that will be used
:type datastore_reference: DataReference
:param datastore: The datastore that will be used
:type datastore: Datastore
:param compute_target: The compute target to run the step on
:type compute_target: ComputeTarget
@@ -29,7 +29,7 @@ def data_ingestion_step(datastore_reference, compute_target):
raw_data_dir = PipelineData(
name='raw_data_dir',
pipeline_output_name='raw_data_dir',
datastore=datastore_reference.datastore,
datastore=datastore,
output_mode='mount',
is_directory=True)
@@ -37,9 +37,9 @@ def data_ingestion_step(datastore_reference, compute_target):
outputs_map = { 'raw_data_dir': raw_data_dir }
step = PythonScriptStep(
name="Data Ingestion",
script_name='data_ingestion.py',
arguments=['--output_dir', raw_data_dir, '--num_images', num_images],
inputs=[datastore_reference],
outputs=outputs,
compute_target=compute_target,
source_directory=os.path.dirname(os.path.abspath(__file__)),
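The PipelineData object above, now backed directly by the Datastore instead of a DataReference, is how the raw images travel to the next step. As a hedged, simplified sketch of that wiring (not a verbatim excerpt from this repo; the compute target name and source directories are assumptions), the same object appears in one step's outputs and the next step's inputs:

```
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

workspace = Workspace.from_config()
datastore = workspace.get_default_datastore()

# Intermediate directory backed by the default datastore, as in the change above
raw_data_dir = PipelineData(name='raw_data_dir', datastore=datastore, is_directory=True)

ingestion = PythonScriptStep(
    name="Data Ingestion",
    script_name='data_ingestion.py',
    arguments=['--output_dir', raw_data_dir],
    outputs=[raw_data_dir],                 # this step produces the directory
    compute_target='cpucluster',            # assumed compute target name
    source_directory='modules/ingestion')   # assumed path

preprocess = PythonScriptStep(
    name="Preprocess Data",
    script_name='data_preprocess.py',
    arguments=['--raw_data_dir', raw_data_dir],
    inputs=[raw_data_dir],                  # the same object is consumed here
    compute_target='cpucluster',
    source_directory='modules/preprocess')  # assumed path

pipeline = Pipeline(workspace=workspace, steps=[ingestion, preprocess])
```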

View file

@@ -54,6 +54,7 @@ def data_preprocess_step(raw_data_dir, compute_target):
}
step = PythonScriptStep(
name="Preprocess Data",
script_name='data_preprocess.py',
arguments=[
'--raw_data_dir', raw_data_dir,

View file

@@ -46,6 +46,7 @@ def train_step(train_dir, valid_dir, compute_target):
use_gpu=True)
step = EstimatorStep(
name="Train Model",
estimator=estimator,
estimator_entry_script_arguments=[
'--train_dir', train_dir,

View file

@@ -1,7 +1,6 @@
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline
from azureml.data.data_reference import DataReference
from modules.ingestion.data_ingestion_step import data_ingestion_step
from modules.preprocess.data_preprocess_step import data_preprocess_step
from modules.train.train_step import train_step
@@ -16,8 +15,8 @@ datastore = workspace.get_default_datastore()
# Create CPU compute target
print('Creating CPU compute target ...')
cpu_cluster_name = 'ds3cluster'
cpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
cpu_cluster_name = 'cpucluster'
cpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
idle_seconds_before_scaledown=1200,
min_nodes=0,
max_nodes=2)
@@ -26,17 +25,14 @@ cpu_compute_target.wait_for_completion(show_output=True)
# Create GPU compute target
print('Creating GPU compute target ...')
gpu_cluster_name = 'k80cluster'
gpu_compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6',
gpu_cluster_name = 'gpucluster'
gpu_compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
idle_seconds_before_scaledown=1200,
min_nodes=0,
max_nodes=2)
gpu_compute_target = ComputeTarget.create(workspace, gpu_cluster_name, gpu_compute_config)
gpu_compute_target.wait_for_completion(show_output=True)
# Get datastore reference
datastore = DataReference(datastore, mode='mount')
# Step 1: Data ingestion
data_ingestion_step, data_ingestion_outputs = data_ingestion_step(datastore, cpu_compute_target)
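A common, defensive variation on the compute-target creation above (shown here as an assumption, not what this script does; it reuses the workspace variable and the cluster names from this commit) is to reuse a cluster if one with the same name already exists:

```
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

def get_or_create_cluster(workspace, name, vm_size, max_nodes=2):
    # Reuse the cluster if it already exists, otherwise provision a new one
    try:
        return ComputeTarget(workspace=workspace, name=name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                       idle_seconds_before_scaledown=1200,
                                                       min_nodes=0,
                                                       max_nodes=max_nodes)
        target = ComputeTarget.create(workspace, name, config)
        target.wait_for_completion(show_output=True)
        return target

cpu_compute_target = get_or_create_cluster(workspace, 'cpucluster', 'STANDARD_D2_V2')
gpu_compute_target = get_or_create_cluster(workspace, 'gpucluster', 'STANDARD_NC6')
```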
@@ -63,4 +59,4 @@ pipeline_parameters = {
'momentum': 0.9
}
pipeline = Pipeline(workspace=workspace, steps=[data_ingestion_step, data_preprocess_step, train_step, evaluate_step, deploy_step])
pipeline_run = Experiment(workspace, 'object-recognition-pipeline').submit(pipeline, pipeline_parameters=pipeline_parameters)
pipeline_run = Experiment(workspace, 'Object-Recognition-Demo').submit(pipeline, pipeline_parameters=pipeline_parameters)
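After submission, the run can be monitored from the SDK. A small, assumed follow-up (not part of this commit; the published pipeline name and description are placeholders) might look like:

```
# Block until the pipeline run finishes, streaming step logs to stdout
pipeline_run.wait_for_completion(show_output=True)

# Optionally publish the pipeline so it can be re-run via a REST endpoint
published = pipeline.publish(name='object-recognition-pipeline',
                             description='Ingest, preprocess, train, evaluate and deploy')
print(published.endpoint)
```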