* first stab at hello world

* simplify hellop world component environment

* add list of problems to this branch

* typo

* another typo

* eyes-on -> public

* add recommendation to solve problems in order

* typo as usual

* problem 1 guidance

* polish problem 1 files

* typo

* problem 2

* problem 03

* readme's and support (not over yet)

* .

* wrap up main README

* instructions to get started

* problem 1

* problem 2

* problem 3

* address Fuhui's comments

Co-authored-by: XXX <XXX@me.com>
This commit is contained in:
Thomas 2021-07-08 17:04:28 -07:00 committed by GitHub
Parent c98e76a006
Commit 7e63009716
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
35 changed files: 914 additions and 32 deletions

View file

@@ -1,14 +1,25 @@
# Project
> This repo has been populated by an initial template to help get you started. Please
> make sure to update the content to build a great experience for community-building.
This project gathers a list of problems (and their solutions) to help people ramp up on [Azure ML](https://azure.microsoft.com/en-us/services/machine-learning/).
The problems are
divided into two categories: a first set of "generic" problems to get started with Azure ML, and a second set focusing on `shrike`, a [library](https://github.com/Azure/shrike) providing utilities to help create and run experiments on the Azure ML platform.
As the maintainer of this project, please make a few updates:
## Generic Azure ML problems
:construction: Work in Progress :construction:
- Improving this README.MD file to provide a great experience
- Updating SUPPORT.MD with content about this project's support experience
- Understanding the security reporting process in SECURITY.MD
- Remove this section from the README
## Azure ML problems aimed at learning `shrike`
The list of problems aiming at teaching how to use the `shrike` [library](https://github.com/Azure/shrike)
is given [here](./shrike-problems/shrike-problem-set.md) in the `shrike-problems` directory.
Detailed instructions for each problem can be found in the `problems` subdirectory
([here](./shrike-problems/problems/pipelines-01.md) is the first problem, for instance).
The `shrike-examples` directory contains all the files that need to be modified to solve the problems.
It is a good example of the recommended repository architecture for Azure ML projects using `shrike`.
To get started, please follow the instructions in the [ReadMe](./shrike-examples/ReadMe.md) file located in that directory.
:exclamation: Spoiler Alert :exclamation: The solutions to the problems are available in the [shrike-solutions](https://github.com/Azure/azure-ml-problem-sets/tree/shrike-solutions)
branch. If you visit this branch, you will see that the files in the `shrike-examples` directory are complete.
## Contributing

View file

@@ -1,13 +1,3 @@
# TODO: The maintainer of this repo has not yet edited this file
**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/spot](https://aka.ms/spot). CSS will work with/help you to determine next steps. More details also available at [aka.ms/onboardsupport](https://aka.ms/onboardsupport).
- **Not sure?** Fill out a SPOT intake as though the answer were "Yes". CSS will help you decide.
*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
# Support
## How to file issues and get help
@@ -16,9 +6,8 @@ This project uses GitHub Issues to track bugs and feature requests. Please searc
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.
For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
For quick help and questions about using this project, you can also email aml-ds-feedback@microsoft.com.
Just be aware that we might still ask you to create an issue for tracking.
## Microsoft Support Policy

shrike-examples/ReadMe.md Normal file

View file

@@ -0,0 +1,17 @@
## Getting started with the `shrike` problems
- Clone the current repository and set `shrike-examples` as your working directory.
- Set up and activate a new Conda environment:
`conda create --name shrike-examples-env python=3.7 -y`,
`conda activate shrike-examples-env`.
- Install the `shrike` dependencies:
`pip install -r requirements.txt`
## List of problems
The list of problems is given [here](../shrike-problems/shrike-problem-set.md) in the
`shrike-problems` directory, and guidance for individual problems can be found in the
`problems` subdirectory.
To solve the problems, just follow the **Guidance** section in each problem description,
and modify the appropriate files as indicated. You can look for the `# To-Do` comment string to locate the parts of the files that need modifying.

View file

@@ -0,0 +1,5 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

View file

@@ -0,0 +1,3 @@
name: helloworldcomponent_env
dependencies:
- python=3.7

View file

@@ -0,0 +1 @@
../../contoso

View file

@@ -0,0 +1,26 @@
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: componentwithparameter
version: 0.0.1
display_name: ComponentWithParameter
type: CommandComponent
description: Demo component that adds 1000 to the 'value' parameter.
is_deterministic: true
tags:
contact: aml-ds@microsoft.com
inputs:
# To-Do: define a 'value' parameter
outputs: {}
# To-Do: add newly introduced parameter to the command
command: >-
python3 run.py
environment:
docker:
enabled: true
image: mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
conda:
userManagedDependencies: false
conda_dependencies_file: component_env.yaml
os: Linux

View file

@@ -0,0 +1,5 @@
"""run.py for demo component"""
from contoso.add_one_thousand_to_parameter_script import main
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,5 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

View file

@@ -0,0 +1,6 @@
name: countrows_env
dependencies:
- python=3.7
- pip:
- numpy==1.19.4
- pandas==1.3.0

View file

@@ -0,0 +1 @@
../../contoso

View file

@@ -0,0 +1,26 @@
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: countrows
version: 0.0.1
display_name: CountRows
type: CommandComponent
description: Demo component that counts the rows in the input dataset.
is_deterministic: true
tags:
contact: aml-ds@microsoft.com
inputs:
# To-Do: define an 'input_data' parameter
outputs: {}
# To-Do: add newly introduced parameter to the command
command: >-
python3 run.py
environment:
docker:
enabled: true
image: mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
conda:
userManagedDependencies: false
conda_dependencies_file: component_env.yaml
os: Linux

View file

@@ -0,0 +1,7 @@
"""run.py for demo component"""
import os
from contoso.count_rows_script import main
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,5 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

View file

@@ -0,0 +1,3 @@
name: helloworldcomponent_env
dependencies:
- python=3.7

View file

@@ -0,0 +1 @@
../../contoso

View file

@@ -0,0 +1,24 @@
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: helloworldcomponent
version: 0.0.1
display_name: HelloWorldComponent
type: CommandComponent
description: Demo component that displays "Hello, World!".
is_deterministic: true
tags:
contact: aml-ds@microsoft.com
inputs: {}
outputs: {}
command: >-
python3 run.py
environment:
docker:
enabled: true
image: mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
conda:
userManagedDependencies: false
conda_dependencies_file: component_env.yaml
os: Linux

View file

@@ -0,0 +1,5 @@
"""run.py for demo component"""
from contoso.hello_world_script import main
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,34 @@
import argparse
def get_arg_parser(parser=None):
"""Parse the command line arguments for merge using argparse
Args:
parser (argparse.ArgumentParser or CompliantArgumentParser): an argument parser instance
Returns:
ArgumentParser: the argument parser instance
Notes:
if parser is None, creates a new parser instance
"""
# add arguments that are specific to the component
if parser is None:
parser = argparse.ArgumentParser(description=__doc__)
# To-Do
return parser
def main():
"""The main function"""
# get the arguments
parser = get_arg_parser()
args = parser.parse_args()
args = vars(args)
# To-Do
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,35 @@
import argparse
import pandas as pd
def get_arg_parser(parser=None):
"""Parse the command line arguments for merge using argparse
Args:
parser (argparse.ArgumentParser or CompliantArgumentParser): an argument parser instance
Returns:
ArgumentParser: the argument parser instance
Notes:
if parser is None, creates a new parser instance
"""
# add arguments that are specific to the component
if parser is None:
parser = argparse.ArgumentParser(description=__doc__)
# To-Do
return parser
def main():
"""The main function"""
# get the arguments
parser = get_arg_parser()
args = parser.parse_args()
args = vars(args)
# To-Do
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,6 @@
def main():
"""The main function"""
print("Hello, world!")
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,6 @@
# @package _group_
subscription_id: <your-subscription-id>
resource_group: <your-resource-group-name>
workspace_name: <your-workspace-name>
tenant: <your-tenant-id>
auth: "interactive"

View file

@@ -0,0 +1,26 @@
# @package _group_
# name of default target
default_compute_target: "cpu-cluster"
# where intermediary output is written
compliant_datastore: "workspaceblobstore"
# Linux targets
linux_cpu_dc_target: "cpu-cluster"
linux_cpu_prod_target: "cpu-cluster"
linux_gpu_dc_target: "gpu-nc12-lowpri"
linux_gpu_prod_target: "gpu-nc12-lowpri"
# data I/O for linux modules
linux_input_mode: "mount"
linux_output_mode: "mount"
# Windows targets
windows_cpu_prod_target: "cpu-win"
windows_cpu_dc_target: "cpu-win"
# data I/O for windows modules
windows_input_mode: "download"
windows_output_mode: "upload"
# hdi cluster
hdi_prod_target: "hdi-cluster"

View file

@@ -0,0 +1,46 @@
# This yaml file configures the hello_world demo experiment
# command for running the pipeline:
# python pipelines/experiments/demo_component_with_parameter.py --config-dir pipelines/config --config-name experiments/demo_component_with_parameter run.submit=True
# defaults contain references of the aml resources
# found in config/aml/, config/compute/ and config/modules
# usually don't modify this
defaults:
- aml: public_workspace # default aml references
- compute: public_workspace # default compute target names
- modules: module_defaults # list of modules + versions
# run parameters are command line arguments for running your experiment
run: # params for running pipeline
experiment_name: "demo_component_with_parameter" # IMPORTANT
regenerate_outputs: false
continue_on_failure: false
verbose: false
submit: false
resume: false
canary: false
silent: false
wait: false
# module_loader
module_loader: # module loading params
# IMPORTANT: if you want to modify a given module, add its key here
# see the code for identifying the module key
# use comma separation in this string to use multiple local modules
use_local: "ComponentWithParameter"
# fix the version of modules in all subgraphs (if left unspecified)
# NOTE: use the latest release version to "fix" your branch to a given release
# see https://eemo.visualstudio.com/TEE/_release?_a=releases&view=mine&definitionId=76
force_default_module_version: null
# forces ALL module versions to this unique value (even if specified otherwise in code)
force_all_module_version: null
# path to the steps folder, don't modify this one
# NOTE: we're working on deprecating this one
local_steps_folder: "../../../components" # NOTE: run scripts from the `shrike-examples` directory
# DemoComponent config
# To-Do

View file

@@ -0,0 +1,46 @@
# This yaml file configures the hello_world demo experiment
# command for running the pipeline:
# python pipelines/experiments/demo_count_rows.py --config-dir pipelines/config --config-name experiments/demo_count_rows run.submit=True
# defaults contain references of the aml resources
# found in config/aml/, config/compute/ and config/modules
# usually don't modify this
defaults:
- aml: public_workspace # default aml references
- compute: public_workspace # default compute target names
- modules: module_defaults # list of modules + versions
# run parameters are command line arguments for running your experiment
run: # params for running pipeline
experiment_name: "demo_count_rows" # IMPORTANT
regenerate_outputs: false
continue_on_failure: false
verbose: false
submit: false
resume: false
canary: false
silent: false
wait: false
# module_loader
module_loader: # module loading params
# IMPORTANT: if you want to modify a given module, add its key here
# see the code for identifying the module key
# use comma separation in this string to use multiple local modules
use_local: "CountRows"
# fix the version of modules in all subgraphs (if left unspecified)
# NOTE: use the latest release version to "fix" your branch to a given release
# see https://eemo.visualstudio.com/TEE/_release?_a=releases&view=mine&definitionId=76
force_default_module_version: null
# forces ALL module versions to this unique value (even if specified otherwise in code)
force_all_module_version: null
# path to the steps folder, don't modify this one
# NOTE: we're working on deprecating this one
local_steps_folder: "../../../components" # NOTE: run scripts from the `shrike-examples` directory
# DemoComponent config
# To-Do

View file

@@ -0,0 +1,43 @@
# This yaml file configures the hello_world demo experiment
# command for running the pipeline:
# python pipelines/experiments/demo_hello_world.py --config-dir pipelines/config --config-name experiments/demo_hello_world run.submit=True
# defaults contain references of the aml resources
# found in config/aml/, config/compute/ and config/modules
# usually don't modify this
defaults:
- aml: public_workspace # default aml references
- compute: public_workspace # default compute target names
- modules: module_defaults # list of modules + versions
# run parameters are command line arguments for running your experiment
run: # params for running pipeline
experiment_name: "name_your_experiment_here" # To-Do
regenerate_outputs: false
continue_on_failure: false
verbose: false
submit: false
resume: false
canary: false
silent: false
wait: false
# module_loader
module_loader: # module loading params
# IMPORTANT: if you want to modify a given module, add its key here
# see the code for identifying the module key
# use comma separation in this string to use multiple local modules
use_local: "HelloWorldComponent"
# fix the version of modules in all subgraphs (if left unspecified)
# NOTE: use the latest release version to "fix" your branch to a given release
# see https://eemo.visualstudio.com/TEE/_release?_a=releases&view=mine&definitionId=76
force_default_module_version: null
# forces ALL module versions to this unique value (even if specified otherwise in code)
force_all_module_version: null
# path to the steps folder, don't modify this one
# NOTE: we're working on deprecating this one
local_steps_folder: "../../../components" # NOTE: run scripts from the `shrike-examples` directory

View file

@@ -0,0 +1,19 @@
# @package _group_
# Contains all the demo components with default version
manifest:
# HELLO WORLD COMPONENT
- key: "HelloWorldComponent"
name: "helloworldcomponent"
version: null
yaml: "hello_world/component_spec.yaml"
# COMPONENT THAT OPERATES ON A PARAMETER VALUE
# - key: # To-Do
# name: # To-Do
# version: null
# yaml: # To-Do
# COMPONENT THAT COUNTS ROWS IN THE INPUT DATASET
- key: "CountRows"
name: "countrows"
version: null
yaml: "count_rows/component_spec.yaml"

View file

@@ -0,0 +1,96 @@
"""
The Azure ML pipeline for running a basic 'Hello, World!' experiment
to execute:
> python pipelines/experiments/demo_component_with_parameter.py --config-dir pipelines/config --config-name experiments/demo_component_with_parameter run.submit=True
"""
# pylint: disable=no-member
# NOTE: because it raises 'dict' has no 'outputs' member in dsl.pipeline construction
import os
import sys
from azure.ml.component import dsl
from shrike.pipeline.pipeline_helper import AMLPipelineHelper
# NOTE: if you need to import from pipelines.*
ACCELERATOR_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(__file__), "..", "..")
)
if ACCELERATOR_ROOT_PATH not in sys.path:
print(f"Adding to path: {ACCELERATOR_ROOT_PATH}")
sys.path.append(str(ACCELERATOR_ROOT_PATH))
class ComponentWithParameterDemo(AMLPipelineHelper):
"""Runnable/reusable pipeline helper class
This class inherits from AMLPipelineHelper which provides
helper functions to create reusable production pipelines.
"""
def build(self, config):
"""Builds a pipeline function for this pipeline using AzureML SDK (dsl.pipeline).
This method returns a constructed pipeline function (decorated with @dsl.pipeline).
Args:
config (DictConfig): configuration object
Returns:
dsl.pipeline: the function to create your pipeline
"""
# helper functions below load the subgraph/component from registered or local version depending on your config.run.use_local
component_with_parameter = self.component_load("ComponentWithParameter")
# Here you should create an instance of a pipeline function (using your custom config dataclass)
@dsl.pipeline(
name="demo-component-with-parameter",
description="The Azure ML demo of a component that operates on a parameter value",
default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function():
"""Pipeline function for this graph.
Returns:
dict[str->PipelineOutputData]: a dictionary of your pipeline outputs
for instance to be consumed by other graphs
"""
# general syntax:
# component_instance = component_class(input=data, param=value)
# or
# subgraph_instance = subgraph_function(input=data, param=value)
demo_component_step = component_with_parameter(<your-parameter-name> = config.democomponent.<your-parameter-name>) # To-Do
self.apply_recommended_runsettings(
"ComponentWithParameter", demo_component_step, gpu=False
)
# finally return the function itself to be built by helper code
return demo_pipeline_function
def pipeline_instance(self, pipeline_function, config):
"""Given a pipeline function, creates a runnable instance based on provided config.
This is used only when calling this as a runnable pipeline using .main() function (see below).
The goal of this function is to map the config to the pipeline_function inputs and params.
Args:
pipeline_function (function): the pipeline function obtained from self.build()
config (DictConfig): configuration object
Returns:
azureml.core.Pipeline: the instance constructed with its inputs and params.
"""
# we simply call the pipeline function
demo_pipeline = pipeline_function()
# and we return that function so that helper can run it.
return demo_pipeline
# NOTE: main block is necessary only if script is intended to be run from command line
if __name__ == "__main__":
# calling the helper .main() function
ComponentWithParameterDemo.main()

View file

@@ -0,0 +1,106 @@
"""
The Azure ML pipeline for running a basic 'Hello, World!' experiment
to execute:
> python pipelines/experiments/demo_count_rows.py --config-dir pipelines/config --config-name experiments/demo_count_rows run.submit=True
"""
# pylint: disable=no-member
# NOTE: because it raises 'dict' has no 'outputs' member in dsl.pipeline construction
import os
import sys
from azure.ml.component import dsl
from shrike.pipeline.pipeline_helper import AMLPipelineHelper
# NOTE: if you need to import from pipelines.*
ACCELERATOR_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(__file__), "..", "..")
)
if ACCELERATOR_ROOT_PATH not in sys.path:
print(f"Adding to path: {ACCELERATOR_ROOT_PATH}")
sys.path.append(str(ACCELERATOR_ROOT_PATH))
class CountRowsDemo(AMLPipelineHelper):
"""Runnable/reusable pipeline helper class
This class inherits from AMLPipelineHelper which provides
helper functions to create reusable production pipelines.
"""
def build(self, config):
"""Builds a pipeline function for this pipeline using AzureML SDK (dsl.pipeline).
This method returns a constructed pipeline function (decorated with @dsl.pipeline).
Args:
config (DictConfig): configuration object
Returns:
dsl.pipeline: the function to create your pipeline
"""
# helper functions below load the subgraph/component from registered or local version depending on your config.run.use_local
count_rows_component = self.component_load("CountRows")
# Here you should create an instance of a pipeline function (using your custom config dataclass)
@dsl.pipeline(
name="demo-component-count-rows",
description="The Azure ML demo of a component that reads a dataset and counts the number of rows.",
default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function(): # To-Do (include an input dataset as argument)
"""Pipeline function for this graph.
Args:
demo_dataset: input dataset
Returns:
dict[str->PipelineOutputData]: a dictionary of your pipeline outputs
for instance to be consumed by other graphs
"""
# general syntax:
# component_instance = component_class(input=data, param=value)
# or
# subgraph_instance = subgraph_function(input=data, param=value)
demo_component_step = count_rows_component() # To-Do (include an input dataset as argument)
self.apply_recommended_runsettings(
"CountRows", demo_component_step, gpu=False
)
# finally return the function itself to be built by helper code
return demo_pipeline_function
def pipeline_instance(self, pipeline_function, config):
"""Given a pipeline function, creates a runnable instance based on provided config.
This is used only when calling this as a runnable pipeline using .main() function (see below).
The goal of this function is to map the config to the pipeline_function inputs and params.
Args:
pipeline_function (function): the pipeline function obtained from self.build()
config (DictConfig): configuration object
Returns:
azureml.core.Pipeline: the instance constructed with its inputs and params.
"""
# NOTE: self.dataset_load() helps to load the dataset based on its name and version
# To-Do (include an input dataset as argument)
pipeline_input_dataset = self.dataset_load(
name=<your-dataset-name-from-the-config-file>,
version=<your-dataset-version-from-the-config-file>,
)
# we simply call the pipeline function
demo_pipeline = pipeline_function(demo_dataset=pipeline_input_dataset)
# and we return that function so that helper can run it.
return demo_pipeline
# NOTE: main block is necessary only if script is intended to be run from command line
if __name__ == "__main__":
# calling the helper .main() function
CountRowsDemo.main()

View file

@@ -0,0 +1,94 @@
"""
The Azure ML pipeline for running a basic 'Hello, World!' experiment
to execute:
> python pipelines/experiments/demo_hello_world.py --config-dir pipelines/config --config-name experiments/demo_hello_world run.submit=True
"""
# pylint: disable=no-member
# NOTE: because it raises 'dict' has no 'outputs' member in dsl.pipeline construction
import os
import sys
from azure.ml.component import dsl
from shrike.pipeline.pipeline_helper import AMLPipelineHelper
# NOTE: if you need to import from pipelines.*
ACCELERATOR_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(__file__), "..", "..")
)
if ACCELERATOR_ROOT_PATH not in sys.path:
print(f"Adding to path: {ACCELERATOR_ROOT_PATH}")
sys.path.append(str(ACCELERATOR_ROOT_PATH))
class HelloWorldDemo(AMLPipelineHelper):
"""Runnable/reusable pipeline helper class
This class inherits from AMLPipelineHelper which provides
helper functions to create reusable production pipelines.
"""
def build(self, config):
"""Builds a pipeline function for this pipeline using AzureML SDK (dsl.pipeline).
This method returns a constructed pipeline function (decorated with @dsl.pipeline).
Args:
config (DictConfig): configuration object
Returns:
dsl.pipeline: the function to create your pipeline
"""
# helper functions below load the subgraph/component from registered or local version depending on your config.run.use_local
hello_world_component = self.component_load("<your-component-key>") # To-Do
# Here you should create an instance of a pipeline function (using your custom config dataclass)
@dsl.pipeline(
name="demo-hello-world",
description="The Azure ML 'Hello, World!' demo",
default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function():
"""Pipeline function for this graph.
Returns:
dict[str->PipelineOutputData]: a dictionary of your pipeline outputs
for instance to be consumed by other graphs
"""
# general syntax:
# component_instance = component_class(input=data, param=value)
# or
# subgraph_instance = subgraph_function(input=data, param=value)
demo_component_step = <name_of_component_loaded_above()> # To-Do
self.apply_recommended_runsettings("<your-component-key>", demo_component_step, gpu=False) # To-Do
# finally return the function itself to be built by helper code
return demo_pipeline_function
def pipeline_instance(self, pipeline_function, config):
"""Given a pipeline function, creates a runnable instance based on provided config.
This is used only when calling this as a runnable pipeline using .main() function (see below).
The goal of this function is to map the config to the pipeline_function inputs and params.
Args:
pipeline_function (function): the pipeline function obtained from self.build()
config (DictConfig): configuration object
Returns:
azureml.core.Pipeline: the instance constructed with its inputs and params.
"""
# we simply call the pipeline function
demo_pipeline = pipeline_function()
# and we return that function so that helper can run it.
return demo_pipeline
# NOTE: main block is necessary only if script is intended to be run from command line
if __name__ == "__main__":
# calling the helper .main() function
HelloWorldDemo.main()

View file

@@ -0,0 +1 @@
shrike[pipeline]

View file

@@ -0,0 +1,56 @@
# Pipelines problem 01 - Hello, world!
## Problem statement
Submit a pipeline with a single "Hello, world!"-type component.
## Motivation
The goal of this problem is to get you familiar with how to use `shrike.pipeline` for creating and submitting experiments.
## Out of scope
_Component creation_ is out of scope for this problem and will be covered in [problem 02](./pipelines-02.md). For the current problem, we will be using the "Hello, world!" component defined in the `components/hello_world` folder - all it does is print "Hello, world!" in the logs.
## Guidance
### Set your workspace
Open the `pipelines/config/aml/public_workspace.yaml` file and update the `subscription_id`, `resource_group`, `workspace_name`, and `tenant` values with those corresponding to your workspace. You can get the first three by downloading the config file from the Azure ML UI (click on the workspace name in the top right, then on "Download config file"). You can get the last one (the tenant ID) from the workspace URL (which should contain a part like "_&tid=\<your-tenant-id\>_").
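For illustration, a filled-in `public_workspace.yaml` would look like the sketch below - all values here are made-up placeholders, so substitute your own workspace's.
```yaml
# @package _group_
# Hypothetical example values -- replace with your own workspace details.
subscription_id: 00000000-1111-2222-3333-444444444444
resource_group: my-resource-group
workspace_name: my-aml-workspace
tenant: 55555555-6666-7777-8888-999999999999
auth: "interactive"
```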
### Double-check the compute and datastore names
Open the `pipelines/config/compute/public_workspace.yaml` file and double-check that `default_compute_target`, `linux_cpu_dc_target`, and `linux_cpu_prod_target` point to your cpu cluster (usually named "cpu-cluster" by default, but it can be adjusted in this file if your cpu cluster has another name).
The `compliant_datastore` name should be the default "workspaceblobstore".
### Prepare your experiment python file
In Azure ML, the experiments (_a.k.a. graphs_) are typically defined _via_ code, in what we will call an "experiment python file". We have prepared a stub of this file for you: [demo_hello_world.py](../../shrike-examples/pipelines/experiments/demo_hello_world.py).
Open this file and start scrolling down. In the `HelloWorldDemo` class, you will first find a `build()` function, which first loads the subgraphs and components used in the graph (in our case, a single component). Look for the line below and insert the _component key_ that will control which component to load - you can find the component key in the components dictionary defined in the [module_defaults.yaml](../../shrike-examples/pipelines/config/modules/module_defaults.yaml) file.
```python
hello_world_component = self.component_load("<your-component-key>")
```
After that, keep scrolling and you will soon encounter the `demo_pipeline_function()` function which actually defines the graph. You can use the `name` and `description` parameters in the decorator to give a meaningful name and description to your graph. After that, we need to instantiate the components making up our graph. In our current, simple case of a 1-component graph, all we need is to instantiate a single step (`demo_component_step`) with the component we just loaded. To do so, just adjust the following line with the name of the component you loaded above.
```python
demo_component_step = <name_of_component_loaded_above()>
```
Finally, we leverage the `shrike.pipeline` package to apply the proper run parameters (_e.g._ which compute target the component runs on). To do so, just call the `apply_recommended_runsettings()` function as shown below, with the same component key you used to load the component; note how we specify that this component should run on a cpu target.
```python
self.apply_recommended_runsettings(
"<your-component-key>", demo_component_step, gpu=False
)
```
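Putting these three To-Do items together with the `HelloWorldComponent` key from [module_defaults.yaml](../../shrike-examples/pipelines/config/modules/module_defaults.yaml), the completed lines would look roughly like the sketch below (the variable naming is just a suggestion).
```python
# load the component using its key from module_defaults.yaml
hello_world_component = self.component_load("HelloWorldComponent")

# ... inside demo_pipeline_function(): instantiate the single step
# (the hello-world component takes no inputs)
demo_component_step = hello_world_component()
self.apply_recommended_runsettings(
    "HelloWorldComponent", demo_component_step, gpu=False
)
```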
### Configure your experiment
The various parameters controlling the execution of an experiment can be defined _via_ the command line, or _via_ a _configuration file_.
Open the experiment configuration file [demo_hello_world.yaml](../../shrike-examples/pipelines/config/experiments/demo_hello_world.yaml) that has already been prepared for you. Adjust the `run.experiment_name` parameter to give your experiment a meaningful name.
### Submit your experiment
To submit your experiment just run the command given at the top of the experiment [configuration file](../../shrike-examples/pipelines/config/experiments/demo_hello_world.yaml).
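For reference, that command is:
```
python pipelines/experiments/demo_hello_world.py --config-dir pipelines/config --config-name experiments/demo_hello_world run.submit=True
```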
### Check the logs
Once your experiment has executed successfully, click on the component, then on "Outputs + logs". In the driver log (usually called "70_driver_log.txt"), look for the "Hello, world!" line. Tadaa!
### Links to successful execution
A successful run of the experiment can be found [here](https://ml.azure.com/runs/8043ce8a-5045-4211-9934-1959d5296a48?wsid=/subscriptions/48bbc269-ce89-4f6f-9a12-c6f91fcb772d/resourcegroups/aml1p-rg/workspaces/aml1p-ml-wus2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47). (This is mostly for internal use, as you likely will not have access to that workspace.)

View file

@@ -0,0 +1,67 @@
# Pipelines problem 02 - Component that operates on a parameter
## Problem statement
Submit a single-component pipeline where the component operates on a value passed as parameter (pass the parameter value through a config file or via the command line at pipeline submission time).
## Motivation
The goal of this problem is to get you familiar with how to create components that consume parameters, and how to set parameter values.
## Out of scope
_Consuming a dataset_ is out of scope for this problem and will be covered in [problem 03](./pipelines-03.md).
## Guidance
### Prepare your component specification
Open the [component_spec.yaml](../../shrike-examples/components/add_one_thousand_to_parameter/component_spec.yaml) file in the `components/add_one_thousand_to_parameter` directory. Add a "`value`" integer parameter to the `inputs` section following the [CommandComponent documentation](https://componentsdk.azurewebsites.net/components/command_component.html), then add your newly added parameter to the command.
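As a sketch, those two edits could look like the following (the description text is up to you; the `{inputs.value}` placeholder is how a CommandComponent command references an input):
```yaml
inputs:
  value:
    type: integer
    description: The value the component will add 1000 to
    optional: false
outputs: {}
command: >-
  python3 run.py --value {inputs.value}
```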
### Prepare your component script
The command in the component specification runs the [run.py](../../shrike-examples/components/add_one_thousand_to_parameter/run.py) file located in the component folder. If you open this file, you will see that it just calls the main method of the [add_one_thousand_to_parameter_script.py](../../shrike-examples/contoso/add_one_thousand_to_parameter_script.py) file. It is _that_ file that we call the _component script_ and that we will now prepare.
#### Implement the `get_arg_parser()` method
First you need to implement a `get_arg_parser()` method that returns an instance of the argument parser. If you're not familiar with parsing arguments, the _argparse_ library [documentation](https://docs.python.org/3/library/argparse.html) should help.
#### Implement the `main()` method
Then, implement the `main()` method to consume your newly introduced parameter. For this exercise, let's just add 1000 to the parameter value, and print both operands and the result.
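A possible sketch of the two methods, assuming the component command passes the parameter as `--value`:
```python
def get_arg_parser(parser=None):
    """Build the argument parser for this component."""
    if parser is None:
        parser = argparse.ArgumentParser(description=__doc__)
    # the argument name must match the one used in the component command
    parser.add_argument("--value", type=int, required=True, help="value to operate on")
    return parser


def main():
    """Add 1000 to the 'value' parameter and print both operands and the result."""
    args = vars(get_arg_parser().parse_args())
    value = args["value"]
    print(f"The value passed as parameter is {value}; {value} + 1000 = {value + 1000}")
```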
### Add your component to the component dictionary
Open the [module_defaults.yaml](../../shrike-examples/pipelines/config/modules/module_defaults.yaml) file and add an entry for the new component following the example of the HelloWorldComponent.
- `key` is how you will retrieve the component later on.
- `name` must match the name you defined in the component specification, as it will be used to retrieve the component if you use the remote version.
- `version` can be left `null`.
- `yaml` is the location of the component specification.
### Configure your experiment
The various parameters controlling the execution of an experiment can be defined _via_ the _command line_, or _via_ a _configuration file_.
Open the experiment configuration file [demo_component_with_parameter.yaml](../../shrike-examples/pipelines/config/experiments/demo_component_with_parameter.yaml) that has already been prepared for you. Create a new section where you define the parameter value, something like the below.
```yaml
# DemoComponent config
democomponent: # 'democomponent' section name will be used in the experiment python file and in the command line to refer to this set of config parameters
<your-parameter-name>: 314 # the value on which the component will operate
```
### Prepare your experiment python file
Now that your component should be ready and your experiment should be configured properly, let's prepare [your experiment python file](../../shrike-examples/pipelines/experiments/demo_component_with_parameter.py). The process is very similar to what you did in the previous problem. The only difference is that when you instantiate your component, you will need to provide the parameter value defined in the config file, as demonstrated below.
```python
demo_component_step = component_with_parameter(<your-parameter-name> = config.democomponent.<your-parameter-name>)
```
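With the `value` parameter name used above, that line would become:
```python
demo_component_step = component_with_parameter(value=config.democomponent.value)
```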
### Submit your experiment...
#### ... with the parameter value defined in the config file
To submit your experiment with the parameter value defined in the config file, just run the command shown at the top of the experiment python file.
#### ... with the parameter value defined from the command line
To submit your experiment with the parameter value defined in the command line (and thus overriding the value in the config file), just run the command shown at the top of the experiment python file _with the following addition_.
```
democomponent.<your-parameter-name>=51
```
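Assuming you named the parameter `value`, the full command would look like:
```
python pipelines/experiments/demo_component_with_parameter.py --config-dir pipelines/config --config-name experiments/demo_component_with_parameter run.submit=True democomponent.value=51
```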
### Check the logs
Once your experiment has executed successfully, click on the component, then on "Outputs + logs". In the driver log (usually called "70_driver_log.txt"), look for the line starting with "The value passed as parameter is...". Tadaa!
### Links to successful execution
A successful run of the experiment can be found [here](https://ml.azure.com/runs/d8e205df-4351-406d-a1e2-8ffbea7b9741?wsid=/subscriptions/48bbc269-ce89-4f6f-9a12-c6f91fcb772d/resourcegroups/aml1p-rg/workspaces/aml1p-ml-wus2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47). (This is mostly for internal use, as you likely will not have access to that workspace.)

View file

@@ -0,0 +1,58 @@
# Pipelines problem 03 - Consume a dataset
## Problem statement
Submit a single-component pipeline which consumes a dataset (for example count the number of records).
## Motivation
The goal of this problem is to get you familiar with how to read datasets as component inputs.
## Out of scope
_Outputting data_ is out of scope for this problem and will be covered in [problem 04](./pipelines-04.md).
## Prerequisites
To be able to consume a dataset, the main thing you need is, well, a dataset! If you do not have one already, you can create one from the Azure Open Datasets through the Azure ML UI, following [these instructions](https://docs.microsoft.com/en-us/azure/open-datasets/how-to-create-azure-machine-learning-dataset-from-open-dataset#create-datasets-with-the-studio).
## Guidance
### Prepare your component specification
Open the [component_spec.yaml](../../shrike-examples/components/count_rows/component_spec.yaml) file in the `components/count_rows` directory. Add an "`input_data`" parameter to the `inputs` section following the [CommandComponent documentation](https://componentsdk.azurewebsites.net/components/command_component.html), then add your newly added parameter to the command.
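A sketch of what this could look like - here we assume the `path` input type from the CommandComponent SDK, and reference the input in the command via the `{inputs.input_data}` placeholder:
```yaml
inputs:
  input_data:
    type: path
    description: The dataset whose rows will be counted
    optional: false
outputs: {}
command: >-
  python3 run.py --input_data {inputs.input_data}
```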
### Prepare your component script
The command in the component specification runs the [run.py](../../shrike-examples/components/count_rows/run.py) file located in the component folder. If you open this file, you will see that it just calls the main method of the [count_rows_script.py](../../shrike-examples/contoso/count_rows_script.py) file. It is _that_ file that we call the _component script_ and that we will now prepare.
#### Implement the `get_arg_parser()` method
First you need to implement a `get_arg_parser()` method that returns an instance of the argument parser. If you're not familiar with parsing arguments, the _argparse_ library [documentation](https://docs.python.org/3/library/argparse.html) should help. Here, we will have a single input named `input_data`.
#### Implement the `main()` method
Then, implement the `main()` method to read the dataset, count the number of rows, and print the result. The `input_data` input is passed to the component as a folder. The name of the actual file(s) to load from that folder depends on which dataset you're using - if you are unsure, just find your dataset in the UI and click "Explore"; it should tell you which files are available.
After you've read the dataset (feel free to just use a sample if you have chosen a large dataset), count the number of rows and print the result. We suggest you use `pandas` for counting rows. As you can see in the [component_env.yaml](../../shrike-examples/components/count_rows/component_env.yaml) file that defines the environment where the component will run, `pandas` should be available.
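A possible sketch of the component script, assuming the dataset folder contains CSV files (adapt the reading logic to your dataset's format):
```python
import argparse
import os

import pandas as pd


def get_arg_parser(parser=None):
    """Build the argument parser for this component."""
    if parser is None:
        parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--input_data", type=str, required=True, help="path to the input dataset folder"
    )
    return parser


def main():
    """Read the input dataset and print its number of rows."""
    args = vars(get_arg_parser().parse_args())
    # the input is provided as a folder; read every csv file it contains
    csv_files = [f for f in os.listdir(args["input_data"]) if f.endswith(".csv")]
    df = pd.concat(
        pd.read_csv(os.path.join(args["input_data"], f)) for f in csv_files
    )
    print(f"The input dataset contains {len(df)} rows.")
```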
### Configure your experiment
The various parameters controlling the execution of an experiment can be defined _via_ the _command line_, or _via_ a _configuration file_. In this problem, we will focus on the _configuration file_ route.
Open the experiment configuration file [demo_count_rows.yaml](../../shrike-examples/pipelines/config/experiments/demo_count_rows.yaml) that has already been prepared for you. Create a new section where you define the value of the `input_data` parameter, _i.e._ the name of your dataset. It should be similar to the below.
```yaml
# DemoComponent config
democomponent:
input_data: <name-of-your-dataset> # the name of the dataset you'll be working on, as seen in the UI
input_data_version: "latest" # this will ensure you always consume the latest version of the dataset
```
### Prepare your experiment python file
Now that your component should be ready and your experiment should be configured properly, let's prepare [your experiment python file](../../shrike-examples/pipelines/experiments/demo_count_rows.py). The process is very similar to what you did in the previous problem. The main differences are as follows.
- The pipeline function (`demo_pipeline_function()`) will now take an input dataset as an argument, that you will also need when instantiating the component.
- In the `pipeline_instance()` function, you will need to load the dataset using the name and version provided in the config file, and pass the loaded dataset to the pipeline function.
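A sketch of those two changes, using the config section defined above:
```python
# in build(): the pipeline function now takes the dataset as an argument
@dsl.pipeline(
    name="demo-component-count-rows",
    description="The Azure ML demo of a component that counts rows in a dataset",
    default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function(demo_dataset):
    demo_component_step = count_rows_component(input_data=demo_dataset)
    self.apply_recommended_runsettings("CountRows", demo_component_step, gpu=False)

# in pipeline_instance(): load the dataset named in the config and pass it in
pipeline_input_dataset = self.dataset_load(
    name=config.democomponent.input_data,
    version=config.democomponent.input_data_version,
)
demo_pipeline = pipeline_function(demo_dataset=pipeline_input_dataset)
```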
### Submit your experiment
To submit your experiment with the parameter value defined in the config file, just run the command shown at the top of the experiment python file.
### Check the logs
Once your experiment has executed successfully, click on the component, then on "Outputs + logs". In the driver log (usually called "70_driver_log.txt"), look for the line containing the number of rows that you added in one of the above steps. Tadaa!
### Links to successful execution
A successful run of the experiment can be found [here](https://ml.azure.com/runs/e50f9945-d9ec-417d-a593-0a4dbb6b7690?wsid=/subscriptions/48bbc269-ce89-4f6f-9a12-c6f91fcb772d/resourcegroups/aml1p-rg/workspaces/aml1p-ml-wus2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47). (This is mostly for internal use, as you likely will not have access to that workspace.)

View file

@@ -5,19 +5,22 @@ This document is a proposal of a list of problems to help people learn how to us
## List of problems
### Creating,submitting, and validating pipelines
:warning: Note that these problems are meant to be tackled sequentially, as the solution of problem _N+1_ builds upon the solution of problem _N_.
1. Submit a pipeline with a single "Hello, World!"-type component.
2. Submit a single-component pipeline where the component operates on a value passed as parameter (pass the parameter value through a config file or via the command line at pipeline submission time).
3. Submit a single-component pipeline which consumes a dataset (for example count the number of records).
4. Submit a multi-component pipeline where one component's output is the input of a subsequent component.
5. Submit a multi-component pipeline which uses a subgraph.
6. Submit a pipeline where a component is chosen based on a parameter value.
7. Re-submit one of the previous pipelines but with a different component version.
8. Add integration tests to ensure a pipeline does not break.
### Creating, submitting, and validating pipelines
- [Pipelines Problem 01](./problems/pipelines-01.md) Submit a pipeline with a single "Hello, world!"-type component.
- [Pipelines Problem 02](./problems/pipelines-02.md) Submit a single-component pipeline where the component operates on a value passed as parameter (pass the parameter value through a config file or via the command line at pipeline submission time).
- [Pipelines Problem 03](./problems/pipelines-03.md) Submit a single-component pipeline which consumes a dataset (for example count the number of records).
- [Pipelines Problem 04](./problems/pipelines-04.md) Submit a multi-component pipeline where one component's output is the input of a subsequent component.
- [Pipelines Problem 05](./problems/pipelines-05.md) Submit a multi-component pipeline which uses a subgraph.
- [Pipelines Problem 06](./problems/pipelines-06.md) Submit a pipeline where a component is chosen based on a parameter value.
- [Pipelines Problem 07](./problems/pipelines-07.md) Add integration tests to ensure a pipeline does not break.
### Logging
1. Submit a pipeline using the compliant logger to log various properties of the dataset consumed by a component (such as number of records or average of a numerical field, for instance).
2. Experiment with the different data categories available to the compliant logger.
3. Experiment with the various options about stack trace prefixing (customize the prefix and the exception message, scrub the exception message unless it is in an allowed list).
:construction: Work in Progress :construction:
- [Logging Problem 01](./problems/logging-01.md) Submit a pipeline using the compliant logger to log various properties of the dataset consumed by a component (such as number of records or average of a numerical field, for instance).
- [Logging Problem 02](./problems/logging-02.md) Experiment with the different data categories available to the compliant logger.
- [Logging Problem 03](./problems/logging-03.md) Experiment with the various options about stack trace prefixing (customize the prefix and the exception message, scrub the exception message unless it is in an allowed list).