* first stab at hello world

* simplify hellop world component environment

* add list of problems to this branch

* typo

* another typo

* eyes-on -> public

* add recommendation to solve problems in order

* typo as usual

* problem 1 guidance

* polish problem 1 files

* typo

* problem 2

* problem 03

* readme's and support (not over yet)

* .

* wrap up main README

* instructions to get started

* problem 1

* problem 2

* problem 3

* address Fuhui's comments

Co-authored-by: XXX <XXX@me.com>
This commit is contained in:
Thomas 2021-07-08 17:04:28 -07:00 committed by GitHub
Parent c98e76a006
Commit 7e63009716
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
35 changed files: 914 additions and 32 deletions

View file

@@ -1,14 +1,25 @@
# Project
> This repo has been populated by an initial template to help get you started. Please
> make sure to update the content to build a great experience for community-building.
This project gathers a list of problems (and their solutions) to help people ramp up on [Azure ML](https://azure.microsoft.com/en-us/services/machine-learning/).
The problems are
divided into two categories: a first set of "generic" problems to get started with Azure ML, and a second set focusing on `shrike`, a [library](https://github.com/Azure/shrike) providing utilities to help create and run experiments on the Azure ML platform.
As the maintainer of this project, please make a few updates:
## Generic Azure ML problems
:construction: Work in Progress :construction:
- Improving this README.MD file to provide a great experience
- Updating SUPPORT.MD with content about this project's support experience
- Understanding the security reporting process in SECURITY.MD
- Remove this section from the README
## Azure ML problems aimed at learning `shrike`
The list of problems aiming at teaching how to use the `shrike` [library](https://github.com/Azure/shrike)
is given [here](./shrike-problems/shrike-problem-set.md) in the `shrike-problems` directory.
Detailed instructions for each problem can be found in the `problems` subdirectory
([here](./shrike-problems/problems/pipelines-01.md) is the first problem, for instance).
The `shrike-examples` directory contains all the files that need to be modified to solve the problems.
It is a good example of the recommended repository architecture for Azure ML projects using `shrike`.
To get started, please follow the instructions in the [ReadMe](./shrike-examples/ReadMe.md) file located in that directory.
:exclamation: Spoiler Alert :exclamation: The solutions to the problems are available in the [shrike-solutions](https://github.com/Azure/azure-ml-problem-sets/tree/shrike-solutions)
branch. If you visit this branch, you will see that the files in the `shrike-examples` directory are complete.
## Contributing

View file

@@ -1,13 +1,3 @@
# TODO: The maintainer of this repo has not yet edited this file
**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/spot](https://aka.ms/spot). CSS will work with/help you to determine next steps. More details also available at [aka.ms/onboardsupport](https://aka.ms/onboardsupport).
- **Not sure?** Fill out a SPOT intake as though the answer were "Yes". CSS will help you decide.
*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
# Support
## How to file issues and get help
@@ -16,9 +6,8 @@ This project uses GitHub Issues to track bugs and feature requests. Please searc
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.
For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
For quick help and questions about using this project, you can also email aml-ds-feedback@microsoft.com.
Just be aware that we might still ask you to create an issue for tracking.
## Microsoft Support Policy

shrike-examples/ReadMe.md Normal file

View file

@@ -0,0 +1,17 @@
## Getting started with the `shrike` problems
- Clone the current repository and set `shrike-examples` as your working directory.
- Set up and activate a new Conda environment:
`conda create --name shrike-examples-env python=3.7 -y`,
`conda activate shrike-examples-env`.
- Install the `shrike` dependencies:
`pip install -r requirements.txt`
## List of problems
The list of problems is given [here](../shrike-problems/shrike-problem-set.md) in the
`shrike-problems` directory, and guidance for individual problems can be found in the
`problems` subdirectory.
To solve the problems, just follow the **Guidance** section in each problem description,
and modify the appropriate files as indicated. You can look for the `# To-Do` comment string to locate the parts of the files that need modifying.

View file

@@ -0,0 +1,5 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

View file

@@ -0,0 +1,3 @@
name: helloworldcomponent_env
dependencies:
- python=3.7

View file

@@ -0,0 +1 @@
../../contoso

View file

@@ -0,0 +1,26 @@
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: componentwithparameter
version: 0.0.1
display_name: ComponentWithParameter
type: CommandComponent
description: Demo component that adds 1000 to the 'value' parameter.
is_deterministic: true
tags:
contact: aml-ds@microsoft.com
inputs:
# To-Do: define a 'value' parameter
outputs: {}
# To-Do: add newly introduced parameter to the command
command: >-
python3 run.py
environment:
docker:
enabled: true
image: mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
conda:
userManagedDependencies: false
conda_dependencies_file: component_env.yaml
os: Linux

View file

@@ -0,0 +1,5 @@
"""run.py for demo component"""
from contoso.add_one_thousand_to_parameter_script import main
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,5 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

View file

@@ -0,0 +1,6 @@
name: countrows_env
dependencies:
- python=3.7
- pip:
- numpy==1.19.4
- pandas==1.3.0

View file

@@ -0,0 +1 @@
../../contoso

View file

@@ -0,0 +1,26 @@
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: countrows
version: 0.0.1
display_name: CountRows
type: CommandComponent
description: Demo component that counts the rows in the input dataset.
is_deterministic: true
tags:
contact: aml-ds@microsoft.com
inputs:
# To-Do: define an 'input_data' parameter
outputs: {}
# To-Do: add newly introduced parameter to the command
command: >-
python3 run.py
environment:
docker:
enabled: true
image: mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
conda:
userManagedDependencies: false
conda_dependencies_file: component_env.yaml
os: Linux

View file

@@ -0,0 +1,7 @@
"""run.py for demo component"""
import os
from contoso.count_rows_script import main
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,5 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

View file

@@ -0,0 +1,3 @@
name: helloworldcomponent_env
dependencies:
- python=3.7

View file

@@ -0,0 +1 @@
../../contoso

View file

@@ -0,0 +1,24 @@
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: helloworldcomponent
version: 0.0.1
display_name: HelloWorldComponent
type: CommandComponent
description: Demo component that displays "Hello, World!".
is_deterministic: true
tags:
contact: aml-ds@microsoft.com
inputs: {}
outputs: {}
command: >-
python3 run.py
environment:
docker:
enabled: true
image: mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
conda:
userManagedDependencies: false
conda_dependencies_file: component_env.yaml
os: Linux

View file

@@ -0,0 +1,5 @@
"""run.py for demo component"""
from contoso.hello_world_script import main
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,34 @@
import argparse
def get_arg_parser(parser=None):
"""Parse the command line arguments for merge using argparse
Args:
parser (argparse.ArgumentParser or CompliantArgumentParser): an argument parser instance
Returns:
ArgumentParser: the argument parser instance
Notes:
if parser is None, creates a new parser instance
"""
# add arguments that are specific to the component
if parser is None:
parser = argparse.ArgumentParser(description=__doc__)
# To-Do
return parser
def main():
"""The main function"""
# get the arguments
parser = get_arg_parser()
args = parser.parse_args()
args = vars(args)
# To-Do
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,35 @@
import argparse
import pandas as pd
def get_arg_parser(parser=None):
"""Parse the command line arguments for merge using argparse
Args:
parser (argparse.ArgumentParser or CompliantArgumentParser): an argument parser instance
Returns:
ArgumentParser: the argument parser instance
Notes:
if parser is None, creates a new parser instance
"""
# add arguments that are specific to the component
if parser is None:
parser = argparse.ArgumentParser(description=__doc__)
# To-Do
return parser
def main():
"""The main function"""
# get the arguments
parser = get_arg_parser()
args = parser.parse_args()
args = vars(args)
# To-Do
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,6 @@
def main():
"""The main function"""
print("Hello, world!")
if __name__ == "__main__":
main()

View file

@@ -0,0 +1,6 @@
# @package _group_
subscription_id: <your-subscription-id>
resource_group: <your-resource-group-name>
workspace_name: <your-workspace-name>
tenant: <your-tenant-id>
auth: "interactive"

View file

@@ -0,0 +1,26 @@
# @package _group_
# name of default target
default_compute_target: "cpu-cluster"
# where intermediary output is written
compliant_datastore: "workspaceblobstore"
# Linux targets
linux_cpu_dc_target: "cpu-cluster"
linux_cpu_prod_target: "cpu-cluster"
linux_gpu_dc_target: "gpu-nc12-lowpri"
linux_gpu_prod_target: "gpu-nc12-lowpri"
# data I/O for linux modules
linux_input_mode: "mount"
linux_output_mode: "mount"
# Windows targets
windows_cpu_prod_target: "cpu-win"
windows_cpu_dc_target: "cpu-win"
# data I/O for windows modules
windows_input_mode: "download"
windows_output_mode: "upload"
# hdi cluster
hdi_prod_target: "hdi-cluster"

View file

@@ -0,0 +1,46 @@
# This yaml file configures the hello_world demo experiment
# command for running the pipeline:
# python pipelines/experiments/demo_component_with_parameter.py --config-dir pipelines/config --config-name experiments/demo_component_with_parameter run.submit=True
# defaults contain references of the aml resources
# found in config/aml/, config/compute/ and config/modules
# usually don't modify this
defaults:
- aml: public_workspace # default aml references
- compute: public_workspace # default compute target names
- modules: module_defaults # list of modules + versions
# run parameters are command line arguments for running your experiment
run: # params for running pipeline
experiment_name: "demo_component_with_parameter" # IMPORTANT
regenerate_outputs: false
continue_on_failure: false
verbose: false
submit: false
resume: false
canary: false
silent: false
wait: false
# module_loader
module_loader: # module loading params
# IMPORTANT: if you want to modify a given module, add its key here
# see the code for identifying the module key
# use comma separation in this string to use multiple local modules
use_local: "ComponentWithParameter"
# fix the version of modules in all subgraphs (if left unspecified)
# NOTE: use the latest release version to "fix" your branch to a given release
# see https://eemo.visualstudio.com/TEE/_release?_a=releases&view=mine&definitionId=76
force_default_module_version: null
# forces ALL module versions to this unique value (even if specified otherwise in code)
force_all_module_version: null
# path to the steps folder, don't modify this one
# NOTE: we're working on deprecating this one
local_steps_folder: "../../../components" # NOTE: run scripts from the `shrike-examples` directory
# DemoComponent config
# To-Do

View file

@@ -0,0 +1,46 @@
# This yaml file configures the hello_world demo experiment
# command for running the pipeline:
# python pipelines/experiments/demo_count_rows.py --config-dir pipelines/config --config-name experiments/demo_count_rows run.submit=True
# defaults contain references of the aml resources
# found in config/aml/, config/compute/ and config/modules
# usually don't modify this
defaults:
- aml: public_workspace # default aml references
- compute: public_workspace # default compute target names
- modules: module_defaults # list of modules + versions
# run parameters are command line arguments for running your experiment
run: # params for running pipeline
experiment_name: "demo_count_rows" # IMPORTANT
regenerate_outputs: false
continue_on_failure: false
verbose: false
submit: false
resume: false
canary: false
silent: false
wait: false
# module_loader
module_loader: # module loading params
# IMPORTANT: if you want to modify a given module, add its key here
# see the code for identifying the module key
# use comma separation in this string to use multiple local modules
use_local: "CountRows"
# fix the version of modules in all subgraphs (if left unspecified)
# NOTE: use the latest release version to "fix" your branch to a given release
# see https://eemo.visualstudio.com/TEE/_release?_a=releases&view=mine&definitionId=76
force_default_module_version: null
# forces ALL module versions to this unique value (even if specified otherwise in code)
force_all_module_version: null
# path to the steps folder, don't modify this one
# NOTE: we're working on deprecating this one
local_steps_folder: "../../../components" # NOTE: run scripts from the `shrike-examples` directory
# DemoComponent config
# To-Do

View file

@@ -0,0 +1,43 @@
# This yaml file configures the hello_world demo experiment
# command for running the pipeline:
# python pipelines/experiments/demo_hello_world.py --config-dir pipelines/config --config-name experiments/demo_hello_world run.submit=True
# defaults contain references of the aml resources
# found in config/aml/, config/compute/ and config/modules
# usually don't modify this
defaults:
- aml: public_workspace # default aml references
- compute: public_workspace # default compute target names
- modules: module_defaults # list of modules + versions
# run parameters are command line arguments for running your experiment
run: # params for running pipeline
experiment_name: "name_your_experiment_here" # To-Do
regenerate_outputs: false
continue_on_failure: false
verbose: false
submit: false
resume: false
canary: false
silent: false
wait: false
# module_loader
module_loader: # module loading params
# IMPORTANT: if you want to modify a given module, add its key here
# see the code for identifying the module key
# use comma separation in this string to use multiple local modules
use_local: "HelloWorldComponent"
# fix the version of modules in all subgraphs (if left unspecified)
# NOTE: use the latest release version to "fix" your branch to a given release
# see https://eemo.visualstudio.com/TEE/_release?_a=releases&view=mine&definitionId=76
force_default_module_version: null
# forces ALL module versions to this unique value (even if specified otherwise in code)
force_all_module_version: null
# path to the steps folder, don't modify this one
# NOTE: we're working on deprecating this one
local_steps_folder: "../../../components" # NOTE: run scripts from the `shrike-examples` directory

View file

@@ -0,0 +1,19 @@
# @package _group_
# Contains all the demo components with default version
manifest:
# HELLO WORLD COMPONENT
- key: "HelloWorldComponent"
name: "helloworldcomponent"
version: null
yaml: "hello_world/component_spec.yaml"
# COMPONENT THAT OPERATES ON A PARAMETER VALUE
# - key: # To-Do
# name: # To-Do
# version: null
# yaml: # To-Do
# COMPONENT THAT COUNTS ROWS IN THE INPUT DATASET
- key: "CountRows"
name: "countrows"
version: null
yaml: "count_rows/component_spec.yaml"

View file

@@ -0,0 +1,96 @@
"""
The Azure ML pipeline for running a basic 'Hello, World!' experiment
to execute:
> python pipelines/experiments/demo_component_with_parameter.py --config-dir pipelines/config --config-name experiments/demo_component_with_parameter run.submit=True
"""
# pylint: disable=no-member
# NOTE: because it raises 'dict' has no 'outputs' member in dsl.pipeline construction
import os
import sys
from azure.ml.component import dsl
from shrike.pipeline.pipeline_helper import AMLPipelineHelper
# NOTE: if you need to import from pipelines.*
ACCELERATOR_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(__file__), "..", "..")
)
if ACCELERATOR_ROOT_PATH not in sys.path:
print(f"Adding to path: {ACCELERATOR_ROOT_PATH}")
sys.path.append(str(ACCELERATOR_ROOT_PATH))
class ComponentWithParameterDemo(AMLPipelineHelper):
"""Runnable/reusable pipeline helper class
This class inherits from AMLPipelineHelper which provides
helper functions to create reusable production pipelines.
"""
def build(self, config):
"""Builds a pipeline function for this pipeline using AzureML SDK (dsl.pipeline).
This method returns a constructed pipeline function (decorated with @dsl.pipeline).
Args:
config (DictConfig): configuration object
Returns:
dsl.pipeline: the function to create your pipeline
"""
# helper functions below load the subgraph/component from registered or local version depending on your config.run.use_local
component_with_parameter = self.component_load("ComponentWithParameter")
# Here you should create an instance of a pipeline function (using your custom config dataclass)
@dsl.pipeline(
name="demo-component-with-parameter",
description="The Azure ML demo of a component that operates on a parameter value",
default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function():
"""Pipeline function for this graph.
Returns:
dict[str->PipelineOutputData]: a dictionary of your pipeline outputs
for instance to be consumed by other graphs
"""
# general syntax:
# component_instance = component_class(input=data, param=value)
# or
# subgraph_instance = subgraph_function(input=data, param=value)
demo_component_step = component_with_parameter(<your-parameter-name> = config.democomponent.<your-parameter-name>) # To-Do
self.apply_recommended_runsettings(
"ComponentWithParameter", demo_component_step, gpu=False
)
# finally return the function itself to be built by helper code
return demo_pipeline_function
def pipeline_instance(self, pipeline_function, config):
"""Given a pipeline function, creates a runnable instance based on provided config.
This is used only when calling this as a runnable pipeline using .main() function (see below).
The goal of this function is to map the config to the pipeline_function inputs and params.
Args:
pipeline_function (function): the pipeline function obtained from self.build()
config (DictConfig): configuration object
Returns:
azureml.core.Pipeline: the instance constructed with its inputs and params.
"""
# we simply call the pipeline function
demo_pipeline = pipeline_function()
# and we return that function so that helper can run it.
return demo_pipeline
# NOTE: main block is necessary only if script is intended to be run from command line
if __name__ == "__main__":
# calling the helper .main() function
ComponentWithParameterDemo.main()

View file

@@ -0,0 +1,106 @@
"""
The Azure ML pipeline for running a basic 'Hello, World!' experiment
to execute:
> python pipelines/experiments/demo_count_rows.py --config-dir pipelines/config --config-name experiments/demo_count_rows run.submit=True
"""
# pylint: disable=no-member
# NOTE: because it raises 'dict' has no 'outputs' member in dsl.pipeline construction
import os
import sys
from azure.ml.component import dsl
from shrike.pipeline.pipeline_helper import AMLPipelineHelper
# NOTE: if you need to import from pipelines.*
ACCELERATOR_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(__file__), "..", "..")
)
if ACCELERATOR_ROOT_PATH not in sys.path:
print(f"Adding to path: {ACCELERATOR_ROOT_PATH}")
sys.path.append(str(ACCELERATOR_ROOT_PATH))
class CountRowsDemo(AMLPipelineHelper):
"""Runnable/reusable pipeline helper class
This class inherits from AMLPipelineHelper which provides
helper functions to create reusable production pipelines.
"""
def build(self, config):
"""Builds a pipeline function for this pipeline using AzureML SDK (dsl.pipeline).
This method returns a constructed pipeline function (decorated with @dsl.pipeline).
Args:
config (DictConfig): configuration object
Returns:
dsl.pipeline: the function to create your pipeline
"""
# helper functions below load the subgraph/component from registered or local version depending on your config.run.use_local
count_rows_component = self.component_load("CountRows")
# Here you should create an instance of a pipeline function (using your custom config dataclass)
@dsl.pipeline(
name="demo-component-count-rows",
description="The Azure ML demo of a component that reads a dataset and counts the number of rows.",
default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function(): # To-Do (include an input dataset as argument)
"""Pipeline function for this graph.
Args:
demo_dataset: input dataset
Returns:
dict[str->PipelineOutputData]: a dictionary of your pipeline outputs
for instance to be consumed by other graphs
"""
# general syntax:
# component_instance = component_class(input=data, param=value)
# or
# subgraph_instance = subgraph_function(input=data, param=value)
demo_component_step = count_rows_component() # To-Do (include an input dataset as argument)
self.apply_recommended_runsettings(
"CountRows", demo_component_step, gpu=False
)
# finally return the function itself to be built by helper code
return demo_pipeline_function
def pipeline_instance(self, pipeline_function, config):
"""Given a pipeline function, creates a runnable instance based on provided config.
This is used only when calling this as a runnable pipeline using .main() function (see below).
The goal of this function is to map the config to the pipeline_function inputs and params.
Args:
pipeline_function (function): the pipeline function obtained from self.build()
config (DictConfig): configuration object
Returns:
azureml.core.Pipeline: the instance constructed with its inputs and params.
"""
# NOTE: self.dataset_load() helps to load the dataset based on its name and version
# To-Do (include an input dataset as argument)
pipeline_input_dataset = self.dataset_load(
name=<your-dataset-name-from-the-config-file>,
version=<your-dataset-version-from-the-config-file>,
)
# we simply call the pipeline function
demo_pipeline = pipeline_function(demo_dataset=pipeline_input_dataset)
# and we return that function so that helper can run it.
return demo_pipeline
# NOTE: main block is necessary only if script is intended to be run from command line
if __name__ == "__main__":
# calling the helper .main() function
CountRowsDemo.main()

View file

@@ -0,0 +1,94 @@
"""
The Azure ML pipeline for running a basic 'Hello, World!' experiment
to execute:
> python pipelines/experiments/demo_hello_world.py --config-dir pipelines/config --config-name experiments/demo_hello_world run.submit=True
"""
# pylint: disable=no-member
# NOTE: because it raises 'dict' has no 'outputs' member in dsl.pipeline construction
import os
import sys
from azure.ml.component import dsl
from shrike.pipeline.pipeline_helper import AMLPipelineHelper
# NOTE: if you need to import from pipelines.*
ACCELERATOR_ROOT_PATH = os.path.abspath(
os.path.join(os.path.dirname(__file__), "..", "..")
)
if ACCELERATOR_ROOT_PATH not in sys.path:
print(f"Adding to path: {ACCELERATOR_ROOT_PATH}")
sys.path.append(str(ACCELERATOR_ROOT_PATH))
class HelloWorldDemo(AMLPipelineHelper):
"""Runnable/reusable pipeline helper class
This class inherits from AMLPipelineHelper which provides
helper functions to create reusable production pipelines.
"""
def build(self, config):
"""Builds a pipeline function for this pipeline using AzureML SDK (dsl.pipeline).
This method returns a constructed pipeline function (decorated with @dsl.pipeline).
Args:
config (DictConfig): configuration object
Returns:
dsl.pipeline: the function to create your pipeline
"""
# helper functions below load the subgraph/component from registered or local version depending on your config.run.use_local
hello_world_component = self.component_load("<your-component-key>") # To-Do
# Here you should create an instance of a pipeline function (using your custom config dataclass)
@dsl.pipeline(
name="demo-hello-world",
description="The Azure ML 'Hello, World!' demo",
default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function():
"""Pipeline function for this graph.
Returns:
dict[str->PipelineOutputData]: a dictionary of your pipeline outputs
for instance to be consumed by other graphs
"""
# general syntax:
# component_instance = component_class(input=data, param=value)
# or
# subgraph_instance = subgraph_function(input=data, param=value)
demo_component_step = <name_of_component_loaded_above()> # To-Do
self.apply_recommended_runsettings("<your-component-key>", demo_component_step, gpu=False) # To-Do
# finally return the function itself to be built by helper code
return demo_pipeline_function
def pipeline_instance(self, pipeline_function, config):
"""Given a pipeline function, creates a runnable instance based on provided config.
This is used only when calling this as a runnable pipeline using .main() function (see below).
The goal of this function is to map the config to the pipeline_function inputs and params.
Args:
pipeline_function (function): the pipeline function obtained from self.build()
config (DictConfig): configuration object
Returns:
azureml.core.Pipeline: the instance constructed with its inputs and params.
"""
# we simply call the pipeline function
demo_pipeline = pipeline_function()
# and we return that function so that helper can run it.
return demo_pipeline
# NOTE: main block is necessary only if script is intended to be run from command line
if __name__ == "__main__":
# calling the helper .main() function
HelloWorldDemo.main()

View file

@@ -0,0 +1 @@
shrike[pipeline]

View file

@@ -0,0 +1,56 @@
# Pipelines problem 01 - Hello, world!
## Problem statement
Submit a pipeline with a single "Hello, world!"-type component.
## Motivation
The goal of this problem is to get you familiar with how to use `shrike.pipeline` for creating and submitting experiments.
## Out of scope
_Component creation_ is out of scope for this problem and will be covered in [problem 02](./pipelines-02.md). For the current problem, we will be using the "Hello, world!" component defined in the `components/hello_world` folder - all it does is print "Hello, world!" in the logs.
## Guidance
### Set your workspace
Open the `pipelines/config/aml/public_workspace.yaml` file and update the `subscription_id`, `resource_group`, `workspace_name`, and `tenant` values with those corresponding to your workspace. You can get the first three by downloading the config file from the Azure ML UI (click on the workspace name in the top right, then on "Download config file"). You can get the last one (the tenant ID) from the workspace URL (which should contain a part like "_&tid=\<your-tenant-id\>_").
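For illustration, a filled-in `public_workspace.yaml` would look like the sketch below - all values here are made-up placeholders, so substitute your own workspace's.
```yaml
# @package _group_
# Hypothetical example values -- replace with your own workspace details.
subscription_id: 00000000-1111-2222-3333-444444444444
resource_group: my-resource-group
workspace_name: my-aml-workspace
tenant: 55555555-6666-7777-8888-999999999999
auth: "interactive"
```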
### Double-check the compute and datastore names
Open the `pipelines/config/compute/public_workspace.yaml` file and double-check that `default_compute_target`, `linux_cpu_dc_target`, and `linux_cpu_prod_target` point to your cpu cluster (usually named "cpu-cluster" by default, but it can be adjusted in this file if your cpu cluster has another name).
The `compliant_datastore` name should be the default "workspaceblobstore".
### Prepare your experiment python file
In Azure ML, the experiments (_a.k.a. graphs_) are typically defined _via_ code, in what we will call an "experiment python file". We have prepared a stub of this file for you: [demo_hello_world.py](../../shrike-examples/pipelines/experiments/demo_hello_world.py).
Open this file and start scrolling down. In the `HelloWorldDemo` class, you will first find a `build()` function, which first loads the subgraphs and components used in the graph (in our case, a single component). Look for the line below and insert the _component key_ that will control which component to load - you can find the component key in the components dictionary defined in the [module_defaults.yaml](../../shrike-examples/pipelines/config/modules/module_defaults.yaml) file.
```python
hello_world_component = self.component_load("<your-component-key>")
```
After that, keep scrolling and you will soon encounter the `demo_pipeline_function()` function which actually defines the graph. You can use the `name` and `description` parameters in the decorator to give a meaningful name and description to your graph. After that, we need to instantiate the components making up our graph. In our current, simple case of a 1-component graph, all we need is to instantiate a single step (`demo_component_step`) with the component we just loaded. To do so, just adjust the following line with the name of the component you loaded above.
```python
demo_component_step = <name_of_component_loaded_above()>
```
Finally, we leverage the `shrike.pipeline` package to apply the proper run parameters (_e.g._ which compute target the component runs on). To do so, just call the `apply_recommended_runsettings()` function as shown below, with the same component key you used to load the component; note how we specify that this component should run on a cpu target.
```python
self.apply_recommended_runsettings(
"<your-component-key>", demo_component_step, gpu=False
)
```
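Putting these three To-Do items together with the `HelloWorldComponent` key from [module_defaults.yaml](../../shrike-examples/pipelines/config/modules/module_defaults.yaml), the completed lines would look roughly like the sketch below (the variable naming is just a suggestion).
```python
# load the component using its key from module_defaults.yaml
hello_world_component = self.component_load("HelloWorldComponent")

# ... inside demo_pipeline_function(): instantiate the single step
# (the hello-world component takes no inputs)
demo_component_step = hello_world_component()
self.apply_recommended_runsettings(
    "HelloWorldComponent", demo_component_step, gpu=False
)
```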
### Configure your experiment
The various parameters controlling the execution of an experiment can be defined _via_ the command line, or _via_ a _configuration file_.
Open the experiment configuration file [demo_hello_world.yaml](../../shrike-examples/pipelines/config/experiments/demo_hello_world.yaml) that has already been prepared for you. Adjust the `run.experiment_name` parameter to give your experiment a meaningful name.
### Submit your experiment
To submit your experiment just run the command given at the top of the experiment [configuration file](../../shrike-examples/pipelines/config/experiments/demo_hello_world.yaml).
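For reference, that command is:
```
python pipelines/experiments/demo_hello_world.py --config-dir pipelines/config --config-name experiments/demo_hello_world run.submit=True
```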
### Check the logs
Once your experiment has executed successfully, click on the component, then on "Outputs + logs". In the driver log (usually called "70_driver_log.txt"), look for the "Hello, world!" line. Tadaa!
### Links to successful execution
A successful run of the experiment can be found [here](https://ml.azure.com/runs/8043ce8a-5045-4211-9934-1959d5296a48?wsid=/subscriptions/48bbc269-ce89-4f6f-9a12-c6f91fcb772d/resourcegroups/aml1p-rg/workspaces/aml1p-ml-wus2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47). (This is mostly for internal use, as you likely will not have access to that workspace.)

View file

@@ -0,0 +1,67 @@
# Pipelines problem 02 - Component that operates on a parameter
## Problem statement
Submit a single-component pipeline where the component operates on a value passed as parameter (pass the parameter value through a config file or via the command line at pipeline submission time).
## Motivation
The goal of this problem is to get you familiar with how to create components that consume parameters, and how to set parameter values.
## Out of scope
_Consuming a dataset_ is out of scope for this problem and will be covered in [problem 03](./pipelines-03.md).
## Guidance
### Prepare your component specification
Open the [component_spec.yaml](../../shrike-examples/components/add_one_thousand_to_parameter/component_spec.yaml) file in the `components/add_one_thousand_to_parameter` directory. Add a "`value`" integer parameter to the `inputs` section following the [CommandComponent documentation](https://componentsdk.azurewebsites.net/components/command_component.html), then add your newly added parameter to the command.
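As a sketch, those two edits could look like the following (the description text is up to you; the `{inputs.value}` placeholder is how a CommandComponent command references an input):
```yaml
inputs:
  value:
    type: integer
    description: The value the component will add 1000 to
    optional: false
outputs: {}
command: >-
  python3 run.py --value {inputs.value}
```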
### Prepare your component script
The command in the component specification runs the [run.py](../../shrike-examples/components/add_one_thousand_to_parameter/run.py) file located in the component folder. If you open this file, you will see that it just calls the main method of the [add_one_thousand_to_parameter_script.py](../../shrike-examples/contoso/add_one_thousand_to_parameter_script.py) file. It is _that_ file that we call the _component script_ and that we will now prepare.
#### Implement the `get_arg_parser()` method
First you need to implement a `get_arg_parser()` method that returns an instance of the argument parser. If you're not familiar with parsing arguments, the _argparse_ library [documentation](https://docs.python.org/3/library/argparse.html) should help.
#### Implement the `main()` method
Then, implement the `main()` method to consume your newly introduced parameter. For this exercise, let's just add 1000 to the parameter value, and print both operands and the result.
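A possible sketch of the two methods, assuming the component command passes the parameter as `--value`:
```python
def get_arg_parser(parser=None):
    """Build the argument parser for this component."""
    if parser is None:
        parser = argparse.ArgumentParser(description=__doc__)
    # the argument name must match the one used in the component command
    parser.add_argument("--value", type=int, required=True, help="value to operate on")
    return parser


def main():
    """Add 1000 to the 'value' parameter and print both operands and the result."""
    args = vars(get_arg_parser().parse_args())
    value = args["value"]
    print(f"The value passed as parameter is {value}; {value} + 1000 = {value + 1000}")
```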
### Add your component to the component dictionary
Open the [module_defaults.yaml](../../shrike-examples/pipelines/config/modules/module_defaults.yaml) file and add an entry for the new component following the example of the HelloWorldComponent.
- `key` is how you will retrieve the component later on.
- `name` must match the name you defined in the component specification, as it will be used to retrieve the component if you use the remote version.
- `version` can be left `null`.
- `yaml` is the location of the component specification.
### Configure your experiment
The various parameters controlling the execution of an experiment can be defined _via_ the _command line_, or _via_ a _configuration file_.
Open the experiment configuration file [demo_component_with_parameter.yaml](../../shrike-examples/pipelines/config/experiments/demo_component_with_parameter.yaml) that has already been prepared for you. Create a new section where you define the parameter value, something like the below.
```yaml
# DemoComponent config
democomponent: # 'democomponent' section name will be used in the experiment python file and in the command line to refer to this set of config parameters
<your-parameter-name>: 314 # the value on which the component will operate
```
### Prepare your experiment python file
Now that your component should be ready and your experiment should be configured properly, let's prepare [your experiment python file](../../shrike-examples/pipelines/experiments/demo_component_with_parameter.py). The process is very similar to what you did in the previous problem. The only difference is that when you instantiate your component, you will need to provide the parameter value defined in the config file, as demonstrated below.
```python
demo_component_step = component_with_parameter(<your-parameter-name> = config.democomponent.<your-parameter-name>)
```
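With the `value` parameter name used above, that line would become:
```python
demo_component_step = component_with_parameter(value=config.democomponent.value)
```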
### Submit your experiment...
#### ... with the parameter value defined in the config file
To submit your experiment with the parameter value defined in the config file, just run the command shown at the top of the experiment python file.
#### ... with the parameter value defined from the command line
To submit your experiment with the parameter value defined in the command line (and thus overriding the value in the config file), just run the command shown at the top of the experiment python file _with the following addition_.
```
democomponent.<your-parameter-name>=51
```
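Assuming you named the parameter `value`, the full command would look like:
```
python pipelines/experiments/demo_component_with_parameter.py --config-dir pipelines/config --config-name experiments/demo_component_with_parameter run.submit=True democomponent.value=51
```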
### Check the logs
Once your experiment has executed successfully, click on the component, then on "Outputs + logs". In the driver log (usually called "70_driver_log.txt"), look for the line starting with "The value passed as parameter is...". Tadaa!
### Links to successful execution
A successful run of the experiment can be found [here](https://ml.azure.com/runs/d8e205df-4351-406d-a1e2-8ffbea7b9741?wsid=/subscriptions/48bbc269-ce89-4f6f-9a12-c6f91fcb772d/resourcegroups/aml1p-rg/workspaces/aml1p-ml-wus2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47). (This is mostly for internal use, as you likely will not have access to that workspace.)

View file

@@ -0,0 +1,58 @@
# Pipelines problem 03 - Consume a dataset
## Problem statement
Submit a single-component pipeline which consumes a dataset (for example count the number of records).
## Motivation
The goal of this problem is to get you familiar with how to read datasets as component inputs.
## Out of scope
_Outputting data_ is out of scope for this problem and will be covered in [problem 04](./pipelines-04.md).
## Prerequisites
To be able to consume a dataset, the main thing you need is, well, a dataset! If you do not have one already, you can create one from the Azure Open Datasets through the Azure ML UI, following [these instructions](https://docs.microsoft.com/en-us/azure/open-datasets/how-to-create-azure-machine-learning-dataset-from-open-dataset#create-datasets-with-the-studio).
## Guidance
### Prepare your component specification
Open the [component_spec.yaml](../../shrike-examples/components/count_rows/component_spec.yaml) file in the `components/count_rows` directory. Add an "`input_data`" parameter to the `inputs` section following the [CommandComponent documentation](https://componentsdk.azurewebsites.net/components/command_component.html), then add your newly added parameter to the command.
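A sketch of what this could look like - here we assume the `path` input type from the CommandComponent SDK, and reference the input in the command via the `{inputs.input_data}` placeholder:
```yaml
inputs:
  input_data:
    type: path
    description: The dataset whose rows will be counted
    optional: false
outputs: {}
command: >-
  python3 run.py --input_data {inputs.input_data}
```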
### Prepare your component script
The command in the component specification runs the [run.py](../../shrike-examples/components/count_rows/run.py) file located in the component folder. If you open this file, you will see that it just calls the main method of the [count_rows_script.py](../../shrike-examples/contoso/count_rows_script.py) file. It is _that_ file that we call the _component script_ and that we will now prepare.
#### Implement the `get_arg_parser()` method
First you need to implement a `get_arg_parser()` method that returns an instance of the argument parser. If you're not familiar with parsing arguments, the _argparse_ library [documentation](https://docs.python.org/3/library/argparse.html) should help. Here, we will have a single input named `input_data`.
#### Implement the `main()` method
Then, implement the `main()` method to read the dataset, count the number of rows, and print the result. The `input_data` input is passed to the component as a folder. The name of the actual file(s) to load from that folder depends on which dataset you're using - if you are unsure, just find your dataset in the UI and click "Explore"; it should tell you which files are available.
After you've read the dataset (feel free to just use a sample if you have chosen a large dataset), count the number of rows and print the result. We suggest you use `pandas` for counting rows. As you can see in the [component_env.yaml](../../shrike-examples/components/count_rows/component_env.yaml) file that defines the environment where the component will run, `pandas` should be available.
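A possible sketch of the component script, assuming the dataset folder contains CSV files (adapt the reading logic to your dataset's format):
```python
import argparse
import os

import pandas as pd


def get_arg_parser(parser=None):
    """Build the argument parser for this component."""
    if parser is None:
        parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--input_data", type=str, required=True, help="path to the input dataset folder"
    )
    return parser


def main():
    """Read the input dataset and print its number of rows."""
    args = vars(get_arg_parser().parse_args())
    # the input is provided as a folder; read every csv file it contains
    csv_files = [f for f in os.listdir(args["input_data"]) if f.endswith(".csv")]
    df = pd.concat(
        pd.read_csv(os.path.join(args["input_data"], f)) for f in csv_files
    )
    print(f"The input dataset contains {len(df)} rows.")
```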
### Configure your experiment
The various parameters controlling the execution of an experiment can be defined _via_ the _command line_, or _via_ a _configuration file_. In this problem, we will focus on the _configuration file_ route.
Open the experiment configuration file [demo_count_rows.yaml](../../shrike-examples/pipelines/config/experiments/demo_count_rows.yaml) that has already been prepared for you. Create a new section where you define the value of the `input_data` parameter, _i.e._ the name of your dataset. It should be similar to the below.
```yaml
# DemoComponent config
democomponent:
input_data: <name-of-your-dataset> # the name of the dataset you'll be working on, as seen in the UI
input_data_version: "latest" # this will ensure you always consume the latest version of the dataset
```
### Prepare your experiment python file
Now that your component should be ready and your experiment should be configured properly, let's prepare [your experiment python file](../../shrike-examples/pipelines/experiments/demo_count_rows.py). The process is very similar to what you did in the previous problem. The main differences are as follows.
- The pipeline function (`demo_pipeline_function()`) will now take an input dataset as an argument, that you will also need when instantiating the component.
- In the `pipeline_instance()` function, you will need to load the dataset using the name and version provided in the config file, and pass the loaded dataset to the pipeline function.
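A sketch of those two changes, using the config section defined above:
```python
# in build(): the pipeline function now takes the dataset as an argument
@dsl.pipeline(
    name="demo-component-count-rows",
    description="The Azure ML demo of a component that counts rows in a dataset",
    default_datastore=config.compute.compliant_datastore,
)
def demo_pipeline_function(demo_dataset):
    demo_component_step = count_rows_component(input_data=demo_dataset)
    self.apply_recommended_runsettings("CountRows", demo_component_step, gpu=False)

# in pipeline_instance(): load the dataset named in the config and pass it in
pipeline_input_dataset = self.dataset_load(
    name=config.democomponent.input_data,
    version=config.democomponent.input_data_version,
)
demo_pipeline = pipeline_function(demo_dataset=pipeline_input_dataset)
```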
### Submit your experiment
To submit your experiment with the parameter value defined in the config file, just run the command shown at the top of the experiment python file.
### Check the logs
Once your experiment has executed successfully, click on the component, then on "Outputs + logs". In the driver log (usually called "70_driver_log.txt"), look for the line containing the number of rows that you added in one of the above steps. Tadaa!
### Links to successful execution
A successful run of the experiment can be found [here](https://ml.azure.com/runs/e50f9945-d9ec-417d-a593-0a4dbb6b7690?wsid=/subscriptions/48bbc269-ce89-4f6f-9a12-c6f91fcb772d/resourcegroups/aml1p-rg/workspaces/aml1p-ml-wus2&tid=72f988bf-86f1-41af-91ab-2d7cd011db47). (This is mostly for internal use, as you likely will not have access to that workspace.)

View file

@@ -5,19 +5,22 @@ This document is a proposal of a list of problems to help people learn how to us
## List of problems
### Creating,submitting, and validating pipelines
:warning: Note that these problems are meant to be tackled sequentially, as the solution of problem _N+1_ builds upon the solution of problem _N_.
1. Submit a pipeline with a single "Hello, World!"-type component.
2. Submit a single-component pipeline where the component operates on a value passed as parameter (pass the parameter value through a config file or via the command line at pipeline submission time).
3. Submit a single-component pipeline which consumes a dataset (for example count the number of records).
4. Submit a multi-component pipeline where one component's output is the input of a subsequent component.
5. Submit a multi-component pipeline which uses a subgraph.
6. Submit a pipeline where a component is chosen based on a parameter value.
7. Re-submit one of the previous pipelines but with a different component version.
8. Add integration tests to ensure a pipeline does not break.
### Creating, submitting, and validating pipelines
- [Pipelines Problem 01](./problems/pipelines-01.md) Submit a pipeline with a single "Hello, world!"-type component.
- [Pipelines Problem 02](./problems/pipelines-02.md) Submit a single-component pipeline where the component operates on a value passed as parameter (pass the parameter value through a config file or via the command line at pipeline submission time).
- [Pipelines Problem 03](./problems/pipelines-03.md) Submit a single-component pipeline which consumes a dataset (for example count the number of records).
- [Pipelines Problem 04](./problems/pipelines-04.md) Submit a multi-component pipeline where one component's output is the input of a subsequent component.
- [Pipelines Problem 05](./problems/pipelines-05.md) Submit a multi-component pipeline which uses a subgraph.
- [Pipelines Problem 06](./problems/pipelines-06.md) Submit a pipeline where a component is chosen based on a parameter value.
- [Pipelines Problem 07](./problems/pipelines-07.md) Add integration tests to ensure a pipeline does not break.
### Logging
1. Submit a pipeline using the compliant logger to log various properties of the dataset consumed by a component (such as number of records or average of a numerical field, for instance).
2. Experiment with the different data categories available to the compliant logger.
3. Experiment with the various options about stack trace prefixing (customize the prefix and the exception message, scrub the exception message unless it is in an allowed list).
:construction: Work in Progress :construction:
- [Logging Problem 01](./problems/logging-01.md) Submit a pipeline using the compliant logger to log various properties of the dataset consumed by a component (such as number of records or average of a numerical field, for instance).
- [Logging Problem 02](./problems/logging-02.md) Experiment with the different data categories available to the compliant logger.
- [Logging Problem 03](./problems/logging-03.md) Experiment with the various options about stack trace prefixing (customize the prefix and the exception message, scrub the exception message unless it is in an allowed list).