recommenders/examples/run_notebook_on_azureml.ipynb

{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<i>Copyright (c) Recommenders contributors.</i>\n",
                "\n",
                "<i>Licensed under the MIT License.</i>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Submit an Existing Notebook to AzureML\n",
                "## Introduction to Azure Machine Learning  \n",
                "The **[Azure Machine Learning service (AzureML)](https://docs.microsoft.com/azure/machine-learning/service/overview-what-is-azure-ml)** provides a cloud-based environment you can use to prep data, train, test, deploy, manage, and track machine learning models. By using Azure Machine Learning service, you can start training on your local machine and then scale out to the cloud. With many available compute targets, like [Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) and [Azure Databricks](https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks), and with [advanced hyperparameter tuning services](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), you can build better models faster by using the power of the cloud.\n",
                "\n",
                "Data scientists and AI developers use the main [Azure Machine Learning Python SDK](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py) to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks or your favorite Python IDE. The Azure Machine Learning SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.\n",
                "![AzureML Workflow](https://docs.microsoft.com/en-us/azure/machine-learning/service/media/overview-what-is-azure-ml/aml.png)\n",
                "\n",
                "This noteboook provides a scaffold to directly submit an existing notebook to AzureML compute targets. Now a user doesn't have to rewrite the training script, instead, just by replacing file name, now user can submit notebook directly.\n",
                "\n",
                "### Advantages of using AzureML:\n",
                "- Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.\n",
                "- Train models either locally or by using cloud resources, including GPU-accelerated model training.\n",
                "- Easy to scale out when dataset grows - by just creating and pointing to new compute target\n",
                "\n",
                "### Prerequisities\n",
                "   - **Azure Subscription**\n",
                "     - If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service today](https://azure.microsoft.com/en-us/free/services/machine-learning/).\n",
                "     - You get credits to spend on Azure services, which will easily cover the cost of running this example notebook. After they're used up, you can keep the account and use [free Azure services](https://azure.microsoft.com/en-us/free/). Your credit card is never charged unless you explicitly change your settings and ask to be charged. Or [activate MSDN subscriber benefits](https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/), which give you credits every month that you can use for paid Azure services.\n",
                "\n",
                "# Setup environment\n",
                "\n",
                "### Install azure.contrib.notebook package\n",
                "We need notebook_run_config from azureml.contrib.notebook to run this notebook. Since azureml.contrib.notebook contains experimental components, it's not included in generated .yaml file by default."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": [
                "#!pip install \"azureml.contrib.notebook>=1.0.21.1\""
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "azureml.core version: 1.0.23\n"
                    ]
                }
            ],
            "source": [
                "import os\n",
                "import azureml.core\n",
                "from azureml.core import Workspace\n",
                "from os import path, makedirs\n",
                "import shutil\n",
                "from azureml.core import Experiment\n",
                "from azureml.core.compute import ComputeTarget, AmlCompute\n",
                "from azureml.widgets import RunDetails\n",
                "from azureml.core.runconfig import RunConfiguration\n",
                "from azureml.contrib.notebook import NotebookRunConfig, PapermillExecutionHandler\n",
                "from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
                "from azureml.core.conda_dependencies import CondaDependencies\n",
                "from tempfile import TemporaryDirectory\n",
                "\n",
                "print(\"azureml.core version: {}\".format(azureml.core.VERSION))"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Connect to an AzureML workspace\n",
                "\n",
                "An [AzureML Workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models.\n",
                "\n",
                "The function below will get or create an AzureML Workspace and save the configuration to `aml_config/config.json`.\n",
                "\n",
                "It defaults to use provided input parameters or environment variables for the Workspace configuration values. Otherwise, it will use an existing configuration file (either at `./aml_config/config.json` or a path specified by the config_path parameter).\n",
                "\n",
                "Lastly, if the workspace does not exist, one will be created for you. See [this tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace#portal) to locate information such as subscription id."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [],
            "source": [
                "ws = Workspace.create(\n",
                "    name=\"<WORKSPACE_NAME>\",\n",
                "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
                "    resource_group=\"<RESOURCE_GROUP>\",\n",
                "    location=\"<WORKSPACE_REGION>\"\n",
                "    exist_ok=True,\n",
                ")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "This example uses run-based AzureML Compute, which requires minimum setup. Alternatively, you can use persistent compute target (cpu or gpu cluster) or remote virtual machines. For more details, see [How to set up training targets](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets).\n",
                "\n",
                "Cost-wise, according to Azure [Pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator/), with example VM size `STANDARD_D2_V2`, it costs a few dollars to run this notebook, which is well covered by Azure new subscription credit. For billing and pricing questions, please contact [Azure support](https://azure.microsoft.com/en-us/support/options/).\n",
                "\n",
                "**Note**:\n",
                "- See list of all available VM sizes [here](https://docs.microsoft.com/en-us/azure/templates/Microsoft.Compute/2018-10-01/virtualMachines?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#hardwareprofile-object).\n",
                "- As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request) on the default limits and how to request more quota.\n",
                "\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Create or Attach Azure Machine Learning Compute \n",
                "\n",
                "We create a cpu cluster as our **remote compute target**. If a cluster with the same name already exists in your workspace, the script will load it instead. You can read [Set up compute targets for model training](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up compute target on different locations. You can also create GPU machines when larger machines are necessary to train the model.\n",
                "\n",
                "According to Azure [Pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator/), with example VM size `STANDARD_D2_V2`, it costs a few dollars to run this notebook, which is well covered by Azure new subscription credit. For billing and pricing questions, please contact [Azure support](https://azure.microsoft.com/en-us/support/options/).\n",
                "\n",
                "**Note**:\n",
                "- 10m and 20m dataset requires more capacity than `STANDARD_D2_V2`, such as `STANDARD_NC6` or `STANDARD_NC12`. See list of all available VM sizes [here](https://docs.microsoft.com/en-us/azure/templates/Microsoft.Compute/2018-10-01/virtualMachines?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#hardwareprofile-object).\n",
                "- As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request) on the default limits and how to request more quota.\n",
                "---\n",
                "#### Learn more about Azure Machine Learning Compute\n",
                "<details>\n",
                "    <summary>Click to learn more about compute types</summary>\n",
                "    \n",
                "[Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created within your workspace region and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user.\n",
                "\n",
                "Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service.\n",
                "\n",
                "You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
                "\n",
                "In addition to vm_size and max_nodes, you can specify:\n",
                "- **min_nodes**: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n",
                "- **vm_priority**: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
                "- **idle_seconds_before_scaledown**: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
                "- **vnet_resourcegroup_name**: Resource group of the existing VNet within which AmlCompute should be provisioned\n",
                "- **vnet_name**: Name of VNet\n",
                "- **subnet_name**: Name of SubNet within the VNet\n",
                "</details>\n",
                "---"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Found existing compute target\n"
                    ]
                }
            ],
            "source": [
                "# Remote compute (cluster) configuration. If you want to save the cost more, set these to small.\n",
                "VM_SIZE = 'STANDARD_D2_V2'\n",
                "# Cluster nodes\n",
                "MIN_NODES = 0\n",
                "MAX_NODES = 2\n",
                "\n",
                "CLUSTER_NAME = 'cpucluster'\n",
                "\n",
                "try:\n",
                "    compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)\n",
                "    print(\"Found existing compute target\")\n",
                "except:\n",
                "    print(\"Creating a new compute target...\")\n",
                "    # Specify the configuration for the new cluster\n",
                "    compute_config = AmlCompute.provisioning_configuration(\n",
                "        vm_size=VM_SIZE,\n",
                "        min_nodes=MIN_NODES,\n",
                "        max_nodes=MAX_NODES\n",
                "    )\n",
                "    # Create the cluster with the specified name and configuration\n",
                "    compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)\n",
                "    # Wait for the cluster to complete, show the output log\n",
                "    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "This example uses persistent compute target, which requires creation of the computing resource for first time. Alternatively, you can use run-based compute target or remote virtual machines."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {
                "scrolled": true
            },
            "outputs": [],
            "source": [
                "NOTEBOOK_NAME = 'sar_movielens.ipynb' # use different notebook file name if trying to run other notebooks\n",
                "experiment_name = NOTEBOOK_NAME.strip(\".ipynb\")\n",
                "tmp_dir = TemporaryDirectory()\n",
                "notebook_path = path.join(tmp_dir.name, NOTEBOOK_NAME)\n",
                "source_path = path.join('00_quick_start/', NOTEBOOK_NAME)\n",
                "shutil.copy(source_path, notebook_path)\n",
                "        \n",
                "exp = Experiment(workspace=ws, name=experiment_name)\n",
                "\n",
                "run_config = RunConfiguration()\n",
                "run_config.target = \"cpucluster\" # possible to use alternative compute targets\n",
                "run_config.environment.docker.enabled = True\n",
                "run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n",
                "run_config.environment.python.user_managed_dependencies = False\n",
                "run_config.auto_prepare_environment = True\n",
                "run_config.environment.python.conda_dependencies = CondaDependencies.create(python_version='3.6.2')\n",
                "run_config.environment.python.conda_dependencies.add_pip_package('sklearn')\n",
                "\n",
                "cfg = NotebookRunConfig(source_directory='../',\n",
                "                            notebook='notebooks/00_quick_start/' + NOTEBOOK_NAME,\n",
                "                            output_notebook='outputs/out.ipynb',\n",
                "                            parameters={\"MOVIELENS_DATA_SIZE\": \"100k\", \"TOP_K\": 10},\n",
                "                            run_config=run_config)\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "`exp.submit()` will submit source_directory folder and designate notebook to run"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "scrolled": true
            },
            "outputs": [],
            "source": [
                "run = exp.submit(cfg)\n",
                "run"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Above cell should output similar table as below\n",
                "![Experiment submit output](https://recodatasets.z20.web.core.windows.net/images/aml_papermill_cell_output.jpg)\n",
                "After clicking \"Link to Azure Portal\", experiment run details tab looks like this\n",
                "![Azure Portal Experiment](https://recodatasets.z20.web.core.windows.net/images/aml_papermill_portal.jpg)\n",
                "You can find the executed notebook out.ipynb under outputs tab.\n",
                "![Output file](https://recodatasets.z20.web.core.windows.net/images/aml_papermill_outputs_tab.jpg)\n",
                "#### Jupyter widget\n",
                "\n",
                "You can use a Jupyter widget to monitor the progress of the run. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "application/vnd.jupyter.widget-view+json": {
                            "model_id": "4cdf15aac31d48879cc6fb0021ed51ce",
                            "version_major": 2,
                            "version_minor": 0
                        },
                        "text/plain": [
                            "_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…"
                        ]
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": [
                "RunDetails(run).show()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "You can view logged metrics with run.get_metrics()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 11,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "{'MOVIELENS_DATA_SIZE': '100k', 'TOP_K': 10, 'map': 0.10485162234730652, 'ndcg': 0.3704872048540983, 'precision': 0.3211252653927813, 'recall': 0.17416985459670545, 'test_time': 0.15025854110717773, 'train_time': 0.29204869270324707}\n"
                    ]
                }
            ],
            "source": [
                "# run below after run is complete, otherwise metrics is empty\n",
                "metrics = run.get_metrics()\n",
                "print(metrics)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Deprovision compute resource\n",
                "To avoid unnecessary charges, if you created compute target that doesn't scale down to 0, make sure the compute target is deprovisioned after use."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {},
            "outputs": [],
            "source": [
                "# delete () is used to deprovision and delete the AmlCompute target. \n",
                "# do not run below before experiment completes\n",
                "\n",
                "# compute_target.delete()\n",
                "\n",
                "# deletion will take a few minutes. You can check progress in Azure Portal / Computing tab"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {},
            "outputs": [],
            "source": [
                "# clean up temporary directory\n",
                "tmp_dir.cleanup()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3.6 - AzureML",
            "language": "python",
            "name": "python3-azureml"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.6.7"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}