recommenders/examples/run_notebook_on_azureml.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<i>Copyright (c) Recommenders contributors.</i>\n",
"\n",
"<i>Licensed under the MIT License.</i>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Submit an Existing Notebook to AzureML\n",
"## Introduction to Azure Machine Learning \n",
"The **[Azure Machine Learning service (AzureML)](https://docs.microsoft.com/azure/machine-learning/service/overview-what-is-azure-ml)** provides a cloud-based environment you can use to prep data, train, test, deploy, manage, and track machine learning models. By using Azure Machine Learning service, you can start training on your local machine and then scale out to the cloud. With many available compute targets, like [Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) and [Azure Databricks](https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks), and with [advanced hyperparameter tuning services](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), you can build better models faster by using the power of the cloud.\n",
"\n",
"Data scientists and AI developers use the main [Azure Machine Learning Python SDK](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py) to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks or your favorite Python IDE. The Azure Machine Learning SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.\n",
"![AzureML Workflow](https://docs.microsoft.com/en-us/azure/machine-learning/service/media/overview-what-is-azure-ml/aml.png)\n",
"\n",
"This noteboook provides a scaffold to directly submit an existing notebook to AzureML compute targets. Now a user doesn't have to rewrite the training script, instead, just by replacing file name, now user can submit notebook directly.\n",
"\n",
"### Advantages of using AzureML:\n",
"- Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.\n",
"- Train models either locally or by using cloud resources, including GPU-accelerated model training.\n",
"- Easy to scale out when dataset grows - by just creating and pointing to new compute target\n",
"\n",
"### Prerequisities\n",
" - **Azure Subscription**\n",
" - If you dont have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service today](https://azure.microsoft.com/en-us/free/services/machine-learning/).\n",
" - You get credits to spend on Azure services, which will easily cover the cost of running this example notebook. After they're used up, you can keep the account and use [free Azure services](https://azure.microsoft.com/en-us/free/). Your credit card is never charged unless you explicitly change your settings and ask to be charged. Or [activate MSDN subscriber benefits](https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/), which give you credits every month that you can use for paid Azure services.\n",
"\n",
"# Setup environment\n",
"\n",
"### Install azure.contrib.notebook package\n",
"We need notebook_run_config from azureml.contrib.notebook to run this notebook. Since azureml.contrib.notebook contains experimental components, it's not included in generated .yaml file by default."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install \"azureml.contrib.notebook>=1.0.21.1\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"azureml.core version: 1.0.23\n"
]
}
],
"source": [
"import os\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"from os import path, makedirs\n",
"import shutil\n",
"from azureml.core import Experiment\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.widgets import RunDetails\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.contrib.notebook import NotebookRunConfig, PapermillExecutionHandler\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from tempfile import TemporaryDirectory\n",
"\n",
"print(\"azureml.core version: {}\".format(azureml.core.VERSION))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to an AzureML workspace\n",
"\n",
"An [AzureML Workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models.\n",
"\n",
"The function below will get or create an AzureML Workspace and save the configuration to `aml_config/config.json`.\n",
"\n",
"It defaults to use provided input parameters or environment variables for the Workspace configuration values. Otherwise, it will use an existing configuration file (either at `./aml_config/config.json` or a path specified by the config_path parameter).\n",
"\n",
"Lastly, if the workspace does not exist, one will be created for you. See [this tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace#portal) to locate information such as subscription id."
]
},
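{
"cell_type": "markdown",
"metadata": {},
"source": [
"The environment-variable fallback described above can be sketched as follows. This is illustrative only: the variable names (`SUBSCRIPTION_ID` and so on) are assumptions for this sketch, not names the SDK reads automatically.\n",
"```python\n",
"import os\n",
"\n",
"# Illustrative only: read Workspace settings from environment variables,\n",
"# falling back to placeholder strings when a variable is not set.\n",
"subscription_id = os.environ.get('SUBSCRIPTION_ID', '<SUBSCRIPTION_ID>')\n",
"resource_group = os.environ.get('RESOURCE_GROUP', '<RESOURCE_GROUP>')\n",
"workspace_name = os.environ.get('WORKSPACE_NAME', '<WORKSPACE_NAME>')\n",
"```"
]
},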
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.create(\n",
" name=\"<WORKSPACE_NAME>\",\n",
" subscription_id=\"<SUBSCRIPTION_ID>\",\n",
" resource_group=\"<RESOURCE_GROUP>\",\n",
" location=\"<WORKSPACE_REGION>\"\n",
" exist_ok=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example uses run-based AzureML Compute, which requires minimum setup. Alternatively, you can use persistent compute target (cpu or gpu cluster) or remote virtual machines. For more details, see [How to set up training targets](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets).\n",
"\n",
"Cost-wise, according to Azure [Pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator/), with example VM size `STANDARD_D2_V2`, it costs a few dollars to run this notebook, which is well covered by Azure new subscription credit. For billing and pricing questions, please contact [Azure support](https://azure.microsoft.com/en-us/support/options/).\n",
"\n",
"**Note**:\n",
"- See list of all available VM sizes [here](https://docs.microsoft.com/en-us/azure/templates/Microsoft.Compute/2018-10-01/virtualMachines?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#hardwareprofile-object).\n",
"- As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request) on the default limits and how to request more quota.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create or Attach Azure Machine Learning Compute \n",
"\n",
"We create a cpu cluster as our **remote compute target**. If a cluster with the same name already exists in your workspace, the script will load it instead. You can read [Set up compute targets for model training](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up compute target on different locations. You can also create GPU machines when larger machines are necessary to train the model.\n",
"\n",
"According to Azure [Pricing calculator](https://azure.microsoft.com/en-us/pricing/calculator/), with example VM size `STANDARD_D2_V2`, it costs a few dollars to run this notebook, which is well covered by Azure new subscription credit. For billing and pricing questions, please contact [Azure support](https://azure.microsoft.com/en-us/support/options/).\n",
"\n",
"**Note**:\n",
"- 10m and 20m dataset requires more capacity than `STANDARD_D2_V2`, such as `STANDARD_NC6` or `STANDARD_NC12`. See list of all available VM sizes [here](https://docs.microsoft.com/en-us/azure/templates/Microsoft.Compute/2018-10-01/virtualMachines?toc=%2Fen-us%2Fazure%2Fazure-resource-manager%2Ftoc.json&bc=%2Fen-us%2Fazure%2Fbread%2Ftoc.json#hardwareprofile-object).\n",
"- As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request) on the default limits and how to request more quota.\n",
"---\n",
"#### Learn more about Azure Machine Learning Compute\n",
"<details>\n",
" <summary>Click to learn more about compute types</summary>\n",
" \n",
"[Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) is managed compute infrastructure that allows the user to easily create single to multi-node compute of the appropriate VM Family. It is created within your workspace region and is a resource that can be used by other users in your workspace. It autoscales by default to the max_nodes, when a job is submitted, and executes in a containerized environment packaging the dependencies as specified by the user.\n",
"\n",
"Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service.\n",
"\n",
"You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
"\n",
"In addition to vm_size and max_nodes, you can specify:\n",
"- **min_nodes**: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n",
"- **vm_priority**: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
"- **idle_seconds_before_scaledown**: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
"- **vnet_resourcegroup_name**: Resource group of the existing VNet within which AmlCompute should be provisioned\n",
"- **vnet_name**: Name of VNet\n",
"- **subnet_name**: Name of SubNet within the VNet\n",
"</details>\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found existing compute target\n"
]
}
],
"source": [
"# Remote compute (cluster) configuration. To reduce cost, choose a smaller VM size and fewer nodes.\n",
"VM_SIZE = 'STANDARD_D2_V2'\n",
"# Cluster nodes\n",
"MIN_NODES = 0\n",
"MAX_NODES = 2\n",
"\n",
"CLUSTER_NAME = 'cpucluster'\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=CLUSTER_NAME)\n",
" print(\"Found existing compute target\")\n",
"except:\n",
" print(\"Creating a new compute target...\")\n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(\n",
" vm_size=VM_SIZE,\n",
" min_nodes=MIN_NODES,\n",
" max_nodes=MAX_NODES\n",
" )\n",
" # Create the cluster with the specified name and configuration\n",
" compute_target = ComputeTarget.create(ws, CLUSTER_NAME, compute_config)\n",
" # Wait for the cluster to complete, show the output log\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example uses persistent compute target, which requires creation of the computing resource for first time. Alternatively, you can use run-based compute target or remote virtual machines."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"NOTEBOOK_NAME = 'sar_movielens.ipynb' # use different notebook file name if trying to run other notebooks\n",
"experiment_name = NOTEBOOK_NAME.strip(\".ipynb\")\n",
"tmp_dir = TemporaryDirectory()\n",
"notebook_path = path.join(tmp_dir.name, NOTEBOOK_NAME)\n",
"source_path = path.join('00_quick_start/', NOTEBOOK_NAME)\n",
"shutil.copy(source_path, notebook_path)\n",
" \n",
"exp = Experiment(workspace=ws, name=experiment_name)\n",
"\n",
"run_config = RunConfiguration()\n",
"run_config.target = \"cpucluster\" # possible to use alternative compute targets\n",
"run_config.environment.docker.enabled = True\n",
"run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"run_config.auto_prepare_environment = True\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(python_version='3.6.2')\n",
"run_config.environment.python.conda_dependencies.add_pip_package('sklearn')\n",
"\n",
"cfg = NotebookRunConfig(source_directory='../',\n",
" notebook='notebooks/00_quick_start/' + NOTEBOOK_NAME,\n",
" output_notebook='outputs/out.ipynb',\n",
" parameters={\"MOVIELENS_DATA_SIZE\": \"100k\", \"TOP_K\": 10},\n",
" run_config=run_config)\n"
]
},
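{
"cell_type": "markdown",
"metadata": {},
"source": [
"A caution on deriving the experiment name from the file name: `str.strip` treats its argument as a *set of characters* to remove from both ends, not as a suffix, so `os.path.splitext` is the safer choice. A minimal sketch:\n",
"```python\n",
"from os import path\n",
"\n",
"# strip() removes any of the listed characters from both ends,\n",
"# so it can over-strip names whose characters all fall in the set.\n",
"assert 'binpy.ipynb'.strip('.ipynb') == ''\n",
"# splitext() removes exactly the extension.\n",
"assert path.splitext('sar_movielens.ipynb')[0] == 'sar_movielens'\n",
"```"
]
},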
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`exp.submit()` will submit source_directory folder and designate notebook to run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"run = exp.submit(cfg)\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above cell should output similar table as below\n",
"![Experiment submit output](https://recodatasets.z20.web.core.windows.net/images/aml_papermill_cell_output.jpg)\n",
"After clicking \"Link to Azure Portal\", experiment run details tab looks like this\n",
"![Azure Portal Experiment](https://recodatasets.z20.web.core.windows.net/images/aml_papermill_portal.jpg)\n",
"You can find the executed notebook out.ipynb under outputs tab.\n",
"![Output file](https://recodatasets.z20.web.core.windows.net/images/aml_papermill_outputs_tab.jpg)\n",
"#### Jupyter widget\n",
"\n",
"You can use a Jupyter widget to monitor the progress of the run. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4cdf15aac31d48879cc6fb0021ed51ce",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can view logged metrics with run.get_metrics()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'MOVIELENS_DATA_SIZE': '100k', 'TOP_K': 10, 'map': 0.10485162234730652, 'ndcg': 0.3704872048540983, 'precision': 0.3211252653927813, 'recall': 0.17416985459670545, 'test_time': 0.15025854110717773, 'train_time': 0.29204869270324707}\n"
]
}
],
"source": [
"# Run this after the run completes; otherwise the metrics dictionary is empty\n",
"metrics = run.get_metrics()\n",
"print(metrics)"
]
},
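{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dictionary returned by `run.get_metrics()` mixes the injected notebook parameters with the evaluation scores. A small sketch of separating and formatting them, using values like those printed above (splitting on value type is an assumption for illustration, not part of the AzureML API):\n",
"```python\n",
"metrics = {'MOVIELENS_DATA_SIZE': '100k', 'TOP_K': 10,\n",
"           'map': 0.1049, 'ndcg': 0.3705,\n",
"           'precision': 0.3211, 'recall': 0.1742}\n",
"\n",
"# Keep only the float-valued entries, i.e. the evaluation scores.\n",
"scores = {k: v for k, v in metrics.items() if isinstance(v, float)}\n",
"for name in sorted(scores):\n",
"    print('{:<10} {:.4f}'.format(name, scores[name]))\n",
"```"
]
},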
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deprovision compute resource\n",
"To avoid unnecessary charges, if you created compute target that doesn't scale down to 0, make sure the compute target is deprovisioned after use."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# delete() is used to deprovision and delete the AmlCompute target.\n",
"# Do not run this before the experiment completes.\n",
"\n",
"# compute_target.delete()\n",
"\n",
"# Deletion takes a few minutes. You can check progress in the Azure Portal under the Compute tab."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# clean up temporary directory\n",
"tmp_dir.cleanup()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.6 - AzureML",
"language": "python",
"name": "python3-azureml"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}