* update for serverless compute

* run black
This commit is contained in:
Sheri Gilley 2023-11-14 09:10:02 -06:00 committed by GitHub
Parent 628fdb9680
Commit 1ad4a1292b
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
3 changed files with 71 additions and 228 deletions

View file

@ -105,14 +105,17 @@
"\n",
"# authenticate\n",
"credential = DefaultAzureCredential()\n",
"# # Get a handle to the workspace\n",
"\n",
"SUBSCRIPTION = \"<SUBSCRIPTION_ID>\"\n",
"RESOURCE_GROUP = \"<RESOURCE_GROUP>\"\n",
"WS_NAME = \"<AML_WORKSPACE_NAME>\"\n",
"# Get a handle to the workspace\n",
"ml_client = MLClient(\n",
" credential=credential,\n",
" subscription_id=\"<SUBSCRIPTION_ID>\",\n",
" resource_group_name=\"<RESOURCE_GROUP>\",\n",
" workspace_name=\"<AML_WORKSPACE_NAME>\",\n",
")\n",
"cpu_cluster = None"
" subscription_id=SUBSCRIPTION,\n",
" resource_group_name=RESOURCE_GROUP,\n",
" workspace_name=WS_NAME,\n",
")"
]
},
{
@ -121,8 +124,27 @@
"metadata": {},
"source": [
"> [!NOTE]\n",
"> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen when creating the `credit_data` data asset, two code cells from here).\n",
"> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell).\n",
"\n",
"Verify the connection by making a call to `ml_client`. Since this is the first time that you're making a call to the workspace, you may be asked to authenticate. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify that the handle works correctly.\n",
"# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.\n",
"ws = ml_client.workspaces.get(WS_NAME)\n",
"print(ws.location, \":\", ws.resource_group)"
]
},
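If `DefaultAzureCredential` can't find a sign-in when this first call runs, one common option is to request a token up front and fall back to an interactive browser prompt. A minimal sketch, assuming only the `azure-identity` package that `DefaultAzureCredential` already comes from:

```python
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # request a token now so authentication problems surface immediately
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # opens a browser window for an interactive sign-in instead
    credential = InteractiveBrowserCredential()
```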
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Access the registered data asset\n",
"\n",
"Start by getting the data that you previously registered in [Tutorial: Upload, access and explore your data](explore-data.ipynb).\n",
@ -130,14 +152,6 @@
"* Azure Machine Learning uses a `Data` object to register a reusable definition of data, and consume data within a pipeline."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Since this is the first time that you're making a call to the workspace, you may be asked to authenticate. Once the authentication is complete, you then see the dataset registration completion message."
]
},
{
"cell_type": "code",
"execution_count": null,
@ -157,72 +171,6 @@
"print(f\"Data asset URI: {credit_data.path}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a compute resource to run your pipeline (Optional)\n",
"\n",
"You can **skip this step** if you want to use [serverless compute (preview)](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) to run the training job. Through serverless compute, Azure Machine Learning takes care of creating, scaling, deleting, patching and managing compute, along with providing managed network isolation, reducing the burden on you. \n",
"\n",
"Each step of an Azure Machine Learning pipeline can use a different compute resource for running the specific job of that step. It can be single or multi-node machines with Linux or Windows OS, or a specific compute fabric like Spark.\n",
"\n",
"In this section, you provision a Linux [compute cluster](https://docs.microsoft.com/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python). See the [full list on VM sizes and prices](https://azure.microsoft.com/en-ca/pricing/details/machine-learning/) .\n",
"\n",
"For this tutorial, you only need a basic cluster so use a Standard_DS3_v2 model with 2 vCPU cores, 7-GB RAM and create an Azure Machine Learning Compute.\n",
"> [!TIP]\n",
"> If you already have a compute cluster, replace \"cpu-cluster\" in the next code block with the name of your cluster. This will keep you from creating another one.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "cpu_cluster"
},
"outputs": [],
"source": [
"from azure.ai.ml.entities import AmlCompute\n",
"\n",
"# Name assigned to the compute cluster\n",
"cpu_compute_target = \"cpu-cluster\"\n",
"\n",
"try:\n",
" # let's see if the compute target already exists\n",
" cpu_cluster = ml_client.compute.get(cpu_compute_target)\n",
" print(\n",
" f\"You already have a cluster named {cpu_compute_target}, we'll reuse it as is.\"\n",
" )\n",
"\n",
"except Exception:\n",
" print(\"Creating a new cpu compute target...\")\n",
"\n",
" # Let's create the Azure Machine Learning compute object with the intended parameters\n",
" # if you run into an out of quota error, change the size to a comparable VM that is available.\n",
" # Learn more on https://azure.microsoft.com/en-us/pricing/details/machine-learning/.\n",
" cpu_cluster = AmlCompute(\n",
" name=cpu_compute_target,\n",
" # Azure Machine Learning Compute is the on-demand VM service\n",
" type=\"amlcompute\",\n",
" # VM Family\n",
" size=\"STANDARD_DS3_V2\",\n",
" # Minimum running nodes when there is no job running\n",
" min_instances=0,\n",
" # Nodes in cluster\n",
" max_instances=4,\n",
" # How many seconds will the node running after the job termination\n",
" idle_time_before_scale_down=180,\n",
" # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination\n",
" tier=\"Dedicated\",\n",
" )\n",
" print(\n",
" f\"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}\"\n",
" )\n",
" # Now, we pass the object to MLClient's create_or_update method\n",
" cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster)"
]
},
{
"attachments": {},
"cell_type": "markdown",
@ -829,9 +777,7 @@
"\n",
"\n",
"@dsl.pipeline(\n",
" compute=cpu_compute_target\n",
" if (cpu_cluster)\n",
" else \"serverless\", # \"serverless\" value runs pipeline on serverless compute\n",
" compute=\"serverless\", # \"serverless\" value runs pipeline on serverless compute\n",
" description=\"E2E data_perp-train pipeline\",\n",
")\n",
"def credit_defaults_pipeline(\n",

View file

@ -17,7 +17,6 @@
"\n",
"> * Set up a handle to your Azure Machine Learning workspace\n",
"> * Create your training script\n",
"> * Create a scalable compute resource, a compute cluster or use [serverless compute (preview)](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) instead\n",
"> * Create and run a command job that will run the training script on the compute cluster, configured with the appropriate job environment\n",
"> * View the output of your training script\n",
"> * Deploy the newly-trained model as an endpoint\n",
@ -90,14 +89,16 @@
"# authenticate\n",
"credential = DefaultAzureCredential()\n",
"\n",
"SUBSCRIPTION = \"<SUBSCRIPTION_ID>\"\n",
"RESOURCE_GROUP = \"<RESOURCE_GROUP>\"\n",
"WS_NAME = \"<AML_WORKSPACE_NAME>\"\n",
"# Get a handle to the workspace\n",
"ml_client = MLClient(\n",
" credential=credential,\n",
" subscription_id=\"<SUBSCRIPTION_ID>\",\n",
" resource_group_name=\"<RESOURCE_GROUP>\",\n",
" workspace_name=\"<AML_WORKSPACE_NAME>\",\n",
")\n",
"cpu_cluster = None"
" subscription_id=SUBSCRIPTION,\n",
" resource_group_name=RESOURCE_GROUP,\n",
" workspace_name=WS_NAME,\n",
")"
]
},
{
@ -106,7 +107,19 @@
"metadata": {},
"source": [
"> [!NOTE]\n",
"> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (in this notebook, that will happen in the cell that creates the compute cluster)."
"> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify that the handle works correctly.\n",
"# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.\n",
"ws = ml_client.workspaces.get(WS_NAME)\n",
"print(ws.location, \":\", ws.resource_group)"
]
},
{
@ -270,63 +283,7 @@
"\n",
"You might need to select **Refresh** to see the new folder and script in your **Files**.\n",
"\n",
"![refresh](./media/refresh.png)\n",
"\n",
"## Create a compute cluster, a scalable way to run a training job (Optional)\n",
"\n",
"You can **skip this step** if you want to use **serverless compute** to run the training job. Through serverless compute, Azure Machine Learning takes care of creating, scaling, deleting, patching and managing compute, along with providing managed network isolation, reducing the burden on you. \n",
"\n",
"You already have a compute instance, which you're using to run the notebook. Now you'll add a second type of compute, a **compute cluster** that you'll use to run your training job. While a compute instance is a single node machine, a compute cluster can be single or multi-node machines with Linux or Windows OS, or a specific compute fabric like Spark.\n",
"\n",
"You'll provision a Linux compute cluster. See the [full list on VM sizes and prices](https://azure.microsoft.com/pricing/details/machine-learning/) .\n",
"\n",
"For this example, you only need a basic cluster, so you'll use a Standard_DS3_v2 model with 2 vCPU cores, 7-GB RAM."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.ml.entities import AmlCompute\n",
"\n",
"# Name assigned to the compute cluster\n",
"cpu_compute_target = \"cpu-cluster\"\n",
"\n",
"try:\n",
" # let's see if the compute target already exists\n",
" cpu_cluster = ml_client.compute.get(cpu_compute_target)\n",
" print(\n",
" f\"You already have a cluster named {cpu_compute_target}, we'll reuse it as is.\"\n",
" )\n",
"\n",
"except Exception:\n",
" print(\"Creating a new cpu compute target...\")\n",
"\n",
" # Let's create the Azure Machine Learning compute object with the intended parameters\n",
" # if you run into an out of quota error, change the size to a comparable VM that is available.\n",
" # Learn more on https://azure.microsoft.com/en-us/pricing/details/machine-learning/.\n",
" cpu_cluster = AmlCompute(\n",
" name=cpu_compute_target,\n",
" # Azure Machine Learning Compute is the on-demand VM service\n",
" type=\"amlcompute\",\n",
" # VM Family\n",
" size=\"STANDARD_DS3_V2\",\n",
" # Minimum running nodes when there is no job running\n",
" min_instances=0,\n",
" # Nodes in cluster\n",
" max_instances=4,\n",
" # How many seconds will the node running after the job termination\n",
" idle_time_before_scale_down=180,\n",
" # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination\n",
" tier=\"Dedicated\",\n",
" )\n",
" print(\n",
" f\"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}\"\n",
" )\n",
" # Now, we pass the object to MLClient's create_or_update method\n",
" cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster)"
"![refresh](./media/refresh.png)"
]
},
{
@ -339,10 +296,10 @@
"Now that you have a script that can perform the desired tasks, and a compute cluster to run the script, you'll use a general purpose **command** that can run command line actions. This command line action can directly call system commands or run a script. \n",
"\n",
"Here, you'll create input variables to specify the input data, split ratio, learning rate and registered model name. The command script will:\n",
"* Use the compute cluster to run the command if you created it or use serverless compute by not specifying any compute.\n",
"* Use an *environment* that defines software and runtime libraries needed for the training script. Azure Machine Learning provides many curated or ready-made environments, which are useful for common training and inference scenarios. You'll use one of those environments here. In the [Train a model](train-model.ipynb) tutorial, you'll learn how to create a custom environment. \n",
"* Configure the command line action itself - `python main.py` in this case. The inputs/outputs are accessible in the command via the `${{ ... }}` notation.\n",
"* In this sample, we access the data from a file on the internet. "
"* In this sample, we access the data from a file on the internet. \n",
"* Since a compute resource was not specified, the script will be run on a [serverless compute cluster](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) that is automatically created."
]
},
{
@ -374,9 +331,6 @@
" code=\"./src/\", # location of source code\n",
" command=\"python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}\",\n",
" environment=\"AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest\",\n",
" compute=\"cpu-cluster\"\n",
" if (cpu_cluster)\n",
" else None, # No compute needs to be passed to use serverless\n",
" display_name=\"credit_default_prediction\",\n",
")"
]
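Since only the tail of the `command(...)` call is visible in this hunk, here is a hedged sketch of what the full cell looks like once no `compute=` argument is passed, which is what makes the job run on serverless compute. The data URL, input values, and model name are placeholders, not the notebook's exact values:

```python
from azure.ai.ml import Input, command

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="https://<your-storage-or-public-url>/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name="credit_defaults_model",
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    display_name="credit_default_prediction",
    # no compute= argument, so the job runs on serverless compute
)

# submit the job; the call returns immediately while the job runs in the cloud
ml_client.create_or_update(job)
```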

View file

@ -14,7 +14,7 @@
"The steps are:\n",
"\n",
" * Get a handle to your Azure Machine Learning workspace\n",
" * Create your compute resource (or simply use [serverless compute (preview)](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) instead) and job environment\n",
" * Create a job environment\n",
" * Create your training script\n",
" * Create and run your command job to run the training script on the compute resource, configured with the appropriate job environment and the data source\n",
" * View the output of your training script\n",
@ -107,14 +107,17 @@
"\n",
"# authenticate\n",
"credential = DefaultAzureCredential()\n",
"# # Get a handle to the workspace\n",
"\n",
"SUBSCRIPTION = \"<SUBSCRIPTION_ID>\"\n",
"RESOURCE_GROUP = \"<RESOURCE_GROUP>\"\n",
"WS_NAME = \"<AML_WORKSPACE_NAME>\"\n",
"# Get a handle to the workspace\n",
"ml_client = MLClient(\n",
" credential=credential,\n",
" subscription_id=\"<SUBSCRIPTION_ID>\",\n",
" resource_group_name=\"<RESOURCE_GROUP>\",\n",
" workspace_name=\"<AML_WORKSPACE_NAME>\",\n",
")\n",
"cpu_cluster = None"
" subscription_id=SUBSCRIPTION,\n",
" resource_group_name=RESOURCE_GROUP,\n",
" workspace_name=WS_NAME,\n",
")"
]
},
{
@ -126,73 +129,16 @@
"> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a compute cluster to run your job (optional) \n",
"\n",
"You can **skip this step** if you want to use [serverless compute (preview)](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) to run the training job. Through serverless compute, Azure Machine Learning takes care of creating, scaling, deleting, patching and managing compute, along with providing managed network isolation, reducing the burden on you. \n",
"\n",
"In Azure, a job can refer to several tasks that Azure allows its users to do: training, pipeline creation, deployment, etc. For this tutorial and our purpose of training a machine learning model, we'll use *job* as a reference to running training computations (*training job*).\n",
"\n",
"You need a compute resource for running any job in Azure Machine Learning. It can be single or multi-node machines with Linux or Windows OS, or a specific compute fabric like Spark. In Azure Machine Learning, there are two compute resources that you can choose from: instance and cluster. A compute instance contains one node of computation resources while a *compute cluster* contains several. A *compute cluster* contains more memory for the computation task. For training, we recommend using a compute cluster because it allows the user to distribute calculations on multiple nodes of computation, which results in a faster training experience. \n",
"\n",
"You provision a Linux compute cluster. See the [full list on VM sizes and prices](https://azure.microsoft.com/pricing/details/machine-learning/) .\n",
"\n",
"For this example, you only need a basic cluster, so you use a Standard_DS3_v2 model with 2 vCPU cores, 7-GB RAM."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1677262287630
},
"name": "cpu_compute_target"
},
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.ml.entities import AmlCompute\n",
"\n",
"# Name assigned to the compute cluster\n",
"cpu_compute_target = \"cpu-cluster\"\n",
"\n",
"try:\n",
" # let's see if the compute target already exists\n",
" cpu_cluster = ml_client.compute.get(cpu_compute_target)\n",
" print(\n",
" f\"You already have a cluster named {cpu_compute_target}, we'll reuse it as is.\"\n",
" )\n",
"\n",
"except Exception:\n",
" print(\"Creating a new cpu compute target...\")\n",
"\n",
" # Let's create the Azure Machine Learning compute object with the intended parameters\n",
" # if you run into an out of quota error, change the size to a comparable VM that is available.\n",
" # Learn more on https://azure.microsoft.com/en-us/pricing/details/machine-learning/.\n",
" cpu_cluster = AmlCompute(\n",
" name=cpu_compute_target,\n",
" # Azure Machine Learning Compute is the on-demand VM service\n",
" type=\"amlcompute\",\n",
" # VM Family\n",
" size=\"STANDARD_DS3_V2\",\n",
" # Minimum running nodes when there is no job running\n",
" min_instances=0,\n",
" # Nodes in cluster\n",
" max_instances=4,\n",
" # How many seconds will the node running after the job termination\n",
" idle_time_before_scale_down=180,\n",
" # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination\n",
" tier=\"Dedicated\",\n",
" )\n",
" print(\n",
" f\"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}\"\n",
" )\n",
" # Now, we pass the object to MLClient's create_or_update method\n",
" cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster)"
"# Verify that the handle works correctly.\n",
"# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.\n",
"ws = ml_client.workspaces.get(WS_NAME)\n",
"print(ws.location, \":\", ws.resource_group)"
]
},
{
@ -257,7 +203,7 @@
" - pandas>=1.1,<1.2\n",
" - pip:\n",
" - inference-schema[numpy-support]==1.3.0\n",
" - mlflow==2.4.1\n",
" - mlflow== 2.4.1\n",
" - azureml-mlflow==1.51.0\n",
" - psutil>=5.8,<5.9\n",
" - tqdm>=4.59,<4.60\n",
@ -476,9 +422,9 @@
"Now that you have a script that can perform the classification task, use the general purpose **command** that can run command line actions. This command line action can be directly calling system commands or by running a script. \n",
"\n",
"Here, create input variables to specify the input data, split ratio, learning rate and registered model name. The command script will:\n",
"* Use the compute created earlier to run this command or simply use serverless compute.\n",
"* Use the environment created earlier - you can use the `@latest` notation to indicate the latest version of the environment when the command is run.\n",
"* Configure the command line action itself - `python main.py` in this case. The inputs/outputs are accessible in the command via the `${{ ... }}` notation."
"* Configure the command line action itself - `python main.py` in this case. The inputs/outputs are accessible in the command via the `${{ ... }}` notation.\n",
"* Since a compute resource was not specified, the script will be run on a [serverless compute cluster](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) that is automatically created."
]
},
{
@ -510,9 +456,6 @@
" code=\"./src/\", # location of source code\n",
" command=\"python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}\",\n",
" environment=\"aml-scikit-learn@latest\",\n",
" compute=\"cpu-cluster\"\n",
" if (cpu_cluster)\n",
" else None, # No compute needs to be passed to use serverless\n",
" display_name=\"credit_default_prediction\",\n",
")"
]
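As a follow-up to the cell above, a short sketch of submitting this serverless command job and following its progress; `job` is assumed to be the command defined in that cell, and `studio_url` / `jobs.stream` come from the azure-ai-ml v2 SDK:

```python
# submit the serverless command job defined above
returned_job = ml_client.create_or_update(job)
print(f"Monitor the run in studio: {returned_job.studio_url}")

# stream the driver logs into the notebook until the job reaches a terminal state
ml_client.jobs.stream(returned_job.name)
```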