{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started: training an image classification model\n",
"\n",
"**Learning Objectives** - By the end of this quickstart tutorial, you'll know how to train and deploy an image classification model on Azure Machine Learning studio.\n",
"\n",
"This tutorial covers:\n",
"\n",
"- Connect to workspace & set up a compute resource on the Azure Machine Learning Studio Notebook UI\n",
"- Bring data in and prepare it to be used for training\n",
"- Train a model for image classification\n",
"- Metrics for optimizing your model\n",
"- Deploy the model online & test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Connect to Azure Machine Learning workspace\n",
"\n",
"Before we dive in the code, you'll need to connect to your workspace. The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.\n",
"\n",
"We are using `DefaultAzureCredential` to get access to workspace. `DefaultAzureCredential` should be capable of handling most scenarios. If you want to learn more about other available credentials, go to [set up authentication doc](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk), [azure-identity reference doc](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).\n",
"\n",
"**Make sure to enter your workspace credentials before you run the script below.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Handle to the workspace\n",
"from azure.ai.ml import MLClient\n",
"\n",
"# Authentication package\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"credential = DefaultAzureCredential()\n",
"\n",
"# Get a handle to the workspace. You can find the info on the workspace tab on ml.azure.com\n",
"ml_client = MLClient(\n",
" credential=credential,\n",
" subscription_id=\"<SUBSCRIPTION_ID>\", # this will look like xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\n",
" resource_group_name=\"<RESOURCE_GROUP>\",\n",
" workspace_name=\"<AML_WORKSPACE_NAME>\",\n",
")"
]
},
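{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, you can fetch the workspace details through the client you just created. This is a minimal sketch; it assumes you filled in the credentials above, and it will raise an error if they are wrong."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: fetch the workspace details through the client.\n",
"# This raises an authentication or resource error if the values above are wrong.\n",
"ws = ml_client.workspaces.get(ml_client.workspace_name)\n",
"print(ws.name, ws.location, ws.resource_group)"
]
},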
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Create compute\n",
"\n",
"In order to train a model on the Notebook editor on Azure Machine Learning studio, you will need to create a compute resource first. This is easily handled through a compute creation wizard. **Creating a compute will take 3-4 minutes.**\n",
"\n",
"![](media/compute-creation.png)\n",
"\n",
"1. Click **...** menu button on the top of Notebook UI, and select **+Create Azure ML Compute Instance**.\n",
"2. **Name** the compute as **cpu-cluster**\n",
"3. Select **CPU** and **STANDARD_DS3_V2**. \n",
"4. Click **Create**\n",
"\n",
"If you are interested in learning how to create compute via code, see [Azure Machine Learning in a Day](https://github.com/Azure/azureml-examples/blob/main/tutorials/azureml-in-a-day/azureml-in-a-day.ipynb). "
]
},
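{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer code over the wizard, the sketch below creates the same compute instance with the SDK. It assumes the `ml_client` from step 1; skip it if you already created **cpu-cluster** through the UI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: create the same compute instance via the SDK.\n",
"# Skip this cell if you already created cpu-cluster through the wizard.\n",
"from azure.ai.ml.entities import ComputeInstance\n",
"\n",
"compute = ComputeInstance(name=\"cpu-cluster\", size=\"STANDARD_DS3_V2\")\n",
"ml_client.compute.begin_create_or_update(compute).result()"
]
},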
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Create a job environment\n",
"To run an Azure Machine Learning training job, you'll need an environment.\n",
"\n",
"In this tutorial, you'll using a ready-made environment called `AzureML-sklearn-1.5` that contains all required libraries (python, MLflow, numpy, pip, etc). "
]
},
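{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you'd like to inspect this environment before using it, the optional sketch below reads it from the shared `azureml` registry. It assumes the registry is reachable with your credentials."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: look up the curated sklearn environment in the shared azureml registry\n",
"from azure.ai.ml import MLClient\n",
"\n",
"registry_client = MLClient(credential=credential, registry_name=\"azureml\")\n",
"env = registry_client.environments.get(name=\"sklearn-1.5\", label=\"latest\")\n",
"print(env.name, env.version)"
]
},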
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Build the command job to train\n",
"Now that you have all assets required to run your job, it's time to build the job itself, using the Azure ML Python SDK v2. We will be creating a command job.\n",
"\n",
"An AzureML command job is a resource that specifies all the details needed to execute your training code in the cloud: inputs and outputs, the type of hardware to use, software to install, and how to run your code. the command job contains information to execute a single command.\n",
"\n",
"**Create training script**\n",
"\n",
"Let's start by creating the training script - the *main.py* python file."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"train_src_dir = \"./src\"\n",
"os.makedirs(train_src_dir, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This script handles the preprocessing of the data, splitting it into test and train data. It then consumes this data to train a tree based model and return the output model. [MLFlow](https://mlflow.org/docs/latest/tracking.html) will be used to log the parameters and metrics during our pipeline run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile {train_src_dir}/main.py\n",
"import os\n",
"import argparse\n",
"import pandas as pd\n",
"import mlflow\n",
"import mlflow.sklearn\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"def main():\n",
" \"\"\"Main function of the script.\"\"\"\n",
"\n",
" # input and output arguments\n",
" parser = argparse.ArgumentParser()\n",
" parser.add_argument(\"--data\", type=str, help=\"path to input data\")\n",
" parser.add_argument(\"--test_train_ratio\", type=float, required=False, default=0.25)\n",
" parser.add_argument(\"--n_estimators\", required=False, default=100, type=int)\n",
" parser.add_argument(\"--learning_rate\", required=False, default=0.1, type=float)\n",
" parser.add_argument(\"--registered_model_name\", type=str, help=\"model name\")\n",
" args = parser.parse_args()\n",
" \n",
" # start Logging\n",
" mlflow.start_run()\n",
"\n",
" # enable autologging\n",
" mlflow.sklearn.autolog()\n",
"\n",
" ###################\n",
" #<prepare the data>\n",
" ###################\n",
" print(\" \".join(f\"{k}={v}\" for k, v in vars(args).items()))\n",
"\n",
" print(\"input data:\", args.data)\n",
" \n",
" credit_df = pd.read_csv(args.data, header=1, index_col=0)\n",
"\n",
" mlflow.log_metric(\"num_samples\", credit_df.shape[0])\n",
" mlflow.log_metric(\"num_features\", credit_df.shape[1] - 1)\n",
"\n",
" train_df, test_df = train_test_split(\n",
" credit_df,\n",
" test_size=args.test_train_ratio,\n",
" )\n",
" ###################\n",
" #</prepare the data>\n",
" ###################\n",
"\n",
" ##################\n",
" #<train the model>\n",
" ##################\n",
" # extracting the label column\n",
" y_train = train_df.pop(\"default payment next month\")\n",
"\n",
" # convert the dataframe values to array\n",
" X_train = train_df.values\n",
"\n",
" # extracting the label column\n",
" y_test = test_df.pop(\"default payment next month\")\n",
"\n",
" # convert the dataframe values to array\n",
" X_test = test_df.values\n",
"\n",
" print(f\"Training with data of shape {X_train.shape}\")\n",
"\n",
" clf = GradientBoostingClassifier(\n",
" n_estimators=args.n_estimators, learning_rate=args.learning_rate\n",
" )\n",
" clf.fit(X_train, y_train)\n",
"\n",
" y_pred = clf.predict(X_test)\n",
"\n",
" print(classification_report(y_test, y_pred))\n",
"\n",
" ##################\n",
" #</train the model>\n",
" ##################\n",
"\n",
" ##########################\n",
" #<save and register model>\n",
" ##########################\n",
" # registering the model to the workspace\n",
" print(\"Registering the model via MLFlow\")\n",
" mlflow.sklearn.log_model(\n",
" sk_model=clf,\n",
" registered_model_name=args.registered_model_name,\n",
" artifact_path=args.registered_model_name,\n",
" )\n",
"\n",
" # saving the model to a file\n",
" mlflow.sklearn.save_model(\n",
" sk_model=clf,\n",
" path=os.path.join(args.registered_model_name, \"trained_model\"),\n",
" )\n",
" \n",
" # stop Logging\n",
" mlflow.end_run()\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see in this script, once the model is trained, the model file is saved and registered to the workspace. Now you can use the registered model in inferencing endpoints."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Configure the Command**\n",
"\n",
"Now that you have a script that can perform the desired tasks, You'll use the general purpose command that can run command line actions. This command line action can be directly calling system commands or running a script.\n",
"\n",
"Here, you'll use input data, split ratio, learning rate and registered model name as input variables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import the libraries\n",
"from azure.ai.ml import command\n",
"from azure.ai.ml import Input\n",
"\n",
"# name the model you registered earlier in the training script\n",
"registered_model_name = \"credit_defaults_model\"\n",
"\n",
"# configure the command job\n",
"job = command(\n",
" inputs=dict(\n",
" # uri_file refers to a specific file as a data asset\n",
" data=Input(\n",
" type=\"uri_file\",\n",
" path=\"https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default%20of%20credit%20card%20clients.csv\",\n",
" ),\n",
" test_train_ratio=0.2, # input variable in main.py\n",
" learning_rate=0.25, # input variable in main.py\n",
" registered_model_name=registered_model_name, # input variable in main.py\n",
" ),\n",
" code=\"./src/\", # location of source code\n",
" # The inputs/outputs are accessible in the command via the ${{ ... }} notation\n",
" command=\"python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}\",\n",
" # This is the ready-made environment you are using\n",
" environment=\"azureml://registries/azureml/environments/sklearn-1.5/labels/latest\",\n",
" # This is the compute you created earlier. You can alternatively remove this line to use serverless compute to run the job\n",
" compute=\"cpu-cluster\",\n",
" # An experiment is a container for all the iterations one does on a certain project. All the jobs submitted under the same experiment name would be listed next to each other in Azure ML studio.\n",
" experiment_name=\"train_model_credit_default_prediction\",\n",
" display_name=\"credit_default_prediction\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Submit the job ###\n",
"It's now time to submit the job to run in AzureML. **The job will take 2 to 3 minutes to run**. It could take longer (up to 10 minutes) if the compute instance has been scaled down to zero nodes and custom environment is still building."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# submit the command job\n",
"ml_client.create_or_update(job)"
]
},
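{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can stream the job logs into the notebook while it runs. This is a small convenience sketch; it blocks the cell until the job finishes and assumes `returned_job` from the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: block until the job completes, streaming its logs into this cell\n",
"ml_client.jobs.stream(returned_job.name)"
]
},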
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. View the result of a training job"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](media/view-job.gif)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"You can view the result of a training job by **clicking the URL generated after submitting a job**. Alternatively, you can also click **Jobs** on the left navigation menu. A job is a grouping of many runs from a specified script or piece of code. Information for the run is stored under that job. \n",
"\n",
"1. **Overview** is where you can see the status of the job. \n",
"2. **Metrics** would display different visualizations of the metrics you specified in the script.\n",
"3. **Images** is where you can view any image artifacts that you have logged with MLflow.\n",
"4. **Child jobs** contains child jobs if you added them.\n",
"5. **Outputs + logs** contains log files you need for troubleshooting or other monitoring purposes. \n",
"6. **Code** contains the script/code used in the job.\n",
"7. **Explanations** and **Fairness** are used to see how your model performs against responsible AI standards. They are currently preview features and require additional package installations.\n",
"8. **Monitoring** is where you can view metrics for the performance of compute resources. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Deploy the model as an online endpoint\n",
"\n",
"Now deploy your machine learning model as a web service in the Azure cloud, an [`online endpoint`](https://docs.microsoft.com/azure/machine-learning/concept-endpoints).\n",
"\n",
"To deploy a machine learning service, you usually need:\n",
"\n",
"- The model assets (file, metadata) that you want to deploy. You've already registered these assets via MLflow in *main.py*. You can find it under **Models** on the left navigation menu on Azure Machine Learning studio. \n",
"- The code that executes the model on a given input request. In this quickstart, you can easily set it up through the endpoint creation UI. If you want to learn more about how to deploy via Azure Machine Learning SDK, see [Azure Machine Learning in a Day](https://github.com/Azure/azureml-examples/blob/main/tutorials/azureml-in-a-day/azureml-in-a-day.ipynb)."
]
},
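{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check before deploying, you can confirm the registration from code. This sketch assumes the training job above has completed and registered at least one version of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: confirm the model was registered by the training job\n",
"for m in ml_client.models.list(name=\"credit_defaults_model\"):\n",
"    print(m.name, \"version:\", m.version)"
]
},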
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](media/endpoint-creation.gif)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Find the endpoint creation wizard on Studio**\n",
"1. Open a duplicate tab (so that you can keep this tutorial open).\n",
"1. On the duplicate tab, select **Endpoints** on the left navigation menu.\n",
"2. Select **+Create** for real-time endpoints.\n",
"\n",
"**Endpoint creation & deployment via wizard UI** (this will take approximately 6 to 8 minutes)\n",
"1. Enter a **unique name** for *endpoint name*. We recommend creating a *unique* name with current date/time to avoid conflicts, which could prevent your deployment. Keep all the defaults for the rest. \n",
"2. Next, you need to choose a model to deploy. Select **credit_defaults_model** registered by *main.py* earlier. \n",
"3. Keep all the defaults for deployment configuration.\n",
"1. Select **Standard_DS3_V2** for compute, which is what we configured earlier. Set the instance count to **1**.\n",
"1. Keep all the defaults for the traffic.\n",
"1. Review: review and select **Create**. \n",
"\n",
"![](media/endpoint-test.gif)\n",
"\n",
"**Test with a sample query**\n",
"1. Select the endpoint you just created. Make sure the endpoint is created and the model has been deployed to it.\n",
"2. Select the **Test** tab.\n",
"3. Copy & paste the following sample request file into the **Input data to test real-time endpoint** field.\n",
"4. Select **Test**. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"{\n",
" \"input_data\": {\n",
" \"columns\": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],\n",
" \"index\": [0, 1],\n",
" \"data\": [\n",
" [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],\n",
" [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 10, 9, 8]\n",
" ]\n",
" }\n",
"}\n",
"```"
]
},
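{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to test from code instead of the **Test** tab, the sketch below sends the same sample payload with the SDK. Replace `<YOUR_ENDPOINT_NAME>` with the endpoint name you chose in the wizard."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: invoke the endpoint from code with the same sample payload\n",
"import json\n",
"\n",
"sample = {\n",
"    \"input_data\": {\n",
"        \"columns\": list(range(23)),\n",
"        \"index\": [0, 1],\n",
"        \"data\": [\n",
"            [20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0],\n",
"            [10,9,8,7,6,5,4,3,2,1,10,9,8,7,6,5,4,3,2,1,10,9,8],\n",
"        ],\n",
"    }\n",
"}\n",
"\n",
"with open(\"sample-request.json\", \"w\") as f:\n",
"    json.dump(sample, f)\n",
"\n",
"response = ml_client.online_endpoints.invoke(\n",
"    endpoint_name=\"<YOUR_ENDPOINT_NAME>\",  # replace with the name you chose\n",
"    request_file=\"sample-request.json\",\n",
")\n",
"print(response)"
]
},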
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Clean up resources**\n",
"\n",
"If you're not going to use the endpoint, delete it to stop using the resource. Make sure no other deployments are using an endpoint before you delete it.\n",
"\n",
"1. Click **Details** on the endpoint page.\n",
"2. Click the **Delete** button.\n",
"\n",
"**Expect this step to take approximately 6 to 8 minutes.**"
]
},
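{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also delete the endpoint from code. This is a minimal sketch; replace `<YOUR_ENDPOINT_NAME>` with the endpoint name you chose, and note that deletion takes several minutes as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: delete the endpoint via the SDK instead of the studio UI\n",
"ml_client.online_endpoints.begin_delete(name=\"<YOUR_ENDPOINT_NAME>\").result()"
]
}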
],
"metadata": {
"description": {
"description": "A quickstart tutorial to train and deploy an image classification model on Azure Machine Learning studio"
},
"kernelspec": {
"display_name": "Python 3.10 - SDK v2",
"language": "python",
"name": "python310-sdkv2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "ad53975d6cd14b07e2ef76ca5c680233933f2b5b5c4b5fa1fc47a72f4636b78d"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}